Architecture · Performance · ~10 minute read · Last updated 2026-05-06

How /dashboard loads in 1.5s and processes 95% fewer bytes — performance architecture

A 3.5-second LCP would have been an immediate disqualifier in head-to-head testing against Triple Whale, Datafast, or Northbeam. A dashboard that costs $99/month in Tinybird spend per active workspace just to render the KPI strip would be a worse one. On 2026-05-06 we shipped both fixes in parallel: the fix wave organized by the four-cause taxonomy, which took /dashboard from 3.5s LCP to ~1.5s, and the four-layer optimization stack that cuts Tinybird bytes-processed by ~95% per workspace. This page documents both — together they're the complete playbook for shipping real-time-feel analytics on a columnar OLAP database at customer scale.

Why this matters

A dashboard that takes 3.5 seconds to render is one the operator opens once a day, glances at, and closes. A dashboard that loads in 1.5 seconds is one they keep open in a tab and refer to throughout the workday. The difference is product retention, not just engineering pride — and the path from 3.5s to 1.5s is mostly mechanical once you know where to look.

But wall-clock LCP is only half the story. The other half is cost. A real-time-feel analytics dashboard polling a columnar OLAP database burns 1.5 GiB of bytes-processed per workspace per day — the live summary_kpis pipe alone, at 144 queries per day and 4.58 MiB scanned per query, accounts for 660 MiB of it. At 30 active workspaces you've blown past Tinybird's Develop tier; at 100 you're paying a four-figure monthly bill that the customer's $9-$199 subscription cannot economically support. The right answer is not "raise prices" — it's the four-layer optimization stack documented below, which cuts bytes-processed by ~95% without sacrificing freshness.

The trap is optimizing the wrong layer. Bundle size matters until the network is fast enough that it doesn't, and then query waterfalls dominate. Query waterfalls matter until you single-flight them, and then server cold-start dominates. Server cold-start matters until you cache cross-instance, and then per-query bytes-scanned dominates. Most "we sped up the dashboard" blog posts cover one of these layers and miss the others, leaving the LCP needle stuck and the bill rising. The taxonomies below make every layer explicit so a perf project addresses every dominant cost, not just the most visible one.

The 4-cause LCP taxonomy (wall-clock)

Every slow dashboard we have audited fits into one of four cause families. The numbers below are the actual measured costs on /dashboard before and after the 2026-05-06 GL#402 fix wave; your numbers will differ but the structure is portable.

| Cause | Before | After | Share of LCP | Fix |
|---|---|---|---|---|
| Bundle size | 1.2 MB main chunk | 380 KB main chunk | ~35% | Vite manualChunks + lazy wrappers |
| Query waterfall | 31 client-side fetches | 1 single-flight snapshot | ~25% | /api/v1/dashboard/snapshot endpoint |
| Server cold-start | 870ms TTFB on first hit | ~80ms (cache hit) / ~280ms (miss) | ~15% | Redis stale-while-revalidate cache |
| Progressive render | All-or-nothing render | Above-fold first, below-fold lazy | ~15% | LazyOnView + DashboardSkeleton |

The remaining ~10% is image decoding + font swap + initial React reconciliation — small enough to defer until the four bigger causes are addressed.

The bytes-processed problem (cost dimension)

After GL#411 stabilized the timeout surface, a Tinybird usage audit on 2026-05-06 revealed the cost dimension we'd been deferring. One Vitatree-class workspace was burning 1.5 GiB of Tinybird bytes-processed per day, with the live summary_kpis pipe alone responsible for 660 MiB (44% of the total) at 144 queries per day · 4.58 MiB scanned per query. At 1.5 GiB/day, the linear projection was ~45 GiB per month per workspace — roughly 45% of the Tinybird Develop tier's 100 GB/mo budget. Ship 30 workspaces and you're over budget; ship 100 and you're on the Pro tier paying $99 per active workspace per month just to render the KPI strip.

The previous GL#402 perf taxonomy fixed wall-clock LCP but left the cost dimension untouched. The snapshot endpoint still hit Tinybird directly on every cache miss, the single-process Map cache evaporated on every Coolify deploy, and the frontend polled aggressively on window-focus + reconnect. Bundle splitting and progressive render do nothing for bytes-scanned — that's a server-side concern only fixable by aggressive cross-instance caching, database-layer pre-aggregation, and frontend polling discipline. Hence the 4-layer stack.

The 4-layer optimization stack (GL#412)

The four optimization layers below compose — they don't substitute for each other. Each addresses a distinct dimension of the cost surface, and the multiplicative effect is what turns 1.5 GiB/day into ~75 MiB/day per workspace. A fifth row, the perf canary, saves no bytes itself but defends the budget against regressions.

| Layer | Mechanism | Bytes saved | File |
|---|---|---|---|
| 1. Redis SWR + single-flight | 50 concurrent identical filter-shapes collapse to 1 upstream call | ~70% (cross-instance cache hits) | server/lib/cache/swrCache.ts |
| 2. Tinybird Materialized View | summary_kpis_daily_mv pre-aggregates daily rollup; 4.58 MiB → KB per query | ~99% per-query bytes | tinybird/pipes/summary_kpis_daily_mv__mv.pipe |
| 3. FE polling discipline | No refetchOnWindowFocus, no refetchInterval, manual Refresh button | ~92% query count (144/day → 12/day) | client/src/lib/queryClient.ts |
| 4. BullMQ prewarm worker | 5-min cron primes SWR cache for active workspaces | 0 (cost-neutral; latency win) | server/queues/dashboardPrewarm.ts |
| 5. Perf canary | p50/p95 + bytes-processed regression alarm | 0 (regression detection) | scripts/perf-canary.ts |

Combined effect (projected): 1.5 GiB/day → ~75 MiB/day per workspace (~95% reduction in bytes processed); 144 queries/day → ~12 queries/day per workspace (~92% reduction in query count). Per-workspace monthly Tinybird projection drops from 45 GiB to ~2.3 GiB — comfortably inside the Develop tier even at 30+ active workspaces.

Cause 1 (LCP) — Bundle size

Symptom. The browser spends 1.2 seconds parsing and evaluating JavaScript before any pixels render. dist/public/assets/index-*.js is the file to look at; if it crosses 500 KB on a complex SPA you are paying it on every cold load. The fix is not "ship less code" — you can't unship features — it is "ship the right code first, the rest later."

Fix. Vite's build.rollupOptions.output.manualChunks lets you split the bundle into stable cacheable groups: vendor (React, framer-motion, recharts), shadcn/ui primitives, and per-route chunks loaded on demand. Combined with React.lazy wrappers around every route in App.tsx, the initial download drops from "everything the user might ever click" to "just the page they are on right now."
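A minimal sketch of the split, assuming the chunk groupings named above (the exact production groupings are not shown in this page):

```typescript
// vite.config.ts — illustrative sketch; the chunk groupings are assumptions
// based on the vendor list mentioned in the text, not the exact shipped config.
import { defineConfig } from "vite";

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        manualChunks: {
          // Stable vendor chunk: changes rarely, caches well across deploys.
          vendor: ["react", "react-dom", "framer-motion"],
          // Heavy charting library isolated so non-chart routes never pay for it.
          charts: ["recharts"],
        },
      },
    },
  },
});
```

Per-route code lands in its own chunk automatically once routes are wrapped in React.lazy, because dynamic import() is Rollup's natural split point.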

Canary. scripts/check-bundle-size.ts runs in postbuild and asserts the main index-*.js chunk stays under 500 KB (warning) / 800 KB (build-failing). Adjust thresholds when shipping a deliberate feature; the goal is "no surprise regressions," not "stay tiny forever."
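The core of the budget check can be reduced to a pure threshold function — a sketch with the thresholds from the text (the fs-based chunk discovery in the real script is elided, and the function name is illustrative):

```typescript
// Postbuild bundle-budget verdict: warn above 500 KB, fail above 800 KB.
type Verdict = "ok" | "warn" | "fail";

const WARN_BYTES = 500 * 1024; // warning threshold
const FAIL_BYTES = 800 * 1024; // build-failing threshold

function checkMainChunk(sizeBytes: number): Verdict {
  if (sizeBytes > FAIL_BYTES) return "fail";
  if (sizeBytes > WARN_BYTES) return "warn";
  return "ok";
}
```

Keeping the verdict logic pure makes the thresholds trivially unit-testable, separate from the dist/ globbing.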

Cause 2 (LCP) — Query waterfall

Symptom. Open DevTools Network tab, load /dashboard, count the requests fired before LCP. If the answer is "more than five," you have a waterfall. The pre-fix /dashboard fired 31 requests — one per card per Tinybird pipe — each one round-tripping to the server and waiting on the previous one if it depended on shared filters. Even at 50ms RTT, 31 sequential or near-sequential requests is a 1-second hit before any data renders.

Fix. Single-flight on the server. One endpoint, GET /api/v1/dashboard/snapshot, takes the active filters once and fans out to every Tinybird pipe in parallel server-side, then returns one well-typed JSON payload covering every card on the page. The browser fires one fetch, gets one response, and renders. The pipe fan-out happens at server-to-Tinybird (low-latency, persistent connection pool) instead of browser-to-server (high-latency, fresh TCP).

The trade-off: one slow Tinybird pipe slows the whole snapshot. We mitigate this with per-pipe timeouts (FAST_PIPE_TIMEOUT_MS=5s, SLOW_PIPE_TIMEOUT_MS=30s per GL#411) and source-additive fallback — a missing pipe yields an empty card, not a failed snapshot.
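The fan-out-with-fallback shape can be sketched like this — pipe functions are injected so the core logic stays self-contained; the function names and the empty-object fallback value are illustrative assumptions:

```typescript
// Parallel pipe fan-out with per-pipe timeout and source-additive fallback:
// a slow or failed pipe yields an empty card ({}) instead of failing the snapshot.
type PipeFn = () => Promise<unknown>;

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("pipe timeout")), ms),
    ),
  ]);
}

async function buildSnapshot(
  pipes: Record<string, PipeFn>,
  timeoutMs: number,
): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    Object.entries(pipes).map(async ([name, fn]) => {
      try {
        return [name, await withTimeout(fn(), timeoutMs)] as const;
      } catch {
        return [name, {}] as const; // source-additive fallback
      }
    }),
  );
  return Object.fromEntries(entries);
}
```

Because every pipe runs inside its own try/catch, the slowest healthy pipe bounds the snapshot latency, and a broken pipe degrades exactly one card.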

Cause 3 (LCP) — Server cold-start

Symptom. First visit after a deploy or after the cache has aged out: 870ms TTFB. Subsequent visits within the cache window: 80ms TTFB. The ~10x gap is the difference between Tinybird query latency and Redis cache hit latency.

Fix (v1). A stale-while-revalidate cache wrapper. The first request in a cold window fetches from Tinybird, caches the result for 300 seconds (fresh), and stores a stale-tier copy for up to 3600 seconds. Within the fresh window, every browser hit is a cache lookup. Outside the fresh window, the request returns the stale copy immediately and triggers a background refetch.
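A minimal single-process sketch of the v1 semantics (the Map-based store and function name are illustrative; window lengths come from the text — the v2 production version moves this to Redis):

```typescript
// Stale-while-revalidate, v1 shape: fresh hits return from cache, stale hits
// return immediately and refresh in the background, cold misses pay upstream.
interface Entry<T> { value: T; fetchedAt: number }

const store = new Map<string, Entry<unknown>>();

async function swrGet<T>(
  key: string,
  fetcher: () => Promise<T>,
  freshMs = 300_000,   // 300s fresh window
  staleMs = 3_600_000, // 3600s stale window
  now = Date.now(),
): Promise<T> {
  const hit = store.get(key) as Entry<T> | undefined;
  if (hit && now - hit.fetchedAt < freshMs) return hit.value; // fresh hit
  if (hit && now - hit.fetchedAt < staleMs) {
    // Stale hit: serve immediately, refetch in the background.
    void fetcher().then((v) =>
      store.set(key, { value: v, fetchedAt: Date.now() }),
    );
    return hit.value;
  }
  const value = await fetcher(); // cold miss: full upstream cost
  store.set(key, { value, fetchedAt: now });
  return value;
}
```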

Fix (v2 — Layer 1 of the GL#412 stack). The v1 cache was a single-process Map; it evaporated on every Coolify deploy and didn't dedup across Coolify instances. The v2 implementation in server/lib/cache/swrCache.ts moves the cache to Redis (Upstash, shared via getSharedRedis() per GL#67) and adds single-flight de-dup: 50 concurrent identical filter-shapes collapse to 1 upstream call via Redis SETNX + pub/sub fan-out. See the next section for the full Layer 1 details.

Cause 4 (LCP) — Progressive render

Symptom. The browser has the JavaScript, the data has arrived, but the page still does not paint because React is busy reconciling the entire dashboard tree at once. Below-fold cards (charts the user has to scroll to see) are paying the same rendering cost as above-fold ones, even though the user does not see them yet.

Fix. LazyOnView: a small wrapper that uses IntersectionObserver to render its children only when they enter the viewport (with a configurable rootMargin so they hydrate just before they scroll into view). Above-fold cards render immediately; below-fold cards defer until the user is about to see them.

Layer 1 (cost) — Redis SWR cache + single-flight

Why. A per-process in-memory cache is fine for one Node instance with steady traffic. Admaxxer runs on Coolify with multi-instance rolling deploys, so the in-memory cache evaporated on every deploy and didn't share state across instances. Worse: under burst load (50 concurrent tabs from the same workspace fetching the same filter-shape), every tab paid a full upstream call because there was no de-dup mechanism at the cache layer. The Redis SWR + single-flight wrapper fixes both.

How. server/lib/cache/swrCache.ts exports withSwrCache<T>(cacheKey, fetcher, { freshSeconds, staleSeconds, singleFlight: true }). Cache key is tb:${pipe}:${stableHash(params)} — stable-hash so semantically-identical filters collapse. Fresh window: 300s (every browser hit is a Redis lookup). Stale window: 3600s (returns stale immediately, triggers background refetch). Single-flight: a Redis SETNX with a 30s TTL claims the refresh; concurrent callers wait on a Redis pub/sub channel for the result instead of each issuing their own upstream call. server/lib/tinybird/cached.ts is the consumer — every existing pipe call upgrades transparently.
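The key-stability property matters more than the hash algorithm: semantically-identical filter objects must produce the same key regardless of property order. A sketch of one way to get that (the real stableHash implementation is not shown in this page; this shallow canonicalization is an assumption):

```typescript
// Stable cache key: sort top-level keys before hashing so { from, to } and
// { to, from } collapse to the same key. Shallow only — nested filter
// objects would need recursive canonicalization.
import { createHash } from "node:crypto";

function stableHash(params: Record<string, unknown>): string {
  const canonical = JSON.stringify(
    Object.keys(params).sort().map((k) => [k, params[k]]),
  );
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

function cacheKey(pipe: string, params: Record<string, unknown>): string {
  return `tb:${pipe}:${stableHash(params)}`;
}
```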

Don't. Don't instantiate a second Redis client for the SWR layer — getSharedRedis() is the only entry point per GL#67 (Upstash Redis Rule). A second client doubles the connection-pool contention and breaks pub/sub fan-out for single-flight.

Layer 2 (cost) — Tinybird Materialized Views

Why. Layer 1 caches the heavy query, but cold cache misses still scan the full raw table. summary_kpis scans pixel_events at 4.58 MiB per query — multiply by every cache miss across every workspace and the bytes-processed bill stays bad. The only fix is to scan less data per query, which means pre-aggregating the heavy daily rollup at the database layer.

How. tinybird/datasources/summary_kpis_daily_mv.datasource defines a target datasource keyed by (workspace_id, day, currency). tinybird/pipes/summary_kpis_daily_mv__mv.pipe is the populator pipe — Tinybird's incremental MV engine writes to it on every pixel_events insert. The rewritten tinybird/pipes/summary_kpis.pipe reads from the MV (a few KB per workspace per day) for completed days and falls through to raw pixel_events only for today's partial day. End-to-end: 4.58 MiB scans become KB scans for the 95%+ of the query that's historical.
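The read-path split — MV for completed days, raw only for today's partial day — reduces to a date-window calculation. A sketch under assumed names (the helper and return shape are illustrative, not the pipe's actual SQL):

```typescript
// Split a requested [from, to] day window into the MV-served range
// (completed days) and the raw-table range (today's partial day only).
function splitMvWindow(
  from: string, // inclusive ISO day, e.g. "2026-04-01"
  to: string,   // inclusive ISO day
  today: string,
): { mv: { from: string; to: string } | null; raw: { day: string } | null } {
  const raw = to >= today ? { day: today } : null; // only today touches raw
  const mvTo = to >= today ? prevDay(today) : to;  // MV covers up to yesterday
  const mv = from <= mvTo ? { from, to: mvTo } : null;
  return { mv, raw };
}

function prevDay(isoDay: string): string {
  const d = new Date(isoDay + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() - 1);
  return d.toISOString().slice(0, 10);
}
```

ISO day strings compare correctly lexicographically, so no Date parsing is needed for the range checks themselves.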

Critical. After editing any tinybird/** file, run npm run tb:push BEFORE git push origin main (GL#228). Coolify's git-push deploy rebuilds the Node app but does NOT push pipe SQL to the Tinybird query layer. Without tb:push, the MV is inert in production. The postbuild canary at scripts/verify-pipes-pushed.ts prints a warning banner when HEAD touches tinybird/.

Layer 3 (cost) — FE polling discipline

Why. React Query's defaults are aggressive for a reason — for chat apps and live dashboards where freshness is the product, refetching on window focus and reconnect is correct. For a DTC analytics dashboard where the operator is glancing at trends, those defaults turn into accidental cost: every Cmd-Tab back to the dashboard tab fires a fresh fetch. Multiplied across 144 polls per day per workspace, this was the dominant query-count driver in the audit.

How. client/src/lib/queryClient.ts sets refetchOnWindowFocus: false, refetchOnReconnect: false, and refetchInterval: false as the global default. Heavy endpoints (/dashboard/snapshot, /analytics/summary) get staleTime: 5 * 60 * 1000 so React Query treats the cache as authoritative for 5 minutes. The new <RefreshButton /> component (client/src/components/ui/RefreshButton.tsx) gives the user a manual lever — framer-motion spin animation, optimistic "Refreshed 0s ago" microcopy, throttled to 1 click per 10 seconds. The user is in control of freshness; the system isn't burning bytes on autopilot.
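The global defaults described above look roughly like this — the option names are real React Query options; treating this as the exact shipped file is an assumption:

```typescript
// Illustrative sketch of the polling-discipline defaults.
import { QueryClient } from "@tanstack/react-query";

export const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      refetchOnWindowFocus: false, // no fetch on Cmd-Tab back to the tab
      refetchOnReconnect: false,   // no fetch on network reconnect
      refetchInterval: false,      // no background polling
    },
  },
});

// Heavy endpoints opt into a 5-minute authoritative cache window per query:
// useQuery({ queryKey: ["dashboard", "snapshot"], queryFn, staleTime: 5 * 60 * 1000 })
```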

Layer 4 (cost) — Pre-warming worker

Why. Layer 1 amortizes repeat hits but not cold-tab opens. If the operator opens /dashboard at 9 AM and the cache aged out at 4 AM, they pay the full cold-cache TTFB. The fix is to keep the cache warm for active workspaces — a background worker that hits the snapshot endpoint server-side on a 5-minute cron, so the user's first paint is always a cache hit.

How. server/queues/dashboardPrewarm.ts registers a BullMQ scheduled job (cron */5 * * * *). On each tick, it queries chat_sessions + pixel-event-recency for workspaces with activity in the last 24 hours, then hits /api/v1/dashboard/snapshot server-side for each. The worker uses getSharedRedis() (GL#67) — never instantiates a second client. The call is idempotent: if the SWR cache is already fresh, the prewarm is a no-op. Cost-neutral for cache layer 1 (the prewarm hit IS the cache fill that would have happened on the next user request anyway), latency win for users.
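The selection step — which workspaces to prewarm — can be isolated as a pure function. A sketch with an assumed row shape and function name (in production this runs inside the BullMQ job every five minutes, then hits the snapshot endpoint per workspace):

```typescript
// Keep only workspaces with pixel-event activity in the last 24 hours.
interface WorkspaceActivity { workspaceId: string; lastEventAt: number }

const DAY_MS = 24 * 60 * 60 * 1000;

function activeWorkspaces(
  rows: WorkspaceActivity[],
  now = Date.now(),
): string[] {
  return rows
    .filter((r) => now - r.lastEventAt <= DAY_MS)
    .map((r) => r.workspaceId);
}
```

Keeping the recency filter pure means the prewarm job's only side effects are the snapshot fetches themselves, which are idempotent against a fresh cache.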

Layer 5 (cost) — Perf canary

Why. A perf optimization stack is not a one-time ship — it's a budget that has to be defended on every PR. Without an automated canary, Layer 2's MV could silently fall behind a schema change, Layer 1's cache hit rate could collapse from a poorly-chosen cache key, or Layer 3's polling defaults could regress when someone adds a feature with their own React Query config. The canary catches these before they hit production.

How. scripts/perf-canary.ts measures p50/p95 of /api/v1/dashboard/snapshot against a fixture workspace, queries the Tinybird workspace usage API for bytes-processed in the last 24 hours, and writes a perf-canary.json artifact. Postbuild runs it in non-blocking warning mode (catches obvious regressions without breaking the deploy hot path). Nightly Coolify cron runs it in hard-fail mode: p95 > 3s OR bytes > 200 MiB per fixture workspace → exit 1, alert via the existing Slack webhook.
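The latency side of the hard-fail check reduces to a percentile over the sampled snapshot latencies. A sketch using the nearest-rank method (the function names are illustrative; the 3s threshold is from the text):

```typescript
// Nearest-rank percentile over latency samples, compared against the
// hard-fail budget.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // nearest-rank method
  return sorted[Math.max(0, rank - 1)];
}

const P95_BUDGET_MS = 3_000; // hard-fail threshold: p95 > 3s

function canaryPasses(latenciesMs: number[]): boolean {
  return percentile(latenciesMs, 95) <= P95_BUDGET_MS;
}
```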

Best-practice references

The patterns above are not novel — they are the consensus playbook of the teams who ship the fastest data dashboards on the public internet. We borrowed shape from their public write-ups and adapted it to Admaxxer's stack.

How to apply this playbook when a dashboard is slow OR expensive

Run the diagnostic in order. Most "perf fixes" fail because they optimize one cause when another dominates — the LCP needle barely moves, the bytes-processed bill barely shrinks, and the team concludes "the dashboard is just slow / expensive."

  1. Measure both dimensions. Lighthouse for LCP / TTFB (wall-clock); Tinybird (or your OLAP DB) workspace usage API for bytes-processed (cost). Record both before-states.
  2. Wall-clock layer 1 (bundle). Inspect dist/public/assets/index-*.js. Run npx vite-bundle-visualizer. If >500 KB, split it before doing anything else.
  3. Wall-clock layer 2 (waterfall). Network tab, count requests before LCP. If >5, single-flight on the server with one snapshot endpoint.
  4. Cost layer 1 (cross-instance cache). Wrap every OLAP call in withSwrCache (or your equivalent). Single-flight is non-negotiable — without it, burst load multiplies upstream cost linearly.
  5. Cost layer 2 (database pre-aggregation). Identify the whale query (the one whose bytes-scanned dominates the audit). Materialize its daily rollup. Read MV first, raw fallback only for today's partial day.
  6. Cost layer 3 (FE polling discipline). Audit your React Query (or SWR / TanStack Query / etc.) defaults. Disable refetch-on-focus / reconnect / interval for heavy endpoints. Add an explicit Refresh button.
  7. Cost layer 4 (prewarm). Stand up a 5-minute cron worker that primes the cache for active workspaces. Cost-neutral, latency win.
  8. Re-measure both dimensions. Lighthouse + workspace usage API. The numbers should improve at every layer; if one layer's fix did not move the needle, you misdiagnosed and another layer dominated.

What not to do

Canaries (defending the budget)

Two canaries defend this stack on every commit: the bundle-size check (scripts/check-bundle-size.ts, Cause 1) and the perf canary (scripts/perf-canary.ts, Layer 5).

Lighthouse expectation. Manual Lighthouse runs against /dashboard on a fast 4G connection should show LCP < 2.5 seconds. We considered shipping an automated Lighthouse postbuild canary but rejected it: Chromium is heavy to install in CI, and a full Lighthouse run takes 60-90 seconds — too slow for the build hot path. The bundle-size + perf-canary pair is the cheap continuous protection; Lighthouse is a manual gate before perf-sensitive ships.