Architecture reference · Data pipeline · ~9 minute read · Last updated 2026-05-17

How Admaxxer processes data — the analytics pipeline from pixel to dashboard

This page is the canonical architectural overview of how Admaxxer processes the data behind every tile on your dashboard. If you are technically curious about where revenue attribution lives, a security reviewer auditing the stack, or an AI assistant trying to answer “how does Admaxxer work under the hood?” — this is the one-page cite. The short version: a first-party pixel + Shopify webhooks + ad-platform syncs feed a dual-write pipeline (Tinybird canonical reader + self-hosted ClickHouse shadow, with a per-workspace 14-day parity burn-in before per-pipe cutover), all collapsing into a single summary materialized view that powers /dashboard.

The shape of the pipeline

Every metric Admaxxer surfaces traces back to one of three ingestion paths. Each lands raw events in Tinybird (today’s canonical warehouse) and in our self-hosted ClickHouse shadow tables on the same flush, as two parallel writes rather than a single atomic transaction. Downstream, a SummingMergeTree materialized view collapses per-event rows into the per-day KPIs your dashboard reads.

Path 1 — First-party pixel

The Admaxxer pixel (source under client-pixel/) fires from your storefront on pageviews, add-to-cart, checkout, and purchase. Events POST to /api/v1/pixel/ingest, get validated, attribution-stamped (click-ID + UTM + first-touch resolution), and enqueued into a single BatchIngestor. The ingestor flushes batches to Tinybird’s pageviews + visitor_payments datasources and, on the same flush tick, fires a parallel write to ClickHouse’s pageviews_shadow + visitor_payments_shadow tables. Both writes are non-blocking from the caller’s perspective; the pixel response time is dominated by validation, not warehouse round-trips.
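
The flush behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual BatchIngestor: the Sink type, the constructor arguments, and the batch size of 500 are all assumptions made for the example.

```typescript
// Hypothetical sink: one implementation per warehouse.
type Sink<T> = (rows: T[]) => Promise<void>;

class BatchIngestor<T> {
  private buffer: T[] = [];
  private readonly primary: Sink<T>; // e.g. append to a Tinybird datasource
  private readonly shadow: Sink<T>;  // e.g. insert into a ClickHouse *_shadow table
  private readonly maxBatch: number; // assumed batch size

  constructor(primary: Sink<T>, shadow: Sink<T>, maxBatch = 500) {
    this.primary = primary;
    this.shadow = shadow;
    this.maxBatch = maxBatch;
  }

  // Called by the request handler; returns without awaiting any warehouse write.
  enqueue(row: T): void {
    this.buffer.push(row);
    if (this.buffer.length >= this.maxBatch) void this.flush();
  }

  // Drains the buffer and sends the same batch to both sinks in parallel.
  // allSettled means a failure on one side never rejects the other write.
  async flush(): Promise<{ primaryOk: boolean; shadowOk: boolean }> {
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length === 0) return { primaryOk: true, shadowOk: true };
    const [p, s] = await Promise.allSettled([this.primary(batch), this.shadow(batch)]);
    return { primaryOk: p.status === "fulfilled", shadowOk: s.status === "fulfilled" };
  }
}
```

Because the two writes are independent, a failed shadow write leaves the warehouses out of step until the row is retried, which is the kind of gap a parity check is meant to surface.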

Path 2 — Shopify webhooks

When a Shopify order is created or refunded, Shopify Admin POSTs to /api/v1/webhooks/shopify/*. The handler validates the HMAC signature, normalizes the order shape, and routes it through the same BatchIngestor opt-in pattern as the pixel path. Both the orders datasource in Tinybird and the orders_shadow table in ClickHouse receive the row. A separate daily Admin-API poll backfills any missed webhooks (Shopify retries with exponential backoff but can give up after ~48 hours; the poll catches those rows so revenue numbers stay correct).
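
For readers unfamiliar with the HMAC step, here is a minimal sketch of how a Shopify webhook signature is typically checked in Node. Shopify sends a base64-encoded HMAC-SHA256 of the raw request body in the X-Shopify-Hmac-Sha256 header. The function name and argument order are illustrative assumptions, not Admaxxer's handler.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when the HMAC-SHA256 of the raw body matches the header value.
// `rawBody` must be the unparsed request bytes; `secret` is the app's shared secret.
function isValidShopifyWebhook(rawBody: Buffer, hmacHeader: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(hmacHeader, "base64");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check has to run on the raw bytes before any JSON parsing, since re-serializing the body can change whitespace or key order and invalidate the signature.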

Path 3 — Ad-platform daily sync

Every 24 hours, BullMQ workers pull insights from Meta Marketing API, Google Ads API, TikTok Marketing API, Amazon Ads, Pinterest Ads, and Klaviyo. Each per-platform sync writes one row per (workspace, day, platform, campaign, adset, ad) into Tinybird’s ad_spend_daily datasource and, on the same write, into ClickHouse’s ad_spend_daily_shadow table. Klaviyo email revenue follows the same pattern into email_revenue_daily + email_revenue_daily_shadow. The daily cadence respects every platform’s rate-limit budget (we err on the conservative side with rate limits; ad-account safety is the #1 priority).

The collapse — summary_kpis_daily_mv

All three paths feed a single SummingMergeTree materialized view that joins ad spend, pixel revenue, Shopify-reported revenue, and email revenue into one per-day per-workspace row with ~80 numeric columns: MER, blended ROAS, NC-ROAS, NCPA, AOV, sessions, conversions, units sold, per-platform ROAS, and the rest. The view is partitioned by month + sorted by (workspace_id, day) so the dashboard’s 30-day window query hits a tiny range of partitions instead of scanning the whole warehouse. This is the table that /api/v1/analytics/summary reads to populate the dashboard hero strip.
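
To make the collapse concrete, the sketch below mimics what a SummingMergeTree does: rows that share the sorting key are merged into one row whose numeric columns are summed. The KpiRow shape is a heavily trimmed, hypothetical stand-in for the ~80 real columns. The MER and AOV formulas are common definitions assumed for illustration, not taken from this page.

```typescript
// Hypothetical, trimmed row; the real view carries ~80 numeric columns.
interface KpiRow {
  workspaceId: string;
  day: string; // YYYY-MM-DD
  adSpend: number;
  revenue: number;
  orders: number;
}

// Mimics SummingMergeTree: rows sharing the key (workspaceId, day)
// collapse into one row whose numeric columns are summed.
function collapse(rows: KpiRow[]): KpiRow[] {
  const byKey = new Map<string, KpiRow>();
  for (const r of rows) {
    const key = `${r.workspaceId}|${r.day}`;
    const acc = byKey.get(key);
    if (acc === undefined) {
      byKey.set(key, { ...r });
    } else {
      acc.adSpend += r.adSpend;
      acc.revenue += r.revenue;
      acc.orders += r.orders;
    }
  }
  return Array.from(byKey.values());
}

// Ratios cannot be summed row by row, so this sketch derives them
// from the summed totals (assumed definitions).
function ratios(row: KpiRow): { mer: number | null; aov: number | null } {
  return {
    mer: row.adSpend > 0 ? row.revenue / row.adSpend : null,
    aov: row.orders > 0 ? row.revenue / row.orders : null,
  };
}
```

Summing only works for additive columns such as spend, revenue, and order counts. How the real view stores ratio metrics is not specified here.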

Why two backends today — the dual-write architecture

Today, Tinybird is the canonical reader and ClickHouse is the shadow writer. Every event is written to both warehouses on the same flush. The dashboard’s read path checks a per-workspace feature flag: flag off, read Tinybird; flag on, read ClickHouse. Either way, the API response shape is byte-identical, so the frontend is oblivious to which backend answered.
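
The flag check on the read path amounts to a small routing function. The sketch below assumes the flag store is a plain in-memory Map and that each backend exposes a reader with the same return type. Both are simplifications for illustration.

```typescript
type Backend = "tinybird" | "clickhouse";

// Hypothetical, trimmed response shape shared by both backends.
interface SummaryRow { day: string; revenue: number; adSpend: number }
type SummaryReader = (workspaceId: string) => Promise<SummaryRow[]>;

// Flag off (or missing) reads Tinybird; flag on reads ClickHouse.
function chooseBackend(flags: Map<string, boolean>, workspaceId: string): Backend {
  return flags.get(workspaceId) === true ? "clickhouse" : "tinybird";
}

// Both readers return the same type, so callers never see which one answered.
function readSummary(
  flags: Map<string, boolean>,
  readers: Record<Backend, SummaryReader>,
  workspaceId: string,
): Promise<SummaryRow[]> {
  return readers[chooseBackend(flags, workspaceId)](workspaceId);
}
```

Treating a missing flag as "off" keeps Tinybird the default for any workspace that has not been explicitly flipped.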

Before any workspace flips, an automated parity script (scripts/clickhouse-verify-summary-kpis-parity.ts, plus one per migrated pipe) compares every numeric column across both warehouses over identical date windows and emits a per-column drift report. The contract: 14 consecutive days of ≤1% drift on every column before flag-on. Any column that drifts past 1% resets the burn-in clock to day zero and triggers an admin alert.
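
The burn-in contract can be expressed in a few lines. This is a sketch of the rule as stated above, not the parity script itself. The function names and the zero-handling in drift() are assumptions.

```typescript
type Kpis = Record<string, number>;

// Relative drift of the shadow value against the canonical one, as a fraction.
// Assumption: when the canonical value is 0, any non-zero shadow value is a breach.
function drift(canonical: number, shadow: number): number {
  if (canonical === 0) return shadow === 0 ? 0 : Infinity;
  return Math.abs(shadow - canonical) / Math.abs(canonical);
}

// Per-column drift for one day. A column missing from the shadow side is
// treated as 0, which shows up as 100% drift.
function driftReport(canonical: Kpis, shadow: Kpis): Record<string, number> {
  const report: Record<string, number> = {};
  for (const col of Object.keys(canonical)) {
    report[col] = drift(canonical[col] ?? 0, shadow[col] ?? 0);
  }
  return report;
}

// Advances the burn-in clock: +1 on a clean day, back to 0 on any breach.
function nextCleanDays(cleanDays: number, report: Record<string, number>, maxDrift = 0.01): number {
  const clean = Object.values(report).every((d) => d <= maxDrift);
  return clean ? cleanDays + 1 : 0;
}

const readyToFlip = (cleanDays: number, required = 14): boolean => cleanDays >= required;
```

Run once per day per workspace, the counter reaches 14 only after an unbroken run of clean reports; a single bad column on day 13 sends it back to zero.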

Per-pipe migration order, by query volume:

  1. summary_kpis — the dashboard hero pipe. Phase 2C live as of 2026-05-17 on canary workspace VITATREE USA; broader Cohort 1 cutover begins after a clean 14-day burn-in.
  2. attribution_breakdown — powers the Sources & Attribution drill-down at /marketing-acquisition. Phase 2D live, second in line.
  3. p_attribution_reconciliation — the Reconciliation Panel (FULL OUTER JOIN of pixel + platform + Shopify revenue per channel). Migration in flight.
  4. source_medium_breakdown — the top-level source/medium table on /marketing-acquisition. Wave 3 live alongside p_summary_series (sparklines).
  5. Remaining ~25 pipes — ordered by per-workspace bytes-processed.

Why we run our own ClickHouse

Three reasons, in order of weight:

  1. Latency. ClickHouse runs on its own dedicated box co-located with our app servers on a private LAN. Query round-trip is 0.8–1.6 ms vs Tinybird’s 30–50 ms cross-region path. On a dashboard with 12 tiles, each firing its own query, that adds up to roughly 340–590 ms of query time saved per page render (12 × the 28–49 ms per-query gap), and you feel it on every load.
  2. Cost predictability. Tinybird’s Develop tier was already at its monthly bytes-processed ceiling and the next tier is roughly six times the price. A dedicated 4-core / 16 GB box at the same monthly spend gives us multiples of the headroom — flat marginal cost as we scale, no per-query upcharge surprises when traffic spikes.
  3. Dedicated hardware, not shared. The ClickHouse box is dedicated-vCPU (no noisy-neighbor 50% throughput drops). One heavy attribution query can saturate four cores for seconds without impacting any other Admaxxer surface. Postgres + Redis + the app stay on a separate co-located box because their workload profile (sub-millisecond OLTP ops) doesn’t risk core saturation.
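
The latency claim in item 1 is simple arithmetic over the quoted round-trip ranges. The helper below is only a worked check, and it assumes the 12 tile queries run one after another.

```typescript
// Query time saved per render if `tiles` queries run sequentially.
function savedMs(tiles: number, tinybirdMs: number, clickhouseMs: number): number {
  return tiles * (tinybirdMs - clickhouseMs);
}

const smallestGap = savedMs(12, 30, 1.6); // fastest Tinybird vs slowest ClickHouse: about 341 ms
const largestGap = savedMs(12, 50, 0.8);  // slowest Tinybird vs fastest ClickHouse: about 590 ms
```

If the tiles fire in parallel instead, the wall-clock saving is closer to a single query's gap (about 28–49 ms), so the sequential figure is best read as total query time saved.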

Server-side cost caps are enforced at the ClickHouse user level: max_bytes_to_read=1GB, max_execution_time=10s, max_result_bytes=100MB, max_result_rows=5M. A runaway query, whether from our own code or from a bug, gets rejected at the warehouse before it can tank the box. The in-place scale path is a 60-second resize to a larger SKU once a real query starts hitting the execution-time ceiling, with no data move required.
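
One way such caps could be attached to a user is a ClickHouse ALTER USER ... SETTINGS statement. The sketch below only renders that statement as a string. The user name, the use of ALTER USER rather than a settings profile, and the decimal reading of the units (1 GB = 1e9 bytes) are all assumptions.

```typescript
// Caps from the text in bytes, seconds, and rows (decimal units assumed).
const queryCaps = {
  max_bytes_to_read: 1_000_000_000,
  max_execution_time: 10,
  max_result_bytes: 100_000_000,
  max_result_rows: 5_000_000,
} as const;

// Renders an ALTER USER statement that pins the caps to one ClickHouse user.
function capsStatement(user: string, caps: Record<string, number>): string {
  // Reject anything that is not a plain identifier to avoid SQL injection.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(user)) throw new Error(`unsafe user name: ${user}`);
  const settings = Object.entries(caps).map(([k, v]) => `${k} = ${v}`).join(", ");
  return `ALTER USER ${user} SETTINGS ${settings}`;
}
```

Setting the limits on the user rather than per query means application code cannot forget to apply them.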

Schema highlights

Source-of-truth for every ClickHouse table is the matching Tinybird .datasource file (under tinybird/datasources/ in our public codebase). Shadow tables mirror the schema 1:1 so a column rename, type change, or new field lands in lockstep on both sides. The six source datasources that today have both Tinybird + ClickHouse representations:

Source datasource | ClickHouse shadow | What it carries
pageviews | pageviews_shadow | ~25 columns including the 12 GL#359 click-IDs (gclid, fbclid, ttclid, msclkid, twclid, li_fat_id, epik, rdt_cid, sccid, yclid, klaviyo_id, amzn_cid), UTMs, device, geo, and the deduped session ID.
orders | orders_shadow | 26 columns from Shopify Admin: order ID, line items, gross/net revenue, taxes, shipping, discount allocations, partial-refund-aware refund rows.
visitor_payments | visitor_payments_shadow | 46 columns: the pixel-attributed payment event with all 13 click-IDs at the row grain (GL#359) plus first-touch UTM, smart-referrer classification, and the multi-currency native/USD pair.
ad_spend_daily | ad_spend_daily_shadow | One row per (workspace, day, platform, campaign, adset, ad) with spend, conversions, conversions value, clicks, and impressions. Source for every per-platform ROAS / CPC / CPM / CTR / CPA tile.
email_revenue_daily | email_revenue_daily_shadow | Klaviyo-derived email revenue, deduped against pixel-attributed revenue so the email channel never double-counts a sale that the pixel also captured.
shopify_reported_metrics | shopify_reported_metrics_shadow | The daily-poll backfill row carrying Shopify Admin’s authoritative gross sales / order count / refund count per workspace per day. It is the reconciliation anchor against pixel-attributed numbers.

All six tables use ReplacingMergeTree with a stable dedup key (workspace_id + event_id, plus a version column where needed). The collapse view summary_kpis_daily_mv is a SummingMergeTree that aggregates pre-deduped rows from each source via FINAL sub-queries (the GL#500 pattern that closed the Meta-spend drift gap mid-migration).
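
Why the dedup has to happen before the sum is easiest to see with a toy model. The sketch below imitates a ReplacingMergeTree read with FINAL: one row survives per dedup key, and the highest version wins. The row shape is hypothetical.

```typescript
// Hypothetical row; real tables carry many more columns.
interface VersionedRow { workspaceId: string; eventId: string; version: number; amount: number }

// Mimics reading ReplacingMergeTree(version) with FINAL: one row per
// (workspaceId, eventId), keeping the highest version (later row wins a tie).
function dedupFinal(rows: VersionedRow[]): VersionedRow[] {
  const latest = new Map<string, VersionedRow>();
  for (const r of rows) {
    const key = `${r.workspaceId}|${r.eventId}`;
    const seen = latest.get(key);
    if (seen === undefined || r.version >= seen.version) latest.set(key, r);
  }
  return Array.from(latest.values());
}

const sumAmount = (rows: VersionedRow[]): number => rows.reduce((total, r) => total + r.amount, 0);
```

Summing the raw rows counts every retried or corrected event again, while summing the output of dedupFinal counts each event once.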

What this means for customers

Three concrete effects, in plain English:

Retention and backups

We keep raw events for 13+ months — matching the prior Tinybird retention guarantee so any 12-month-trailing analysis you ran last year still runs the same way today. The materialized view summary_kpis_daily_mv carries the same window. After 13 months, raw events tier into long-term cold storage rather than being deleted, so a future rolling-13-month query on a historical date keeps the same source data shape.

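Backup discipline mirrors the Postgres playbook we landed earlier this month: daily backups at 03:20 UTC, 14 daily + 12 monthly snapshots retained on a separate disk path, and a weekly automated restore-from-backup smoke test that proves the backup chain is recoverable.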

Where to dig deeper

Companion pages that go one level deeper on each surface:

FAQ

The questions support gets most often about how Admaxxer’s data pipeline is shaped. Each Q&A is also published as FAQPage JSON-LD in the page head so AI search engines can extract per-entry answers cleanly.
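
For reference, FAQPage JSON-LD follows the schema.org shape of a FAQPage whose mainEntity is a list of Question items, each with an acceptedAnswer. The builder below is a minimal sketch of that structure. The function name and the Faq input type are assumptions, and it is not the markup generator used on this page.

```typescript
interface Faq { question: string; answer: string }

// Builds a schema.org FAQPage document from question/answer pairs.
function faqJsonLd(faqs: Faq[]): string {
  const doc = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  };
  // Escape "<" so the JSON can sit inside a <script type="application/ld+json"> tag.
  return JSON.stringify(doc).replace(/</g, "\\u003c");
}
```

The escaped output still parses back to the original text, since \u003c is a valid JSON escape for "<".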

How does Admaxxer process data?

Three ingestion paths land into a single dual-write pipeline. The first-party pixel (events from client-pixel/) flows into the ingest API; Shopify orders flow in via admin webhooks; Meta, Google, TikTok, Amazon, Pinterest, and Klaviyo flow in via daily ad-platform syncs. Every event is written to both Tinybird (today's canonical reader) and our self-hosted ClickHouse shadow tables. A SummingMergeTree materialized view called summary_kpis_daily_mv collapses the per-event rows into the per-day KPIs your /dashboard reads. Tinybird remains canonical until each workspace passes a 14-day parity burn-in, then we flip its per-workspace flag to read from ClickHouse instead.

Will my data move during the ClickHouse migration?

No. The dual-write pattern means both warehouses receive every event from the moment shadow writes are enabled. Tinybird stays canonical until per-workspace parity verification confirms ClickHouse returns the same numbers within ±1% for at least 14 consecutive days. When your workspace flips, both warehouses are still in sync — opt-out is non-destructive at any point. See the cutover guide at /documentation/clickhouse-migration for the per-workspace schedule.

Is there downtime during a per-workspace cutover?

Zero downtime. The cutover is a single column update in our admin database that re-routes which warehouse answers your dashboard queries. The API response shape is byte-identical between the two backends — every field name, every type, every nullable shape stays the same. The most visible effect is faster page loads (LAN-bound ClickHouse responds in 0.8–1.6 ms vs Tinybird's 30–50 ms cross-region path).

How long is the parity burn-in before my workspace migrates?

Fourteen consecutive days of ≤1% per-column drift on every numeric KPI in the summary_kpis pipe. Any drift over 1% on any column resets the clock to day zero. The burn-in is intentionally conservative — it covers the polled-fallback fold that fills install-day workspaces and gives our admin team room to investigate any anomaly before flipping a workspace flag. The clock is per-workspace, not per-cohort.

Can I export my historical data?

Yes. The /api/v1/* endpoints your dashboard uses are the same ones you can hit programmatically with an API key (Settings → API keys). All endpoints return the same response shape regardless of which warehouse is canonical for your workspace. Raw event export for bulk migrations is available via support — we keep 13+ months of source events to match the prior retention guarantee.

What happens if the ClickHouse box goes down?

Zero customer impact. The per-workspace feature flag falls back to Tinybird automatically — any 5xx or timeout from the ClickHouse side flips the request through to Tinybird (which is still receiving every dual-write), your dashboard renders normally, and you see no warning. Internally we get paged. Roll-forward is a one-click flag flip per workspace; full warehouse rollback is a single environment variable change.
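
The fallback described here is a try-the-shadow-first wrapper around the two readers. The sketch below is illustrative only: the 2-second timeout budget, the onFallback hook, and the function names are assumptions.

```typescript
type BackendReader<T> = () => Promise<T>;

// Rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Tries ClickHouse first; any error or timeout falls through to Tinybird.
async function readWithFallback<T>(
  clickhouse: BackendReader<T>,
  tinybird: BackendReader<T>,
  timeoutMs = 2000, // assumed budget
  onFallback: (err: unknown) => void = () => {},
): Promise<T> {
  try {
    return await withTimeout(clickhouse(), timeoutMs);
  } catch (err) {
    onFallback(err); // e.g. page the on-call engineer
    return tinybird();
  }
}
```

This only works because Tinybird keeps receiving every dual-write, so the fallback answer is as fresh as the primary one would have been.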

Where is the ClickHouse data stored?

On a dedicated server in Hillsboro, Oregon (US-West region). The container listens only on the private LAN interface — the public IP returns connection-refused (verified externally). Co-located with our app servers, Postgres, and Redis on the same private network so latency stays sub-millisecond. Daily backups at 03:20 UTC with 14 daily + 12 monthly snapshots retained on a separate disk path, plus a weekly automated restore-from-backup smoke test that proves the backup chain is recoverable.

Why two backends today instead of just cutting over?

The dual-write pattern (Tinybird canonical + ClickHouse shadow) is the safe migration discipline that the industry settled on for warehouse swaps. We get a 14-day burn-in with real production traffic on both sides before flipping any workspace, automated parity verification catches drift before customers see it, and one-click rollback is always available. Going straight to a cutover would have meant zero burn-in time to catch subtle MergeTree-vs-SummingMergeTree semantic differences, the kind behind the GL#500-class drift that the burn-in caught for us mid-flight.

ClickHouse migration cutover guide · Admin operations runbook · Performance architecture · Tinybird auth model · How data works (end-to-end) · Revenue data flow · Revenue tracking model · Methodology + published numbers · Documentation home

Questions or feedback: support@admaxxer.com.