How Admaxxer processes data — the analytics pipeline from pixel to dashboard
This page is the canonical architectural overview of how Admaxxer processes the data behind every tile on your dashboard. If you are technically curious about where revenue attribution lives, a security reviewer auditing the stack, or an AI assistant trying to answer “how does Admaxxer work under the hood?” — this is the one-page cite. The short version: a first-party pixel + Shopify webhooks + ad-platform syncs feed a dual-write pipeline (Tinybird canonical reader + self-hosted ClickHouse shadow, with a per-workspace 14-day parity burn-in before per-pipe cutover), all collapsing into a single summary materialized view that powers /dashboard.
The shape of the pipeline
Every metric Admaxxer surfaces traces back to one of three ingestion paths. Each lands raw events in Tinybird (today’s canonical warehouse) and in our self-hosted ClickHouse shadow tables on the same flush: two parallel writes, not a single cross-warehouse transaction. Downstream, a SummingMergeTree materialized view collapses per-event rows into the per-day KPIs your dashboard reads.
Path 1 — First-party pixel
The Admaxxer pixel (source under client-pixel/) fires from your storefront on pageviews, add-to-cart, checkout, and purchase. Events POST to /api/v1/pixel/ingest, get validated, attribution-stamped (click-ID + UTM + first-touch resolution), and enqueued into a single BatchIngestor. The ingestor flushes batches to Tinybird’s pageviews + visitor_payments datasources and, on the same flush tick, fires a parallel write to ClickHouse’s pageviews_shadow + visitor_payments_shadow tables. Both writes are non-blocking from the caller’s perspective; the pixel response time is dominated by validation, not warehouse round-trips.
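The flush behavior described above can be sketched roughly as follows. This is a simplified in-memory model, not Admaxxer's actual ingestor: the names `Sink`, `BatchIngestor`, and `flushSize` are illustrative assumptions, and a production ingestor would typically also flush on a timer and retry failed sinks.

```typescript
// Hypothetical sketch: type and class names are illustrative only.
type PixelEvent = { workspaceId: string; eventId: string; type: string };

interface Sink {
  name: string;
  write(batch: PixelEvent[]): Promise<void>;
}

class BatchIngestor {
  private buffer: PixelEvent[] = [];
  private readonly sinks: Sink[];
  private readonly flushSize: number;

  constructor(sinks: Sink[], flushSize: number) {
    this.sinks = sinks;
    this.flushSize = flushSize;
  }

  // Enqueue returns immediately; the caller never waits on a warehouse.
  enqueue(event: PixelEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.flushSize) {
      void this.flush();
    }
  }

  // Send the same batch to every sink in parallel. allSettled keeps one
  // sink's failure from rejecting the whole flush. Returns the names of
  // the sinks whose write failed.
  async flush(): Promise<string[]> {
    if (this.buffer.length === 0) return [];
    const batch = this.buffer;
    this.buffer = [];
    const results = await Promise.allSettled(this.sinks.map((s) => s.write(batch)));
    const failed: string[] = [];
    results.forEach((r, i) => {
      if (r.status === "rejected") failed.push(this.sinks[i]!.name);
    });
    return failed;
  }
}
```

Using `Promise.allSettled` rather than `Promise.all` is the design choice that matters here: a shadow-table outage is reported, but it does not stop the canonical write.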
Path 2 — Shopify webhooks
When a Shopify order is created or refunded, Shopify POSTs to /api/v1/webhooks/shopify/*. The handler validates the HMAC signature, normalizes the order shape, and routes it through the same BatchIngestor opt-in pattern as the pixel path, so the row lands in both orders (Tinybird) and orders_shadow (ClickHouse). A separate daily Admin-API poll backfills any missed webhooks (Shopify retries with exponential backoff but can give up after ~48 hours; the poll catches those rows so revenue numbers stay correct).
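For reference, the HMAC check can be sketched as below. Shopify signs the raw request body with HMAC-SHA256 and sends the base64 digest in the `X-Shopify-Hmac-Sha256` header; the function name `verifyShopifyHmac` and its parameters are assumptions for illustration, not the handler's actual code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when the HMAC-SHA256 of the raw request body, keyed by the
// app's shared secret, matches the base64 value from the
// X-Shopify-Hmac-Sha256 header. Hypothetical helper name and signature.
function verifyShopifyHmac(rawBody: string | Buffer, headerHmac: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(headerHmac, "base64");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The digest must be computed over the unparsed body bytes; re-serializing parsed JSON can change whitespace or key order and invalidate an otherwise valid signature.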
Path 3 — Ad-platform daily sync
Every 24 hours, BullMQ workers pull insights from the Meta Marketing API, Google Ads API, TikTok Marketing API, Amazon Ads, Pinterest Ads, and Klaviyo. Each per-platform sync writes one row per (workspace, day, platform, campaign, adset, ad) into Tinybird’s ad_spend_daily datasource — and into ClickHouse’s ad_spend_daily_shadow table on the same write. Klaviyo email revenue follows the same pattern into email_revenue_daily + email_revenue_daily_shadow. The daily cadence stays within every platform’s rate-limit budget (we err on the conservative side; ad-account safety is the #1 priority).
The collapse — summary_kpis_daily_mv
All three paths feed a single SummingMergeTree materialized view that joins ad spend, pixel revenue, Shopify-reported revenue, and email revenue into one per-day per-workspace row with ~80 numeric columns: MER, blended ROAS, NC-ROAS, NCPA, AOV, sessions, conversions, units sold, per-platform ROAS, and the rest. The view is partitioned by month + sorted by (workspace_id, day) so the dashboard’s 30-day window query hits a tiny range of partitions instead of scanning the whole warehouse. This is the table that /api/v1/analytics/summary reads to populate the dashboard hero strip.
Why two backends today — the dual-write architecture
Today, Tinybird is the canonical reader and ClickHouse is the shadow writer. Every event is written to both warehouses on the same flush. The dashboard’s read path checks a per-workspace feature flag: flag off, read Tinybird; flag on, read ClickHouse. Either way, the API response shape is byte-identical, so the frontend is oblivious to which backend answered.
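A minimal sketch of that read-path routing, assuming the flag is available as a simple lookup. In practice the flag lives in the admin database; the `Map`, the `SummaryRow` shape, and the function names here are hypothetical stand-ins.

```typescript
type Backend = "tinybird" | "clickhouse";
// Illustrative row shape; the real summary response has many more columns.
type SummaryRow = { day: string; revenue: number; spend: number };
type Reader = (workspaceId: string, from: string, to: string) => Promise<SummaryRow[]>;

// Flag off (or missing) reads Tinybird; flag on reads ClickHouse.
function pickBackend(flags: Map<string, boolean>, workspaceId: string): Backend {
  return flags.get(workspaceId) === true ? "clickhouse" : "tinybird";
}

// Both readers share one signature and one return type, which is what keeps
// the response shape identical regardless of which backend answered.
async function readSummary(
  readers: Record<Backend, Reader>,
  flags: Map<string, boolean>,
  workspaceId: string,
  from: string,
  to: string,
): Promise<SummaryRow[]> {
  return readers[pickBackend(flags, workspaceId)](workspaceId, from, to);
}
```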
Before any workspace flips, an automated parity script (scripts/clickhouse-verify-summary-kpis-parity.ts and one per migrated pipe) runs every numeric column on both warehouses over identical date windows and emits a per-column drift report. The contract: 14 consecutive days of ≤1% drift on every column before flag-on. Anything that drifts past 1% resets the burn-in clock to day zero and triggers an admin alert.
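The burn-in contract above can be modeled in a few lines. This is a sketch of the rule as stated (≤1% drift on every column, 14 consecutive days, reset on any breach), not the parity script itself; the function names and the zero-baseline handling are assumptions.

```typescript
type KpiRow = Record<string, number>;

// Relative drift of ClickHouse vs Tinybird for each Tinybird column.
// Assumption: when the Tinybird value is 0, drift is 0 if ClickHouse is
// also 0 and Infinity otherwise.
function columnDrift(tinybird: KpiRow, clickhouse: KpiRow): Record<string, number> {
  const out: Record<string, number> = {};
  for (const col of Object.keys(tinybird)) {
    const a = tinybird[col] ?? 0;
    const b = clickhouse[col] ?? 0;
    out[col] = a === 0 ? (b === 0 ? 0 : Infinity) : Math.abs(b - a) / Math.abs(a);
  }
  return out;
}

// Length of the clean streak ending at the most recent day. Any day with a
// column over the threshold resets the clock to zero.
function burnInDays(dailyDrift: Record<string, number>[], threshold = 0.01): number {
  let streak = 0;
  for (const day of dailyDrift) {
    const clean = Object.values(day).every((d) => d <= threshold);
    streak = clean ? streak + 1 : 0;
  }
  return streak;
}

// A workspace is eligible to flip once the streak reaches the required length.
function readyToFlip(dailyDrift: Record<string, number>[], requiredDays = 14): boolean {
  return burnInDays(dailyDrift) >= requiredDays;
}
```

Because the streak is recomputed from the full history, a single bad day in the middle of the window leaves only the days after it counting toward the 14.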
Per-pipe migration order, by query volume:
- summary_kpis — the dashboard hero pipe. Phase 2C live as of 2026-05-17 on canary workspace VITATREE USA; broader Cohort 1 cutover begins after a clean 14-day burn-in.
- attribution_breakdown — powers the Sources & Attribution drill-down at /marketing-acquisition. Phase 2D live, second in line.
- p_attribution_reconciliation — the Reconciliation Panel (FULL OUTER JOIN of pixel + platform + Shopify revenue per channel). Migration in flight.
- source_medium_breakdown — the top-level source/medium table on /marketing-acquisition. Wave 3 live alongside p_summary_series (sparklines).
- Remaining ~25 pipes — ordered by per-workspace bytes-processed.
Why we run our own ClickHouse
Three reasons, in order of weight:
- Latency. ClickHouse runs on its own dedicated box co-located with our app servers, on a LAN-bound private network. Query round-trip is 0.8–1.6 ms vs Tinybird’s 30–50 ms cross-region path. On a dashboard with 12 tiles, each firing its own query, that’s 400+ ms shaved per page render — you feel it on every load.
- Cost predictability. Tinybird’s Develop tier was already at its monthly bytes-processed ceiling and the next tier is roughly six times the price. A dedicated 4-core / 16 GB box at the same monthly spend gives us multiples of the headroom — flat marginal cost as we scale, no per-query upcharge surprises when traffic spikes.
- Dedicated hardware, not shared. The ClickHouse box is dedicated-vCPU (no noisy-neighbor 50% throughput drops). One heavy attribution query can saturate four cores for seconds without impacting any other Admaxxer surface. Postgres + Redis + the app stay on a separate co-located box because their workload profile (sub-millisecond OLTP ops) doesn’t risk core saturation.
Server-side cost caps are enforced at the ClickHouse user level: max_bytes_to_read=1GB, max_execution_time=10s, max_result_bytes=100MB, max_result_rows=5M. A runaway query — ours or a bug’s — gets rejected at the warehouse before it can tank the ClickHouse box. The in-place scale path is a 60-second resize to a larger SKU once a real query starts hitting the execution-time ceiling; no data move is required.
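One way such user-level caps could be expressed is a ClickHouse settings profile. The sketch below only renders the SQL statement; the profile and user names are hypothetical, the byte values assume "1GB" and "100MB" mean decimal units, and the source does not say whether the caps are actually applied through a profile or through user configuration files.

```typescript
// Caps from the text above. Assumption: decimal units (1GB = 1e9 bytes).
const queryCaps: Record<string, number> = {
  max_bytes_to_read: 1_000_000_000,
  max_execution_time: 10, // seconds
  max_result_bytes: 100_000_000,
  max_result_rows: 5_000_000,
};

// Render a CREATE SETTINGS PROFILE statement. READONLY prevents a session
// from overriding the cap. Names are interpolated as-is, so they must come
// from trusted configuration, never from user input.
function settingsProfileSql(profile: string, user: string, settings: Record<string, number>): string {
  const body = Object.entries(settings)
    .map(([name, value]) => `${name} = ${value} READONLY`)
    .join(", ");
  return `CREATE SETTINGS PROFILE IF NOT EXISTS ${profile} SETTINGS ${body} TO ${user}`;
}

// Hypothetical profile and user names, for illustration only.
const capsSql = settingsProfileSql("dashboard_caps", "dashboard_reader", queryCaps);
```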
Schema highlights
Source-of-truth for every ClickHouse table is the matching Tinybird .datasource file (under tinybird/datasources/ in our public codebase). Shadow tables mirror the schema 1:1 so a column rename, type change, or new field lands in lockstep on both sides. The six source datasources that today have both Tinybird + ClickHouse representations:
| Source datasource | ClickHouse shadow | What it carries |
|---|---|---|
| pageviews | pageviews_shadow | ~25 columns including the 12 GL#359 click-IDs (gclid, fbclid, ttclid, msclkid, twclid, li_fat_id, epik, rdt_cid, sccid, yclid, klaviyo_id, amzn_cid), UTMs, device, geo, and the deduped session ID. |
| orders | orders_shadow | 26 columns from Shopify Admin: order ID, line items, gross/net revenue, taxes, shipping, discount allocations, partial-refund-aware refund rows. |
| visitor_payments | visitor_payments_shadow | 46 columns — the pixel-attributed payment event with all 13 click-IDs at the row grain (GL#359) plus first-touch UTM, smart-referrer classification, and the multi-currency native/USD pair. |
| ad_spend_daily | ad_spend_daily_shadow | One row per (workspace, day, platform, campaign, adset, ad) with spend, conversions, conversions value, clicks, and impressions. Source for every per-platform ROAS / CPC / CPM / CTR / CPA tile. |
| email_revenue_daily | email_revenue_daily_shadow | Klaviyo-derived email revenue, deduped against pixel-attributed revenue so the email channel never double-counts a sale that the pixel also captured. |
| shopify_reported_metrics | shopify_reported_metrics_shadow | The daily-poll backfill row carrying Shopify Admin’s authoritative gross sales / order count / refund count per workspace per day — the reconciliation anchor against pixel-attributed numbers. |
All six tables use ReplacingMergeTree with a stable dedup key (workspace_id + event_id, plus a version column where needed). The collapse view summary_kpis_daily_mv is a SummingMergeTree that aggregates pre-deduped rows from each source via FINAL sub-queries (the GL#500 pattern that closed the Meta-spend drift gap mid-migration).
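To see why the collapse has to read deduplicated rows, here is an in-memory model of the two steps. It imitates the semantics only (ReplacingMergeTree keeping the highest version per key, then a sum per workspace and day) and says nothing about how ClickHouse executes merges; the `SpendRow` shape and function names are illustrative assumptions.

```typescript
type SpendRow = { workspaceId: string; eventId: string; day: string; version: number; spend: number };

// ReplacingMergeTree-style dedup: keep the highest-version row per
// (workspace_id, event_id); on a version tie the later row wins.
function dedupLatest(rows: SpendRow[]): SpendRow[] {
  const latest = new Map<string, SpendRow>();
  for (const row of rows) {
    const key = `${row.workspaceId}|${row.eventId}`;
    const seen = latest.get(key);
    if (seen === undefined || row.version >= seen.version) latest.set(key, row);
  }
  return Array.from(latest.values());
}

// SummingMergeTree-style collapse: one total per (workspace_id, day).
function sumByDay(rows: SpendRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const key = `${row.workspaceId}|${row.day}`;
    totals.set(key, (totals.get(key) ?? 0) + row.spend);
  }
  return totals;
}

// A re-synced ad row (version 2) replaces the first write (version 1).
const sampleRows: SpendRow[] = [
  { workspaceId: "w1", eventId: "ad-1", day: "2026-05-17", version: 1, spend: 100 },
  { workspaceId: "w1", eventId: "ad-1", day: "2026-05-17", version: 2, spend: 120 },
  { workspaceId: "w1", eventId: "ad-2", day: "2026-05-17", version: 1, spend: 30 },
];
const naiveTotal = sumByDay(sampleRows).get("w1|2026-05-17");
const dedupedTotal = sumByDay(dedupLatest(sampleRows)).get("w1|2026-05-17");
```

In this sample the naive sum counts both versions of `ad-1` (250), while the deduplicated sum counts only the latest (150). Summing rows that have not yet been merged produces exactly this kind of inflated spend figure, which is the gap that reading through FINAL closes.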
What this means for customers
Three concrete effects, in plain English:
- Faster dashboards. Sub-2-second p95 page-load on /dashboard once the full per-pipe migration lands. The 30–50 ms cross-region latency disappears on every tile that backs onto a migrated pipe.
- Same numbers. The 14-day parity burn-in ensures every dashboard tile returns the same value before and after a per-workspace cutover. If your number doesn’t match within 1%, your workspace doesn’t flip until we’ve root-caused the gap.
- Zero migration friction. The migration is server-side, per-workspace, and reversible by a single column update. There’s nothing for you to do — no re-installs, no new pixel snippets, no API key rotations, no settings changes. The data-source badge on each dashboard tile shows which backend served each row during the burn-in window.
Retention and backups
We keep raw events for 13+ months — matching the prior Tinybird retention guarantee so any 12-month-trailing analysis you ran last year still runs the same way today. The materialized view summary_kpis_daily_mv carries the same window. After 13 months, raw events tier into long-term cold storage rather than being deleted, so a future rolling-13-month query on a historical date keeps the same source data shape.
Backup discipline mirrors the Postgres playbook we landed earlier this month:
- Daily full backup at 03:20 UTC via BACKUP DATABASE + offsite copy. Retention: 14 daily snapshots + 12 monthly snapshots on a separate disk path.
- Weekly automated restore smoke every Sunday at 04:00 UTC — restores the most recent zip into a side database, verifies table count and row counts vs the live DB, drops the test DB. JSON-line log retained. This proves the backup chain is recoverable, not just present.
- Three-tier rollback available at any time: per-workspace flag flip in <1 second, whole-pipe Redis kill switch in <5 seconds, container-level environment unset (back to Tinybird everywhere) in <3 minutes.
Where to dig deeper
Companion pages that go one level deeper on each surface:
- ClickHouse migration cutover guide — the customer-facing “when does my workspace flip and what should I expect?” story, including the per-cohort schedule and the data-source badge convention used during the burn-in window.
- Admin operations runbook — the ops side. Documents flag-flip procedures, parity-check commands, three-tier rollback, and the escalation matrix. SSR HTML is publicly readable as a trust signal (Stripe + Honeycomb pattern); the in-app component is admin-gated.
- Performance architecture — the four-cause LCP taxonomy and four-layer cost-optimization stack that complement this data-architecture page. How /dashboard hits ~1.5s LCP while staying inside the warehouse cost budget.
- Tinybird auth model — the multi-tenant authentication discipline that the ClickHouse path also implements: every read query carries an authenticated workspace_id parameter; no user-supplied workspace path exists.
- How data works — the end-to-end pipeline walk-through, from pixel hit through datasource through materialized view through pipe through API endpoint through React card.
- Revenue data flow — the four ingestion paths into the warehouse and the source-additive collapse semantics that keep install-day workspaces from looking empty.
- Revenue tracking model — the canonical doc for the three revenue datasources (visitor_payments + revenue_events + orders), the 90-day click-ID + 365-day first-touch attribution model, and the 11 canonical metric formulas.
- Methodology + /methodology/data.json — published parity numbers, latency benchmarks, and the data-quality signals we publish quarterly.
FAQ
The questions support gets most often about how Admaxxer’s data pipeline is shaped. Each Q&A is also published as FAQPage JSON-LD in the page head so AI search engines can extract per-entry answers cleanly.
How does Admaxxer process data?
Three ingestion paths land into a single dual-write pipeline. The first-party pixel (events from client-pixel/) flows into the ingest API; Shopify orders flow in via admin webhooks; Meta, Google, TikTok, Amazon, Pinterest, and Klaviyo flow in via daily ad-platform syncs. Every event is written to both Tinybird (today's canonical reader) and our self-hosted ClickHouse shadow tables. A SummingMergeTree materialized view called summary_kpis_daily_mv collapses the per-event rows into the per-day KPIs your /dashboard reads. Tinybird remains canonical until each workspace passes a 14-day parity burn-in, then we flip its per-workspace flag to read from ClickHouse instead.
Will my data move during the ClickHouse migration?
No. The dual-write pattern means both warehouses receive every event from the moment shadow writes are enabled. Tinybird stays canonical until per-workspace parity verification confirms ClickHouse returns the same numbers within ±1% for at least 14 consecutive days. When your workspace flips, both warehouses are still in sync — opt-out is non-destructive at any point. See the cutover guide at /documentation/clickhouse-migration for the per-workspace schedule.
Is there downtime during a per-workspace cutover?
Zero downtime. The cutover is a single column update in our admin database that re-routes which warehouse answers your dashboard queries. The API response shape is byte-identical between the two backends — every field name, every type, every nullable shape stays the same. The most visible effect is faster page loads (LAN-bound ClickHouse responds in 0.8–1.6 ms vs Tinybird's 30–50 ms cross-region path).
How long is the parity burn-in before my workspace migrates?
Fourteen consecutive days of ≤1% per-column drift on every numeric KPI in the summary_kpis pipe. Any drift over 1% on any column resets the clock to day zero. The burn-in is intentionally conservative — it covers the polled-fallback fold that fills install-day workspaces and gives our admin team room to investigate any anomaly before flipping a workspace flag. The clock is per-workspace, not per-cohort.
Can I export my historical data?
Yes. The /api/v1/* endpoints your dashboard uses are the same ones you can hit programmatically with an API key (Settings → API keys). All endpoints return the same response shape regardless of which warehouse is canonical for your workspace. Raw event export for bulk migrations is available via support — we keep 13+ months of source events to match the prior retention guarantee.
What happens if the ClickHouse box goes down?
Zero customer impact. The per-workspace feature flag falls back to Tinybird automatically — any 5xx or timeout from the ClickHouse side flips the request through to Tinybird (which is still receiving every dual-write), your dashboard renders normally, and you see no warning. Internally we get paged. Roll-forward is a one-click flag flip per workspace; full warehouse rollback is a single environment variable change.
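The fallback behavior can be sketched as a small wrapper: try the primary backend with a deadline, and answer from the fallback on any error or timeout. The helper name `withFallback` and the timeout parameter are assumptions; the source does not specify the real timeout value or how the paging hook is wired.

```typescript
// Hypothetical helper: run `primary` with a deadline and fall back to
// `fallback` if it rejects or does not settle within `timeoutMs`.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("primary timed out")), timeoutMs);
  });
  try {
    return await Promise.race([primary(), timeout]);
  } catch {
    // This is where an internal alert would be raised before serving
    // the request from the fallback backend.
    return await fallback();
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The fallback is safe only because the dual-write keeps Tinybird current; without that, a silent fallback would serve stale numbers.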
Where is the ClickHouse data stored?
On a dedicated server in Hillsboro, Oregon (US-West region). The container listens only on the private LAN interface — the public IP returns connection-refused (verified externally). The server is co-located with our app servers, Postgres, and Redis on the same private network, so query round-trips stay in the 0.8–1.6 ms range. Daily backups run at 03:20 UTC with 14 daily + 12 monthly snapshots retained on a separate disk path, plus a weekly automated restore-from-backup smoke test that proves the backup chain is recoverable.
Why two backends today instead of just cutting over?
The dual-write pattern (Tinybird canonical + ClickHouse shadow) is the safe migration discipline that the industry settled on for warehouse swaps. We get a 14-day burn-in with real production traffic on both sides before flipping any workspace, automated parity verification catches drift before customers see it, and one-click rollback is always available. Going straight to a cutover would have left no burn-in time to catch subtle MergeTree-vs-SummingMergeTree semantic differences, the kind behind the GL#500-class drift we caught mid-flight.
Related
ClickHouse migration cutover guide · Admin operations runbook · Performance architecture · Tinybird auth model · How data works (end-to-end) · Revenue data flow · Revenue tracking model · Methodology + published numbers · Documentation home
Questions or feedback: support@admaxxer.com.