How Admaxxer migrated to self-hosted ClickHouse — Phase 2D ships attribution_breakdown
Admaxxer is moving its analytics warehouse off Tinybird and onto a self-hosted ClickHouse cluster running on a dedicated Hetzner box in Hillsboro, OR. This page is for merchants who keep an eye on their dashboard numbers, for security reviewers, and for anyone curious how a phased warehouse migration with zero customer-visible downtime works in practice. The short version: Tinybird stays canonical until your workspace’s numbers match on both sides for at least 14 days; only then does our admin team flip your workspace onto the ClickHouse path. Either way, you see no change — this is the polished version of “we’re upgrading the engine without shaking the cup.”
Phase 2D update (2026-05-17) — attribution_breakdown migrated
Sources & Attribution now reads from ClickHouse for flagged workspaces. The attribution_breakdown pipe — the one powering the Channel Attribution Drilldown on /marketing-acquisition — is the second pipe to land a CH-native translation after summary_kpis. Five things landed today:
- `attribution_breakdown` CH module shipped. A new `server/lib/clickhouse/attributionBreakdownQuery.ts` module emits parameterized SQL covering all four breakdown levels: channel, campaign, adset, ad. Same UTM-matching rules, same provenance gates (`meta_v21` / `google_v18` / `tiktok_v13` / etc.), same currency-stamping behavior as the Tinybird pipe. Seven attribution models supported: `last_click`, `first_click`, `linear_all`, `linear_paid`, `time_decay` (7d half-life), `position_based` (40/20/40 U-shape), and `markov`.
- Per-workspace read flag wired. `/admin/clickhouse-flags` now exposes a second toggle alongside `summary_kpis`: a per-workspace `attribution_breakdown (CH)` switch. Default OFF — Tinybird stays canonical until each workspace clears its 14-day parity gate.
- Adset → Ad drilldown depth fix landed in the same wave. Drilling from Adset to Ad inside the Channel Attribution table now lazy-loads correctly. The fix corrects the `breakdown_level=ad` request envelope so the FE chevron resolves children instead of falling into the “preparing” empty state.
- Parity script covers all four breakdown levels. `scripts/clickhouse-verify-attribution-breakdown-parity.ts` iterates channel / campaign / adset / ad with per-row, per-column drift detection across ~25 numeric fields. Same ±1% gate, same `--strict` exit-2 contract as the Phase 2B `summary_kpis` script. Iterating every level means divergences at the ad level (which only surface when a user actually drills) are caught before cutover.
- Burn-in clock starts after `summary_kpis` clears. Phase 2D dual-write goes live for VITATREE USA next, then a 14-day clock at ±1% per breakdown level before broader Cohort 1 cutover. Rollback is one toggle flip — same per-workspace recovery posture as Phase 2C.
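To make the two less common models above concrete: `time_decay` with a 7-day half-life gives a touchpoint half the credit for every week of age, and `position_based` splits credit 40/20/40 between the first touch, the middle touches, and the last touch. Here is a minimal sketch of those two weighting rules — the function names are illustrative only; the shipped module expresses this logic in SQL, not application code:

```typescript
// Illustrative sketch of two attribution weighting rules (assumed names;
// attributionBreakdownQuery.ts computes credit in SQL, not app code).

/** time_decay: a touchpoint's weight halves for every 7 days of age. */
function timeDecayWeights(ageDays: number[], halfLifeDays = 7): number[] {
  const raw = ageDays.map((age) => Math.pow(0.5, age / halfLifeDays));
  const total = raw.reduce((a, b) => a + b, 0);
  return raw.map((w) => w / total); // normalize so credit sums to 1
}

/** position_based: 40% first touch, 40% last, 20% split across the middle. */
function positionBasedWeights(touchCount: number): number[] {
  if (touchCount === 1) return [1];
  if (touchCount === 2) return [0.5, 0.5];
  const middleShare = 0.2 / (touchCount - 2);
  return Array.from({ length: touchCount }, (_, i) =>
    i === 0 || i === touchCount - 1 ? 0.4 : middleShare,
  );
}
```

For example, a journey with touches 0 and 7 days old gets a 2:1 `time_decay` split, and a four-touch journey gets 40/10/10/40 under `position_based`.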
How to verify your numbers: after your workspace flips, open /marketing-acquisition and look at the data-source badge on the Channel Attribution table. A sky-blue ClickHouse chip means the row was sourced from CH; no chip means Tinybird. Per-row, per-column numbers should agree to within ±1% — if they don’t, file a support ticket and we’ll flip your workspace back to Tinybird in under a second. The operator-facing flag-flip procedure is documented at /documentation/admin/clickhouse-runbook.
Why Admaxxer’s Sources & Attribution beats Triple Whale + Datafast
Self-hosted ClickHouse is table stakes for serious DTC analytics — both Triple Whale and Datafast run ClickHouse under the hood and we ship on the same architecture for the same reasons. The differentiators are upstream of the warehouse: pricing, attribution-model surface, and transparency. Phase 2D’s attribution_breakdown cutover makes all three concrete on the Sources & Attribution screen.
| Axis | Admaxxer (this ship) | Triple Whale | Datafast |
|---|---|---|---|
| Warehouse hosting + pricing | Self-hosted ClickHouse on a dedicated $40/mo box. Flat marginal cost — no per-event upcharge as you scale. | Managed warehouse with event-volume billing — costs scale with traffic, not just usage. | Co-located ClickHouse, similar architecture; pricing tied to billable analytics events per their docs. |
| Attribution models exposed | Seven models on a single dropdown: last_click, first_click, linear_all, linear_paid, time_decay (7d half-life), position_based (40/20/40), and markov chains. Four data lenses: pixel, Shopify journeys, platform-reported, and a side-by-side all-lenses view. | Sonar supports “switch between attribution models on the fly” for a curated set per triplewhale.readme.io. | Single-touch attribution with optional view-through per datafa.st/docs; multi-model selector positioning is narrower. |
| Pipe + datasource transparency | Every pipe, every datasource, every materialized-view CREATE TABLE is visible inside merchant-readable docs and the GitHub repo. The exact SQL behind a tile is one click away. | Internals are proprietary; merchant-facing docs cover surfaces, not the warehouse layer. | Documentation focuses on dashboard surface and pixel install; underlying warehouse pipes not exposed. |
| Migration discipline | Per-workspace dual-write parity gate, customer-facing data-source badge during burn-in, three-tier rollback (flag flip <1s / pipe kill-switch <5s / container reset <3min), parity script in the repo for every migrated pipe. | Migrations happen server-side without merchant-visible signaling. | Migrations happen server-side without merchant-visible signaling. |
Citations (fair-use, ≤15 words per quote): triplewhale.readme.io describes Sonar as a unified attribution layer with on-the-fly model switching. datafa.st/docs covers analytics-event billing and single-touch attribution with optional view-through. Differentiator rows are fact-of-product, not editorial.
Phase 2C update (2026-05-17) — VITATREE USA is live
Four things landed today:
- GL#500 fix shipped. The meta-inflation parity drift that gated Phase 2B cutover is gone. The CH `summary_kpis` path now matches Tinybird within ±0.8% across every workspace-day we tested.
- Performance pass complete. Median CH `summary_kpis` query latency is now ~1.2 ms (in-host on the `admaxxer-ch` container), vs ~32 ms for the cross-region Tinybird path. p95 is ~2.1 ms vs ~58 ms. Tile-level dashboard renders are ~880 ms faster on the CH side.
- Dual-write live for canary workspace VITATREE USA. This is the first production traffic reading `summary_kpis` from ClickHouse. The dashboard tile-level data-source badge confirms each row’s backing warehouse in real time. 90 days of `pageviews` + `revenue_events` + `orders` were backfilled into ClickHouse so long-window LTV queries (7/30/90d) cross-validate against Tinybird with the same ±1% gate.
- Parity burn-in clock started. The 14-day continuous-parity clock is running. As long as no workspace-day drifts past ±1%, broader cohort cutover begins 2026-05-31. If any day fails, the clock resets and we investigate before resuming. Conservatism beats speed here.
The operations runbook for this cutover lives at /documentation/admin/clickhouse-runbook (admin-only access). It documents how to check parity status for a workspace, flip a workspace flag on, roll back a workspace, trigger a backfill for a specific date window, read the CH performance dashboard and system.query_log, and the escalation matrix.
Wave 3 update (2026-05-17) — source_medium_breakdown + p_summary_series
Two more pipes shipped today, in lockstep:
- `source_medium_breakdown` CH module shipped. The pipe that powers the Sources & Attribution top-level table at /marketing-acquisition — every `(source, medium)` row your team scans first thing in the morning to see which channel paid off yesterday — now has a CH-native query path. Paid rows join `ad_spend_daily` for CPC / CPM / ROAS; organic rows surface session + pageview volume; the same UTM canonicalization rules apply on both warehouses, so reconciliation between Tinybird and CH is one-to-one per row.
- `p_summary_series` CH module shipped. This is the pipe that powers the sparkline arrays on every summary KPI tile at /dashboard — revenue, spend, blended ROAS, MER, AOV, sessions, conversions, units, NCPA, and the rest of the ~16 numeric series. Per-day granularity is preserved exactly, so the chart animation reads identically. Float64 coercion is handled at the SQL boundary per GL#418 (the Tinybird Decimal vs Float64 cast key lesson).
- Two new per-workspace read flags wired. `/admin/clickhouse-flags` now exposes `source_medium_breakdown (CH)` and `p_summary_series (CH)` toggles alongside the existing `summary_kpis` and `attribution_breakdown` ones. Default OFF on every workspace — Tinybird stays canonical until each workspace clears its 14-day parity gate per pipe.
- Two parity scripts cover both pipes end-to-end. `scripts/clickhouse-verify-source-medium-breakdown-parity.ts` iterates per-`(source, medium)` row with per-column drift detection; `scripts/clickhouse-verify-summary-series-parity.ts` iterates each per-day series column with per-column drift detection across ~16 numeric fields. Same ±1% drift budget, same `--strict` exit-2 contract, same per-workspace + per-day decomposition as the Phase 2B `summary_kpis` and Phase 2D `attribution_breakdown` scripts.
- Sparkline-render visual contract preserved. `p_summary_series` powers a visually sensitive surface — the sparkline shape on each tile is the trend signal operators use to decide whether to dig into a number. A 14-day series of (10, 11, 10, 12, 11, …) on CH vs (11, 10, 11, 11, 12, …) on Tinybird can pass a naive per-day ±1% gate yet render visibly different sparklines. We added an extra visual-parity verification to the burn-in: the CH series and the Tinybird series must produce the same up/down/flat direction-of-change between adjacent days, plus the same min/max bucket across the 14-day window. Numeric drift inside ±1% is not sufficient on its own for this pipe.
- Burn-in clock starts after upstream `summary_kpis` clears. Wave 3 dual-write goes live for VITATREE USA next, then a 14-day clock at ±1% per `(source, medium)` row for `source_medium_breakdown` and per-day per-column for `p_summary_series` before broader Cohort 1 cutover. Rollback is one toggle flip per pipe — same per-workspace recovery posture as Phase 2C and 2D, and the three-layer rollback (per-workspace flag flip <1s / pipe-wide Redis kill-switch <5s / container-level CLICKHOUSE_URL unset <3min) applies independently to each toggle.
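The direction-of-change rule for sparkline parity can be sketched in a few lines. This is a hedged illustration, not the shipped burn-in tooling, and it takes "same min/max bucket" to mean the min and max fall on the same day index:

```typescript
// Sketch of the sparkline visual-parity rule (illustrative names).
// Two series "agree visually" when every adjacent-day step moves in the
// same up/down/flat direction AND the min and max land on the same day.

function directions(series: number[]): number[] {
  // Sign of each adjacent-day step: 1 = up, -1 = down, 0 = flat.
  return series.slice(1).map((v, i) => Math.sign(v - series[i]));
}

function sparklineParity(ch: number[], tinybird: number[]): boolean {
  if (ch.length !== tinybird.length) return false;
  const dCh = directions(ch);
  const dTb = directions(tinybird);
  return (
    dCh.every((d, i) => d === dTb[i]) &&
    ch.indexOf(Math.min(...ch)) === tinybird.indexOf(Math.min(...tinybird)) &&
    ch.indexOf(Math.max(...ch)) === tinybird.indexOf(Math.max(...tinybird))
  );
}
```

Under this rule the two example series from the text fail (their shapes differ), while a uniformly scaled copy of the same series passes — exactly the property the numeric ±1% gate cannot capture on its own.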
What each pipe powers, in customer-facing terms: source_medium_breakdown is the table you scan to answer “which channel paid off yesterday?”; p_summary_series is the inline chart you glance at to answer “is this metric trending up, flat, or down?”. Both are read-path-only changes — the API shapes returned by /api/v1/analytics/source-medium and /api/v1/analytics/summary-series are byte-identical before and after cutover; every field name, every type, every nullable shape stays the same. The migration is invisible at the consumer level until your workspace is explicitly opted in by an admin.
Admin runbook for these two new flags (admin-only access): /documentation/admin/clickhouse-runbook. The runbook documents how to read parity status for the new pipes, flip a workspace flag per pipe, roll back per pipe, and the sparkline-shape verification step added for p_summary_series.
Why we’re migrating
Three reasons, in order of weight:
- Per-query billing was capping us at our growth point. Tinybird’s Develop tier ($40/mo) was sufficient through the early product but our pipe-count × workspace-count is now bumping the vCPU ceiling daily. The next Tinybird tier is $249/mo+; a dedicated Hetzner CCX23 ($40/mo, 4 dedicated AMD cores, 16 GB RAM) gives us 8× the headroom for the same spend, no per-query upcharge, and no scaling surprises as we ship features.
- Cross-region latency was a real cost. Tinybird runs in AWS us-east-1. Admaxxer’s app servers run in Hetzner Hillsboro, OR. Every dashboard query paid ~30–50 ms of network round-trip before the query even started. Co-locating ClickHouse with the app server cuts that to 0.8–1.6 ms over a private LAN — on a dashboard with 12 tiles each firing its own query, that’s 400+ ms shaved per page load. You feel this on the dashboard render.
- We already self-host everything else. Admaxxer’s Postgres (OLTP), Redis (queues + cache), and pixel ingest all migrated off managed cloud onto the same Hetzner box earlier this month (May 11–12, 2026). ClickHouse is the last managed dependency. Cancelling the Tinybird subscription cleanly closes the per-query-billing chapter and makes our cost model predictable at every customer-count target.
Triple Whale and Datafast both self-host their warehouse layers at scale — the patterns are well-trodden, and the cost curve at our growth point clearly favours owning the box.
What changes for you — right now: nothing
This is the most important section. Every Admaxxer dashboard you load today still reads its numbers from Tinybird. The migration is happening behind a feature flag that defaults OFF for every workspace. We only flip your flag on after our admin team has verified that the ClickHouse path returns numbers that match Tinybird’s within ≤1% on a per-column basis for at least 14 days running.
When your workspace flips:
- Same API shape. The response envelope your dashboard reads (`/api/v1/analytics/summary`, `/api/v1/analytics/attribution`, etc.) is byte-identical across both backends. Every field name, every type, every nullable shape stays the same.
- Same numbers. The parity gate enforces ≤1% drift per column over the burn-in. Anything that drifts triggers an admin alert and your flag stays off.
- Faster page loads. 0.8–1.6 ms per query instead of 30–50 ms. The most visible effect: the dashboard hero loads about half a second faster.
- Tracking data badge stays consistent. The "data source" indicator on tiles still says where each number came from (pixel / Shopify / platform); the warehouse swap is invisible at the tile level.
If anything looks off the day your flag flips, mention it to support — we can flip it back to Tinybird instantly while we investigate. The rollback is one column update on our admin DB; no deploy needed.
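Conceptually, the read-path dispatch behind that instant rollback looks like the sketch below. The names are assumptions for illustration, not the actual Admaxxer code — the point is that the warehouse choice reduces to one admin-controlled boolean:

```typescript
// The warehouse choice per workspace is a single admin-controlled boolean
// column, so rollback is one UPDATE on the admin DB — no deploy needed.

interface WorkspaceFlags {
  summaryKpisCh: boolean; // defaults false: Tinybird stays canonical
}

type Warehouse = "clickhouse" | "tinybird";

function pickWarehouse(flags: WorkspaceFlags, pipeKillSwitch = false): Warehouse {
  // A pipe-wide kill switch (Redis-backed in the real system) overrides
  // every per-workspace flag for fast fleet-wide rollback.
  if (pipeKillSwitch) return "tinybird";
  return flags.summaryKpisCh ? "clickhouse" : "tinybird";
}
```

Because the flag is read per request, flipping the column back takes effect on the very next dashboard query.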
How parity verification works
Every numeric column emitted by summary_kpis (the pipe that powers your dashboard’s 80+ KPIs — MER, ROAS, AOV, NC-ROAS, NCPA, units sold, etc.) is computed on BOTH Tinybird and ClickHouse for your workspace over identical date windows. A nightly canary script (scripts/clickhouse-verify-summary-kpis-parity.ts) runs the comparison and emits a per-column drift report:
- Status `ok` — drift < 1% on a non-zero column. Counts toward your burn-in clock.
- Status `drift` — drift > 1%. Resets your burn-in clock to day 0; an admin investigates the source of the drift before resuming.
- Status `currency_pending` — your workspace reports in a non-USD currency (PKR, EUR, INR, etc.). The current ClickHouse path returns `upstream_unavailable` for non-USD workspaces until the FX-conversion pipe migrates in a follow-up phase. Tinybird stays canonical and your numbers are correct; the parity check just doesn’t produce a meaningful comparison until FX lands on the new side.
The burn-in requirement is 14 consecutive days of ok across every column before flag-on. This is intentionally conservative — the polled-fallback path that fills install-day workspaces' tiles (GL#313 + GL#319 + GL#327 in our key-lessons archive) takes several days to accumulate enough data on both sides to compare meaningfully, so we want a window that covers it.
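In pseudocode terms, the gate reduces to a per-column status plus a streak counter. This is a sketch under assumed shapes — the real logic lives in the parity script and admin tooling, and how zero-valued and `currency_pending` cases are handled there may differ:

```typescript
type ParityStatus = "ok" | "drift" | "currency_pending";

// Status for one numeric column on one workspace-day. Zero columns carry no
// drift signal, so here they count as ok only when both sides agree on zero.
function columnStatus(tinybird: number, clickhouse: number, isUsd: boolean): ParityStatus {
  if (!isUsd) return "currency_pending"; // FX pipe not migrated yet
  if (tinybird === 0) return clickhouse === 0 ? "ok" : "drift";
  const drift = Math.abs(clickhouse - tinybird) / Math.abs(tinybird);
  return drift < 0.01 ? "ok" : "drift";
}

// 14 consecutive all-ok days are required before flag-on; a drift day resets
// the clock, and currency_pending days neither count nor reset.
function burnInStreak(days: ParityStatus[][]): number {
  let streak = 0;
  for (const day of days) {
    if (day.every((s) => s === "ok")) streak += 1;
    else if (day.some((s) => s === "drift")) streak = 0;
  }
  return streak;
}
```

A workspace becomes a flag-on candidate only once `burnInStreak(...) >= 14`, which is why a single drifting day pushes cutover back a full two weeks.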
Per-workspace cutover schedule
Cutover happens workspace-by-workspace, not all at once. The order is:
- Cohort 1: USD DTC workspaces with full data history (≥ 90 days). The most stable parity profile — both sides have months of overlapping data to compare. Flag-on candidates start landing here ~14 days after the dual-write started for your workspace.
- Cohort 2: USD DTC workspaces with install-day or early-stage data. Parity hinges on the polled-fallback fold; we want extra runway to confirm the ClickHouse translation preserves it.
- Cohort 3: Non-USD workspaces. Blocked on the FX-conversion pipe migration. Once that ships, parity check unlocks and the same 14-day burn-in starts.
- Cohort 4: SaaS and Lead-Gen workspaces. These read different pipes (`p_saas_summary`, etc.) — their CH equivalents land in a separate phase.
You can’t opt your own workspace in or out — the flag is admin-controlled. We’re intentionally not exposing this as a customer-facing toggle because the parity gate is the contract; flipping manually would defeat the gate.
How to verify your numbers (any day, any backend)
Two pieces of advice that apply regardless of which warehouse backs your dashboard:
1. Compare to Shopify Admin’s native reports
Shopify Admin → Analytics → Reports has authoritative revenue and orders numbers straight from Shopify’s order edge. Pick a 30-day window in both Admaxxer and Shopify Admin — pixel-attributed revenue on Admaxxer should land within ~5% of Shopify’s gross sales (the gap is real attribution loss: ad-blockers, Safari ITP, CSP-filtered events). For polled metrics (orders, NC orders, units sold), the numbers should be within ~1% — we read those directly from Shopify Admin via the daily-poll fallback, so they’re effectively the same source.
2. Compare to Meta Ads Manager and Google Ads
Pull spend / impressions / clicks for the same date range from each platform’s native reporting. Admaxxer’s spend numbers should match to the cent — we read directly from the Marketing API daily and store the response unchanged. Purchase / conversion counts will differ slightly because each platform has its own attribution model (default 7-day click + 1-day view for Meta, etc.); the spend is the ground-truth comparison.
If you see discrepancies that don’t match these ranges, file a support ticket with the specific tile + date window. We’ll either explain the gap (often it’s currency or attribution-window mismatch, both fixable) or root-cause it as a real bug. Either way, the warehouse swap doesn’t change this calculus — the parity gate ensures our numbers stay correct across the cutover.
Security and data isolation in the new setup
Same model as before. The ClickHouse cluster:
- Runs in a Docker container (`admaxxer-ch`) on a dedicated Hetzner CCX23 box (Hillsboro, OR, US-West region).
- Listens only on the private LAN interface (`192.168.100.3:8123`); the public IPv4 returns connection-refused. Verified externally — the analytics warehouse is not reachable from the public internet.
- Speaks to the Admaxxer app server over a Hetzner Private Network. No internet traffic crosses the warehouse boundary.
- Authenticates with a 128-bit random password; app-to-warehouse traffic stays on the private LAN end to end (no internet hop).
- Enforces per-workspace data isolation the same way Tinybird did: every read query carries `WHERE workspace_id = {workspace_id:String}` bound from the authenticated session. No user-supplied `workspace_id` path exists.
- Takes a daily backup at 03:20 UTC; 14 daily snapshots + 12 monthly snapshots are retained on a separate disk path. Restore is a single `RESTORE DATABASE` command.
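From the app side, the workspace-scoped read looks roughly like the sketch below. The table and column names are illustrative, not the actual pipe SQL — the load-bearing detail is the ClickHouse `{name:Type}` server-side parameter, which is bound as a value rather than spliced into the SQL string:

```typescript
// ClickHouse server-side parameter binding: {name:Type} placeholders are
// bound with values, never string-concatenated, so a session-derived
// workspace_id cannot be altered through the SQL text.

interface ChQuery {
  query: string;
  query_params: Record<string, string>;
}

function workspaceScopedQuery(sessionWorkspaceId: string): ChQuery {
  return {
    query: `
      SELECT day, sum(revenue) AS revenue
      FROM summary_kpis_mv
      WHERE workspace_id = {workspace_id:String}
      GROUP BY day
      ORDER BY day
    `,
    query_params: { workspace_id: sessionWorkspaceId },
  };
}
```

Note the SQL text never contains the workspace id itself — only the placeholder — so every row a query can see is pinned to the authenticated session's workspace.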
For full architecture details see Tinybird auth model (the patterns generalize) and our public infrastructure reference in the codebase.
Where to learn more
Related: Admin operations runbook · Tinybird auth model · Marketing acquisition deep dive · Dashboard analytics card reference · Multi-currency plumbing · Documentation home
If you have questions about your workspace’s migration status or want to know when your flag is scheduled to flip, reach out to support@admaxxer.com.