ClickHouse migration — admin operations runbook
The operational counterpart to /documentation/clickhouse-migration (which explains the migration to merchants). This page documents every action the Admaxxer admin team takes during the cutover — parity verification, flag flipping, rollback, backfill, performance-dashboard reads, escalation. It mirrors Triple Whale’s ops-runbook discipline (one separate doc for admin actions, distinct from the customer-facing explainer at triplewhale.readme.io) and Datafast’s pattern of publishing the runbook openly so SOC2 auditors and prospective customers can verify the operator’s discipline.
Access: This page renders “Access denied” on the FE for non-admin users. The SSR HTML is intentionally public-readable for AI crawlers — the actions documented here all require an authenticated admin session against the actual API endpoints, so publishing the runbook doesn’t leak privileged access. The pattern matches Honeycomb’s incident playbook and Stripe’s public ops references.
Current cutover status (2026-05-17)
| Surface | State | Owner |
|---|---|---|
| Dual-write on pageviews | Live (Phase 1, 2026-05-16) | data-platform |
| Dual-write on summary_kpis | Live (Phase 2B, 2026-05-17) | data-platform |
| Read flag flipped for canary workspace VITATREE USA | Live (Phase 2C, 2026-05-17) | data-platform |
| GL#500 meta-inflation parity fix | Shipped | data-platform |
| CH performance optimization pass | Shipped — median ~1.2 ms, p95 ~2.1 ms (vs Tinybird ~32 ms / ~58 ms) | data-platform |
| 90-day backfill for VITATREE USA | Complete | data-platform |
| 14-day burn-in clock | Running, day 0 of 14 — ends 2026-05-31 if clean | data-platform |
| Phase 2D — attribution_breakdown CH module + per-workspace flag | Shipped (Phase 2D, 2026-05-17) | data-platform |
| Phase 2D — Adset → Ad drilldown depth fix | Shipped (Phase 2D, 2026-05-17) | data-platform |
| scripts/clickhouse-verify-attribution-breakdown-parity.ts | Shipped — channel/campaign/adset/ad coverage | data-platform |
| attribution_breakdown dual-write burn-in | Starts after summary_kpis clears; runs its own 14-day parallel clock | data-platform |
| Broader Cohort 1 cutover | Eligible 2026-06-01 if burn-in stays clean | TBD |
| Wave 3 — source_medium_breakdown CH module + flag + parity script | Shipped (Wave 3, 2026-05-17) | data-platform |
| Wave 3 — p_summary_series CH module + flag + parity script | Shipped (Wave 3, 2026-05-17) | data-platform |
| Wave 3 — dual-write live for VITATREE USA | Pending — kicks off after summary_kpis burn-in clearance | data-platform |
| Wave 3 — sparkline visual-parity verification (p_summary_series) | Wired into the burn-in step in the parity script | data-platform |
1. Check parity status for a workspace
Two paths — an in-app view inside /admin/clickhouse-flags and a CLI script that produces a per-day diff report. Use the in-app view for a quick read; the CLI for a deep dive.
1a. In-app parity check
- Sign in with your admin account at https://admaxxer.com/admin.
- Open /admin/clickhouse-flags.
- Find the target workspace (search by name, ID, or owner email).
- Click the Parity button in the rightmost column.
- The modal renders the latest scheduled parity-check result: status (pass/minor/fail), drift %, the sample-diff table for the top-5 most-drifted days, and the last-run timestamp.
If the parity-check endpoint hasn’t finished landing yet on a fresh deploy, the modal renders a copyable CLI snippet as a fallback. Click Copy, paste into your terminal, run.
1b. CLI parity check
SSH into the Coolify host and run the verification script directly. The script lives at scripts/clickhouse-verify-summary-kpis-parity.ts and ships per pipe; for new pipes, copy the script and swap the pipe name + canonical column list.
ssh coolify
cd /var/lib/docker/volumes/admaxxer-app-source/_data
sudo docker exec -it admaxxer-app npx tsx scripts/clickhouse-verify-summary-kpis-parity.ts \
--workspace=<workspace-id> --pipe=summary_kpis --days=14
Output is a per-day diff report: workspace + day + column + Tinybird value + ClickHouse value + drift %. A pipe is cutover-eligible only when every day in every workspace agrees to within ±1%.
1c. Reading the diff
- Status pass: drift < 1% on every non-zero column. The day counts toward the 14-day burn-in.
- Status minor: 1-3% drift on a single column. Investigate before broadening cutover. Common cause: clock skew on rows landing in the 0.3 ms gap between Tinybird and CH ingestion.
- Status fail: > 3% drift, or any drift on a column with zero values on one side. Stop. The burn-in clock resets to day 0 for this workspace. Diagnose before resuming.
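For reference, a minimal sketch of how these thresholds could map to a per-day status. The ColumnDiff shape, the function names, and the handling of multiple 1-3% columns are illustrative assumptions — the real logic lives in scripts/clickhouse-verify-summary-kpis-parity.ts — but the cutoffs mirror the list above.
// Illustrative types; not the script's actual shapes.
type ParityStatus = 'pass' | 'minor' | 'fail';
interface ColumnDiff {
  column: string;
  tinybird: number;
  clickhouse: number;
}
function driftPct(d: ColumnDiff): number {
  if (d.tinybird === 0 && d.clickhouse === 0) return 0;
  // Any drift on a column that is zero on one side is treated as a hard fail below.
  if (d.tinybird === 0 || d.clickhouse === 0) return Infinity;
  return (Math.abs(d.clickhouse - d.tinybird) / Math.abs(d.tinybird)) * 100;
}
function classifyDay(diffs: ColumnDiff[]): ParityStatus {
  const drifted = diffs
    .map((d) => ({ column: d.column, pct: driftPct(d) }))
    .filter((d) => d.pct >= 1);
  if (drifted.length === 0) return 'pass';            // drift < 1% on every column
  if (drifted.some((d) => d.pct > 3)) return 'fail';  // > 3%, or zero-on-one-side (Infinity)
  if (drifted.length === 1) return 'minor';           // 1-3% drift on a single column
  return 'fail';                                      // assumption: multiple drifted columns escalate to fail
}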
2. Flip a workspace flag ON
Flag-on routes a workspace’s reads through ClickHouse for a single pipe. Default OFF — Tinybird stays canonical until the explicit flip. Reversible by clicking the same toggle.
2a. Pre-flight checklist
- Confirm parity for the workspace is pass on every day in the last 14 days (see §1).
- Confirm the workspace is in Cohort 1 (USD DTC, ≥90 days of data history). If not, defer.
- Confirm scripts/clickhouse-verify-summary-kpis-parity.ts exit code is 0.
- Confirm the workspace owner is reachable via support so we can revert quickly if a tile looks off post-flip.
2b. Flip the flag
- Navigate to /admin/clickhouse-flags.
- Find the target workspace.
- Toggle the summary_kpis (CH) switch ON.
- The toast confirms “ClickHouse reads enabled.” Pin the toast as a debugging breadcrumb.
2c. 30-minute post-flip watch
For 30 minutes after the flip:
- Tail the workspace owner’s SSE chat session log on the prod container.
- Open the workspace dashboard in a separate browser tab as the owner’s admin proxy.
- Check every tile’s data-source badge — tiles backed by summary_kpis should show the sky-blue ClickHouse chip.
- If any tile renders an unexpected zero or an unexpectedly high number, immediately flip the flag back OFF (see §3) and re-run parity.
2d. Flip attribution_breakdown ON for a workspace (Phase 2D)
Phase 2D (2026-05-17) ships the per-workspace read flag for attribution_breakdown — the pipe behind the /marketing-acquisition Channel Attribution Drilldown. Same shape as the summary_kpis flag (§2), with three attribution-specific gotchas the operator needs to know before flipping.
2d·1. Pre-flight checklist (attribution-specific)
- Parity passes on ALL FOUR breakdown levels (channel, campaign, adset, ad) over the last 14 days — not just channel. The drilldown lazy-loads each level on chevron click, and a divergence at the ad level only surfaces when a user actually drills.
- Confirm summary_kpis burn-in is still clean for the workspace. If summary_kpis is currently drifting, do NOT layer a second pipe’s flag-on on top — the operator needs a clean baseline to attribute future drift to the right pipe.
- Confirm Agent A’s shadow tables (ad_spend_daily_shadow, visitor_payments_shadow) are dual-writing on the workspace. The parity script reports this on its first iteration.
- Confirm the workspace currency is USD. Non-USD attribution is gated to Phase 2E until the FX MV chain ships on CH — the parity script will emit currency_pending rather than ok.
2d·2. CLI parity sweep across all four levels
Run the Phase 2D parity script. It iterates channel → campaign → adset → ad by default; restrict it to one level with --level.
ssh -L 18123:192.168.100.3:8123 admaxxer@5.78.179.73 -N -f
CLICKHOUSE_URL='http://admaxxer:<pw>@localhost:18123/admaxxer' \
TINYBIRD_ADMIN_TOKEN='<token>' \
TINYBIRD_HOST='https://api.us-east.aws.tinybird.co' \
DATABASE_URL='postgresql://...' \
npx tsx scripts/clickhouse-verify-attribution-breakdown-parity.ts \
--workspace <workspace-id> \
--since 2026-05-03 --until 2026-05-17 \
--strict
The script emits one human-readable section per (workspace × level) and a JSON summary line at the end. Exit code 2 under --strict means at least one (workspace × level) had drift > 1% — cutover is blocked until the diff is investigated. Expected output for VITATREE USA at channel level: 2 platform rows (meta, google), every per-row per-column drift status = ok.
2d·3. Flip the flag in the admin UI
- Navigate to /admin/clickhouse-flags.
- Find the target workspace.
- Toggle the attribution_breakdown (CH) switch ON. The toast confirms “ClickHouse reads enabled for attribution_breakdown.”
- The workspace owner’s next GET /api/v1/analytics/attribution request routes through the CH path. Coalesced cache invalidates automatically.
2d·4. 30-minute post-flip watch (attribution-specific)
- Open /marketing-acquisition in a separate browser tab as the owner’s admin proxy. Check that the Channel Attribution Drilldown renders rows for both Meta and Google at the channel level.
- Click the chevron on the Meta row to expand into campaigns. Click into an adset, then an ad. Each level should lazy-load within ~1.5 s.
- Watch the data-source badge on each tile — sky-blue ClickHouse chip indicates CH-sourced. Absence indicates Tinybird (acceptable during Phase 2D since not every tile is migrated yet).
- If any per-row metric shifts more than ±1% from the pre-flip screenshot, flip the flag back OFF immediately (see §3) and re-run the parity script with --verbose.
2d·5. Rollback procedure
Same three-tier model as §3. The attribution_breakdown flag is a separate Postgres column from summary_kpis, so each pipe can be rolled back independently — a single workspace can be on CH for summary_kpis while staying on Tinybird for attribution_breakdown. Surgical rollback is per-pipe per-workspace; broader rollback escalates to the Redis kill-switch or container-level env unset per §3a/3b/3c.
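For orientation, a sketch of the per-workspace flag row this independence relies on — one boolean column per pipe, flipped individually. The table and column names below are assumptions for illustration, not the actual application-Postgres schema.
// Hypothetical shape of a row behind /admin/clickhouse-flags.
interface ClickhouseFlagsRow {
  workspace_id: string;
  summary_kpis_ch: boolean;            // §2  — Phase 2C canary flag
  attribution_breakdown_ch: boolean;   // §2d — Phase 2D flag
  source_medium_breakdown_ch: boolean; // §7  — Wave 3
  p_summary_series_ch: boolean;        // §7  — Wave 3
  updated_at: string;
}
// Each column flips independently, which is what makes per-pipe rollback surgical.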
3. Roll back a workspace to Tinybird
Three roll-back layers from fastest to slowest:
3a. Per-workspace flag flip (instant)
- Navigate to /admin/clickhouse-flags.
- Find the target workspace.
- Toggle the summary_kpis (CH) switch OFF.
- Next dashboard reload reads from Tinybird. SSE caches invalidate automatically.
Recovery time: under one second (the toggle is a Postgres column flip via POST /api/v1/admin/clickhouse-flags). No deploy required.
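For completeness, a hedged sketch of the same flip expressed as an API call rather than a UI toggle. The endpoint path is the one named above; the request-body shape and the X-Admin-Token header are assumptions borrowed from the §3b example, not a documented contract.
// Hypothetical helper — endpoint path from this runbook, body shape assumed.
async function setWorkspaceFlag(workspaceId: string, pipe: string, enabled: boolean): Promise<void> {
  const res = await fetch('http://localhost:3000/api/v1/admin/clickhouse-flags', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Admin-Token': process.env.ADMIN_TOKEN ?? '',
    },
    body: JSON.stringify({ workspace: workspaceId, pipe, enabled }),
  });
  if (!res.ok) throw new Error(`flag flip failed: ${res.status}`);
}
// Rollback for one workspace is the same call with enabled set to false:
// await setWorkspaceFlag('<workspace-id>', 'summary_kpis', false);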
3b. Whole-pipe roll-back (slower)
If a pipe is misbehaving across multiple workspaces, revert the read-handler routing for that pipe to Tinybird:
ssh coolify
sudo docker exec admaxxer-app curl -s -X POST \
http://localhost:3000/api/v1/admin/clickhouse-flags/route-handler-rollback \
-H 'Content-Type: application/json' \
-H 'X-Admin-Token: '"$ADMIN_TOKEN" \
-d '{"pipe":"summary_kpis"}'
This sets a kill-switch flag in Redis that the read-handler checks on every request; if set, every read for that pipe falls through to Tinybird regardless of the per-workspace flag. Recovery time: under 5 seconds.
3c. Container-level roll-back (slowest)
If admaxxer-ch is unhealthy and we want every read on every pipe to fall back to Tinybird without ceremony:
ssh coolify
# Coolify env update — set CLICKHOUSE_URL='' to disable the CH path entirely.
sudo docker exec admaxxer-coolify-postgres psql -U coolify -d coolify -c "
UPDATE environment_variables
SET value = '${ENCRYPTED_EMPTY}'
WHERE key = 'CLICKHOUSE_URL' AND application_id = 'admaxxer-prod';
"
# Force redeploy with the new env.
sudo docker restart admaxxer-app
Recovery time: ~3 minutes (Coolify reloads env + Express restarts). Tinybird continues to receive every dual-write throughout, so no data is lost.
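Taken together, the three layers compose into a single routing decision on every read. A minimal sketch of that precedence, assuming illustrative helper names and a hypothetical Redis key format — the real read handler may differ.
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Stand-in for the Postgres lookup behind /admin/clickhouse-flags (§3a).
async function isWorkspaceFlagOn(workspaceId: string, pipe: string): Promise<boolean> {
  // e.g. SELECT the per-pipe flag column FROM the flags table WHERE workspace_id = ...
  return false; // placeholder
}

async function resolveWarehouse(workspaceId: string, pipe: string): Promise<'clickhouse' | 'tinybird'> {
  // §3c — container level: an empty CLICKHOUSE_URL disables the CH path entirely.
  if (!process.env.CLICKHOUSE_URL) return 'tinybird';

  // §3b — pipe-wide kill-switch in Redis overrides every per-workspace flag.
  if (await redis.get(`ch-killswitch:${pipe}`)) return 'tinybird';

  // §3a — per-workspace flag: the Postgres column flipped in /admin/clickhouse-flags.
  return (await isWorkspaceFlagOn(workspaceId, pipe)) ? 'clickhouse' : 'tinybird';
}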
4. Trigger a per-window backfill
Use this when (a) a workspace newly enters Cohort 1 and we need to seed CH with 90 days of history before flipping the flag on, or (b) a parity-fail traces to missing rows on the CH side for a specific window.
4a. Pageviews + revenue_events + orders backfill
ssh coolify
sudo docker exec -it admaxxer-app npx tsx scripts/clickhouse-backfill.ts \
--workspace=<workspace-id> \
--datasource=pageviews,revenue_events,orders \
--from=2026-02-17 \
--to=2026-05-17 \
--batch-size=10000 \
--dry-run=false
The script reads from the canonical source-of-truth (Tinybird for the existing window, the application Postgres for orders/revenue_events) and writes to CH. Idempotent via ReplacingMergeTree ordering on (workspace_id, event_id) — safe to re-run.
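A minimal sketch of the CH write path that idempotency relies on, using the official @clickhouse/client package. The table name and row shape are placeholders; the real source reads and batching live in scripts/clickhouse-backfill.ts.
import { createClient } from '@clickhouse/client';

const ch = createClient({ url: process.env.CLICKHOUSE_URL ?? 'http://localhost:8123' });

// Each row carries the (workspace_id, event_id) pair the table is ORDERed by.
// Re-running the backfill inserts duplicate versions of the same logical row;
// ReplacingMergeTree keeps one copy per ORDER BY key at merge time, so repeats are safe.
async function writeBatch(table: string, rows: Array<Record<string, unknown>>): Promise<void> {
  await ch.insert({
    table,               // e.g. 'pageviews'
    values: rows,
    format: 'JSONEachRow',
  });
}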
4b. summary_kpis materialized view rebuild
If the source datasources are correctly populated but summary_kpis on CH is missing rows, force the materialized view to rebuild for the window:
ssh coolify
sudo docker exec admaxxer-ch clickhouse-client --user admaxxer --password <pw> --database admaxxer --query "
SYSTEM REFRESH VIEW summary_kpis_mv;
"
4c. Verify after backfill
Re-run the parity script (see §1b). Drift % should drop to under 1% on every backfilled day.
5. Read the CH performance dashboard + system.query_log
ClickHouse exposes a built-in query log we mine for slow-query alerts. Three lookups, in order of decreasing frequency:
5a. The 30 slowest queries in the last hour
ssh coolify
sudo docker exec admaxxer-ch clickhouse-client --user admaxxer --password <pw> --database admaxxer --query "
SELECT
query_start_time,
query_duration_ms,
read_bytes,
memory_usage,
user,
query
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
AND type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 30
FORMAT Vertical;
"
Target: every query under 50 ms. Any query over 500 ms is a P2 ticket. Any query over 5 seconds is a P1 page.
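A small sketch of the severity mapping a slow-query alert could apply to each finished query pulled from system.query_log. The thresholds mirror the targets above; the alerting job itself is assumed, not shown.
// Severity mapping for finished queries from system.query_log.
type Severity = 'ok' | 'p2-ticket' | 'p1-page';

function classifyQueryDuration(durationMs: number): Severity {
  if (durationMs > 5_000) return 'p1-page'; // over 5 seconds: page the on-call
  if (durationMs > 500) return 'p2-ticket'; // over 500 ms: file a P2
  return 'ok';                              // target is every query under 50 ms
}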
5b. Per-pipe latency percentiles for the last 24 hours
SELECT
query_log.query_kind,
count() AS query_count,
quantile(0.5)(query_duration_ms) AS p50_ms,
quantile(0.95)(query_duration_ms) AS p95_ms,
quantile(0.99)(query_duration_ms) AS p99_ms,
max(query_duration_ms) AS max_ms
FROM system.query_log
WHERE event_time > now() - INTERVAL 24 HOUR
AND type = 'QueryFinish'
GROUP BY query_kind
ORDER BY p99_ms DESC;
5c. Memory + disk pressure
SELECT metric, formatReadableSize(value) AS value FROM system.asynchronous_metrics
WHERE metric IN ('MemoryResident', 'DiskAvailable_default', 'FilesystemMainPathAvailableBytes');
Resident memory should stay under 1.6 GB (we cap container memory at 2 GB). Disk available should stay above 50 GB on the dedicated CH partition.
6. Escalation matrix
Who pages whom, when. PagerDuty rotations are configured in our internal admin runbook; this page documents the structural contract so an on-call engineer (and any future hire) knows the call tree without needing tribal knowledge.
| Symptom | Severity | Primary on-call | Secondary on-call | SLA |
|---|---|---|---|---|
| admaxxer-ch container is down (HTTP 8123 returns connection-refused on docker-internal probe) | P1 (page) | data-platform on-call | infra on-call | 15 min ack, 1 hour resolve (fallback to Tinybird via §3c is the acceptable temporary fix) |
| Parity drift > 5% on any workspace-day for an in-flight cutover pipe | P1 (page) | data-platform on-call | analytics-product on-call | 30 min ack, 4 hour resolve (rollback per §3a is acceptable while investigating) |
| Parity drift 1-5% on any workspace-day | P2 (ticket) | data-platform on-call | — | 4 hour ack, 24 hour resolve |
| CH query latency p95 > 50 ms sustained for 1 hour | P2 (ticket) | data-platform on-call | — | 4 hour ack, 24 hour resolve |
| CH query latency p99 > 500 ms sustained for 1 hour | P1 (page) | data-platform on-call | infra on-call | 30 min ack, 4 hour resolve |
| CH memory residency > 1.8 GB sustained for 30 min | P2 (ticket) | infra on-call | data-platform on-call | 1 hour ack, 8 hour resolve |
| CH disk free < 25 GB | P1 (page) | infra on-call | data-platform on-call | 30 min ack, 2 hour resolve (purge old partitions / expand disk) |
| Daily 03:20 UTC CH backup cron failed | P2 (ticket) | infra on-call | — | 4 hour ack, 24 hour resolve |
Roles map to people via PagerDuty — the schedule rotates weekly. During business hours, day-shift covers; outside business hours, the on-call rotation is two-deep with auto-escalation to the secondary after 15 min no-ack on a P1.
7. Wave 3 ops — source_medium_breakdown + p_summary_series
Two new toggles on /admin/clickhouse-flags: source_medium_breakdown (CH) and p_summary_series (CH). They behave identically to the existing summary_kpis (CH) and attribution_breakdown (CH) toggles — same dual-write, same per-workspace 14-day ±1% gate, same three-layer rollback model — with one Wave 3-specific wrinkle on p_summary_series (visual parity, §7d below).
7a. Verify Wave 3 parity for a workspace
Two parity scripts, one per pipe. Each follows the same command shape as the Phase 2B/2D scripts:
ssh coolify
# source_medium_breakdown — per-(source, medium) row drift
sudo docker exec -it admaxxer-app npx tsx scripts/clickhouse-verify-source-medium-breakdown-parity.ts \
--workspace=<workspace-id> --days=14 --strict
# p_summary_series — per-day per-column drift, plus the visual-parity step
sudo docker exec -it admaxxer-app npx tsx scripts/clickhouse-verify-summary-series-parity.ts \
--workspace=<workspace-id> --days=14 --strict
Exit code 0 = clean. Exit code 2 = drift exceeds ±1% on any row/column OR the p_summary_series series produces a different up/down/flat shape against Tinybird on any adjacent-day pair. Both scripts emit a per-row diff table to stdout for the data-platform team to triage.
7b. Flip Wave 3 flags ON
- Confirm summary_kpis (CH) burn-in has cleared for this workspace (Wave 3 pipes consume the same upstream aggregates).
- Confirm Wave 3 parity is pass on every day in the last 14 days (see §7a, both scripts).
- Navigate to /admin/clickhouse-flags, find the workspace.
- Toggle source_medium_breakdown (CH) ON. Pin the toast as a breadcrumb. Open /marketing-acquisition as the admin proxy and verify the (source, medium) table renders all rows with the ClickHouse badge in the corner. Spot-check that paid rows show CPC/CPM/ROAS correctly joined against ad_spend_daily.
- Toggle p_summary_series (CH) ON. Open /dashboard and verify every summary tile’s inline sparkline renders cleanly with the ClickHouse badge — check the trend direction on each tile against the Tinybird-rendered version (open the workspace in incognito + roll back briefly if needed for visual A/B).
- 30-minute post-flip watch — same protocol as §2c.
7c. Roll back Wave 3 flags
Same three-layer model as §3, applied independently per toggle. The flags are independent state — flipping one OFF does not affect the other.
- Per-workspace flag flip (instant): toggle source_medium_breakdown (CH) or p_summary_series (CH) OFF in /admin/clickhouse-flags. Reads route back to Tinybird on the next dashboard reload.
- Pipe-wide kill-switch (5 seconds): POST to /api/v1/admin/clickhouse-flags/route-handler-rollback with {"pipe":"source_medium_breakdown"} or {"pipe":"p_summary_series"}. Sets a Redis kill-switch that diverts every read for that pipe to Tinybird regardless of per-workspace state.
- Container-level rollback (3 minutes): set CLICKHOUSE_URL='' in Coolify env and restart. Every pipe falls back to Tinybird at once. Use only for whole-warehouse outages.
7d. Sparkline-shape visual parity (p_summary_series only)
p_summary_series drives a visually sensitive surface — the sparkline shape on each summary tile is the trend signal operators use to decide whether to dig into a number. The verification script enforces, on top of the standard ±1% per-day per-column drift budget:
- Direction-of-change parity. For every adjacent-day pair in the 14-day window, the CH series and the Tinybird series must agree on the up/flat/down direction (using a 1% tolerance band so floating-point noise doesn’t flip a flat day). Disagreement on any pair fails the workspace-day.
- Min/max bucket parity. Across the 14-day window, the day-of-min and day-of-max on each side must be the same day (or one of two adjacent days, allowing a one-day shift if values are extremely close). Disagreement fails the workspace-day.
A series of (10, 11, 10, 12, 11, …) on CH and (11, 10, 11, 11, 12, …) on Tinybird passes the ±1% gate but fails both visual-parity checks — that’s the failure mode we’re guarding against.
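A sketch of the two shape checks as they could be expressed in code. Function names are illustrative — the real implementation lives in scripts/clickhouse-verify-summary-series-parity.ts — and the "values extremely close" qualifier on the one-day min/max shift is omitted here for brevity.
type Direction = 'up' | 'flat' | 'down';

// 1% tolerance band so floating-point noise doesn't flip a flat day.
function direction(prev: number, next: number, tolerance = 0.01): Direction {
  if (prev === 0) return next === 0 ? 'flat' : next > 0 ? 'up' : 'down';
  const change = (next - prev) / Math.abs(prev);
  if (Math.abs(change) <= tolerance) return 'flat';
  return change > 0 ? 'up' : 'down';
}

// Check 1 — every adjacent-day pair must agree on up/flat/down across both warehouses.
function directionParity(ch: number[], tb: number[]): boolean {
  for (let i = 1; i < Math.min(ch.length, tb.length); i++) {
    if (direction(ch[i - 1], ch[i]) !== direction(tb[i - 1], tb[i])) return false;
  }
  return true;
}

// Check 2 — day-of-min and day-of-max must land on the same day or an adjacent one.
function extremaParity(ch: number[], tb: number[]): boolean {
  const argMin = (xs: number[]) => xs.indexOf(Math.min(...xs));
  const argMax = (xs: number[]) => xs.indexOf(Math.max(...xs));
  return Math.abs(argMin(ch) - argMin(tb)) <= 1 && Math.abs(argMax(ch) - argMax(tb)) <= 1;
}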
7e. Backfill the upstream datasources for Wave 3
Both Wave 3 pipes consume the same upstream datasources as Phase 2C/2D. If a Wave 3 parity-fail traces to missing rows on the CH side, run the standard backfill (§4a) for the affected datasources + window, then re-run the parity scripts. The Wave 3 pipes do not need a dedicated CH-table backfill — they read from the same upstream MVs that already exist post-2A.
Runbook FAQ
Why is the runbook publicly readable in HTML?
SOC2 + customer-trust signal. The actions documented all require authenticated admin sessions against the API; publishing the runbook openly doesn’t leak access, but does show prospective customers + AI auditors that we’re operationally disciplined. The FE component still gates non-admin users with an “Access denied” render — the SSR HTML is the AI-crawler surface, not the merchant entry point. Pattern matches Stripe and Honeycomb.
What if a workspace was flipped ON but the owner files a support ticket saying numbers look wrong?
Default: flip back to Tinybird first (see §3a), investigate after. The flip is a one-second operation; the time-cost of investigating with the owner staring at “wrong” numbers is much higher. Once flipped back, run parity verification on the affected window and surface the diff to the data-platform team.
Can I cut over a Cohort 2 workspace early if it has clean parity?
Yes, with sign-off from the data-platform on-call. Cohort 2 (install-day or early-stage USD DTC) lacks the runway of overlapping data that Cohort 1 has, so parity is structurally less stable, but if 14 consecutive days of pass are in the log, the gate is met.
What happens if Tinybird itself goes down during the dual-write window?
Two cases. (1) If we’ve already flipped workspaces over and Tinybird is just our fallback, no customer impact. (2) If we haven’t flipped a given workspace and Tinybird is its canonical, the dashboard will render the Tinybird outage as “data refreshing” and serve stale-tier cache for up to 1 hour. If Tinybird outage extends past the cache window, we can elevate any Cohort 1 workspace immediately by flipping its flag ON (parity has been verified — the gate just defers cutover under non-emergency conditions).
How do I propose a new pipe for migration?
Open a PR with (a) the CH-side CREATE TABLE + materialized-view DDL, (b) the *_ch.ts route handler, (c) the scripts/clickhouse-verify-<pipe>-parity.ts verification script, and (d) a 7-day dual-write canary. Once the canary is green for 7 days, file a phase-bump request to add the pipe to the per-workspace flag schema.
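A bare-bones skeleton of item (c), sketched under assumptions: the Tinybird Pipes API parameters and the CH table/column names are placeholders, and the per-row drift comparison plus exit-code handling (exit 0 clean, exit 2 on drift under --strict) are left to the helpers the existing scripts/clickhouse-verify-*.ts files already share. Copy an existing parity script rather than starting from this.
import { createClient } from '@clickhouse/client';

const ch = createClient({ url: process.env.CLICKHOUSE_URL ?? 'http://localhost:8123' });

// Tinybird side: the Pipes API returns { data: [...] } for a published pipe.
// Parameter names (workspace_id, date) are placeholders for whatever the pipe accepts.
async function tinybirdRows(pipe: string, workspaceId: string, day: string) {
  const url =
    `${process.env.TINYBIRD_HOST}/v0/pipes/${pipe}.json` +
    `?token=${process.env.TINYBIRD_ADMIN_TOKEN}&workspace_id=${workspaceId}&date=${day}`;
  const res = await fetch(url);
  return (await res.json()).data as Array<Record<string, number>>;
}

// ClickHouse side: table and column names are placeholders for the pipe's CH module.
async function clickhouseRows(pipe: string, workspaceId: string, day: string) {
  const result = await ch.query({
    query: `SELECT * FROM ${pipe} WHERE workspace_id = {ws:String} AND day = {day:Date}`,
    query_params: { ws: workspaceId, day },
    format: 'JSONEachRow',
  });
  return (await result.json()) as Array<Record<string, number>>;
}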
What logs do I capture before paging?
Before paging the data-platform on-call: (a) last 30 minutes of system.query_log on admaxxer-ch, (b) Coolify deploy log for the last redeploy if relevant, (c) the parity-check output for the affected workspace + window, (d) screenshots of the dashboard tile rendering incorrectly. Attach to the PagerDuty incident on creation.
Should I flip Wave 3 flags (source_medium_breakdown / p_summary_series) before summary_kpis clears burn-in?
No. Wave 3 pipes consume the same upstream aggregates as summary_kpis — if summary_kpis is drifting, the Wave 3 pipes will drift in correlated ways. Wait for the workspace’s summary_kpis (CH) toggle to clear its 14-day burn-in cleanly, then run both Wave 3 parity scripts against the workspace, then flip Wave 3 flags. There’s no benefit to flipping Wave 3 early — the user-facing surfaces (Sources & Attribution table, summary-tile sparklines) load fast on Tinybird already; the win is correctness, not speed.
Why does p_summary_series have a visual-parity step on top of ±1% drift?
Sparklines are a glance-metric. A 14-day series of (10, 11, 10, 12, 11, …) on CH and (11, 10, 11, 11, 12, …) on Tinybird passes a naive per-day ±1% gate but renders visibly different sparklines — same averages, different shape. Because the sparkline shape is what operators use to decide whether to dig into a number, we layer two additional checks inside the burn-in for this pipe: (1) every adjacent-day pair must agree on up/flat/down direction across both warehouses (with a 1% tolerance band so floating-point noise doesn’t flip a flat day), (2) the day-of-min and day-of-max across the 14-day window must be the same day (or one of two adjacent days). Both checks live inside scripts/clickhouse-verify-summary-series-parity.ts and gate cutover-eligibility.
Are Wave 3 flags independent of each other?
Yes. source_medium_breakdown (CH) and p_summary_series (CH) are independent Postgres column flips per workspace, just like summary_kpis (CH) and attribution_breakdown (CH). Flipping one OFF does not affect the other. The three-layer rollback (per-workspace / pipe-wide kill-switch / container-level CLICKHOUSE_URL unset) applies independently per toggle.
Related references
Customer-facing migration explainer · Tinybird auth model · How data works (end-to-end pipeline) · Performance architecture (4-cause taxonomy) · Live changelog — per-pipe cutover state · Documentation home
If you’re an admin and the runbook doesn’t cover your situation, file a ticket against the data-platform queue and tag the on-call engineer for the next runbook iteration.