How Data Works in Admaxxer — End-to-end DTC Data Flow

A walkthrough of the full Admaxxer data stack — how a Meta ad click on Tuesday becomes a cohort-LTV chart by Friday, with every transformation step in between.

TL;DR: Admaxxer ingests five source classes (first-party pixel, revenue webhooks, ad-platform APIs, Klaviyo, Google Search Console) through three ingest layers (browser POSTs, Express webhook handlers, BullMQ poll workers). Hot, append-only event data lives in Tinybird (managed ClickHouse, ~50ms p95 reads). Workspace, user, and billing state lives in Neon Postgres. Every dashboard card, API response, and Claude agent answer reads through a curated PIPE_ALLOWLIST — nothing queries Tinybird without going through a vetted pipe. The two Claude agents share the same internal /api/v1/* surface as the UI: anything a human can see, the agent can see; anything destructive (pause, scale, launch) requires explicit confirmed: true from the user.

The data flow at a glance

Five source classes, three ingest layers, two storage tiers, one query layer, three presentation surfaces:

SOURCES                       INGEST                          STORAGE           QUERY          PRESENTATION
-------                       ------                          -------           -----          ------------
1. First-party pixel    -->   POST /api/event                 Tinybird          PIPE_ALLOWLIST  /dashboard/*
   (visit, goal, paymt)       (browser, server SDK)           pixel_events      (33+ pipes)     /api/v1/*
                                                                                                Claude agents
2. Custom Pixel +       -->   Shopify checkout webhook   -->  Tinybird
   Shopify orders             POST /api/event (server)        visitor_payments

3. Revenue webhooks     -->   Express webhook handlers   -->  Tinybird
   (Stripe, Paddle,           server/webhookHandlers.ts       visitor_payments
    LS, Polar, Dodo,
    Woo, Shopify)

4. Ad-platform APIs     -->   BullMQ poll workers        -->  Tinybird
   (Meta, Google,             server/queues/                  ad_insights_*
    TikTok)                   ad_sync_logs (15min cache)

5. Klaviyo + GSC        -->   BullMQ poll + OAuth flow   -->  Tinybird
                                                              klaviyo_events
                                                              gsc_*

Workspace + billing     -->   Drizzle ORM                -->  Neon Postgres   (direct ORM      same surfaces
state                                                         (65 tables)     reads)

Two storage tiers, two access patterns. Tinybird answers analytical questions ("revenue last 30 days by ad set"); Neon Postgres answers transactional questions ("does this user have an active subscription?"). The split is deliberate — running cohort joins against an OLTP database at DTC scale doesn't perform; running session lookups against a columnar store doesn't make sense either.

The three ingest layers in detail

1. The first-party pixel — POST /api/event

The Admaxxer pixel (https://admaxxer.com/js/script.js or the script.plus.js variant) is a ~6KB asynchronous script. On every pageview it fires a single POST /api/event with the page URL, referrer, sticky visitor_id cookie, sticky first-touch UTMs, viewport size, and language. track('goal_name', metadata) fires a second event for any merchant-defined goal. The Custom Pixel build for Shopify catches checkout_completed in the Shopify Web Pixel sandbox and fires a payment event with the order subtotal, currency, and Shopify GID.

The receiving Express handler validates the payload, hashes incoming IP for geo-IP only (never stored raw), strips PII patterns (email, phone, credit-card-shape numbers) from goal metadata, and writes an append-only row into pixel_events (for visits/goals) or visitor_payments (for revenue). RMT keying on (workspace_id, website_id, provider, external_payment_id) guarantees idempotency — a Shopify order that arrives via Custom Pixel and via the orders/create webhook lands once, not twice.

2. Revenue webhooks — provider-signed payment events

Seven revenue providers ship signed webhooks to Admaxxer (Stripe, Paddle, Lemon Squeezy, Polar, Dodo, WooCommerce, Shopify). Each provider has its own handler in server/webhookHandlers.ts — signature verification first (HMAC-SHA256 with the provider's secret), then payload normalization to the canonical visitor_payments shape: amount_cents, currency, provider, external_payment_id, customer_email_hash, occurred_at. The same RMT key applies; reorder-tolerant by design.

Webhooks beat the pixel for trust (signed by the provider, not the browser) and beat the daily polling fallback for latency (sub-second vs. T+24h). When all three sources are configured, the deduplication priority is webhook > pixel > poll. See /documentation/data/revenue-data-flow for the full priority table and the dedup keys per provider.

3. Ad-platform APIs — BullMQ-orchestrated polling

Meta Marketing API, Google Ads API, and TikTok Marketing API are pulled, not pushed. server/queues/ defines seven BullMQ workers backed by Upstash Redis: meta-insights, google-insights, tiktok-insights, token-rotation, klaviyo-sync, gsc-sync, session-cleanup. Each worker dequeues a per-account job, hits the platform API, transforms the response into the canonical ad_insights_* shape (one row per (date, account_id, campaign_id, adset_id, ad_id)), and writes to Tinybird.

Polling cadence is conservative: Meta insights pull every 30 min (well under the documented 200 calls/hour/user-token rate limit), Google Ads insights pull every 60 min (well under the 15k ops/day default). Token rotation runs nightly — Meta long-lived tokens are refreshed 7 days before expiry (GL#258), Google Ads refresh tokens are validated on every dequeued job. Ad-platform API responses are also mirrored into ad_sync_logs in Postgres with a 15-minute TTL for the connection-health UI on /dashboard.

First-party vs third-party data — latency and trust differ

Admaxxer treats the two source classes differently because they have different reliability characteristics:

First-party data

Third-party data

This is why /dashboard/analytics renders pixel revenue and platform-reported revenue side-by-side — you'll see a 5–15% gap most days. The pixel is exact-but-blocked-by-iOS-tracking-prevention; the platform is modeled-but-deduplicated-against-CAPI. Neither is "wrong"; the gap is the iOS attribution artifact.

Cohort and user identity

The most important question in DTC analytics is "is this the same person who clicked the Meta ad two weeks ago?" Admaxxer answers it through three identifiers, in order of trust:

visitor_id — the anonymous primary key

A 24-character random ID generated client-side on the first pageview, stored in a first-party cookie on the merchant's own domain (never admaxxer.com). Default rotation is 24 hours on the client to be GDPR-friendly; merchants on the Pro plan can switch to a 365-day cookie. This survives ITP, but does not survive cookie clearing or device switching.

external_user_id — the cross-device anchor

Set via the admx.identify(user_id) JS API after login, or via the server-side /api/event with an external_user_id in the payload. Once set, the visitor's anonymous events get back-filled with the ID and future events from any device with the same external_user_id (same login) join into the same cohort. This is how mid-funnel signup is stitched: pre-login pageviews (anonymous visitor_id only) are reconciled to post-login activity by a single identify() call at signup completion.

customer_email_hash — the webhook anchor

Revenue webhooks rarely carry the merchant's visitor_id — Stripe doesn't know it, Shopify only knows it if Custom Pixel fired first. To still attribute the payment, the webhook handler hashes the customer email (SHA-256, lowercased, trimmed) and joins it against any identify() event that carried the same email hash. Match rate is typically 70–85% for DTC brands; the unmatched 15–30% land as direct/unattributed revenue (see attribution discrepancies).

The cohort join itself is a Tinybird operation: visitor_payments JOIN visitor_events ON (workspace_id, visitor_id) — or (workspace_id, external_user_id) when the cookie has rotated — or (workspace_id, customer_email_hash) as the third-tier fallback. The JOIN runs at read-time; nothing in storage is denormalized.

Where each metric is computed

Every dashboard tile and Claude agent tool call resolves to one or more Tinybird pipes. The full list lives in the metric glossary; here's the high-signal subset:

MetricTinybird datasourcePipe (or pipe family)Definition
Revenuevisitor_paymentsp_revenue_summarySum of amount_cents, RMT-deduplicated
Visitspixel_eventsp_visits_summaryDistinct visitor_id by day
MER (Marketing Efficiency Ratio)bothsummary_kpisRevenue ÷ ad spend, blended across channels
Cohort LTVvisitor_paymentsp_cohort_ltvFirst-purchase grouped, 7/30/90/180-day windows
Forecastvisitor_paymentsp_forecastOLS + weekly seasonality
MMM (channel contribution)bothp_mmmOLS + geometric adstock
Incrementalitybothp_incrementalityTwo-proportion z-test on paid-vs-organic cohorts
CAPI Match Ratebothp_capi_match(pixel + CAPI) ÷ pixel events for purchase
Ad-level LTVbothp_ads_ltv_7d / _30d / _90dPer-ad first-purchase + N-day window
Conversion funnelpixel_events + visitor_paymentsp_funnel_steps2–5 step ordered cohort with drop-off
Goalspixel_eventsp_goals_summaryCustom + reserved __admx_* goal counts

Pipe names are illustrative — the canonical list (with parameters and SQL) lives in tinybird/pipes/ and the public glossary at /documentation/metric-glossary. The PIPE_ALLOWLIST is a hard server-side gate: a pipe not in the allowlist returns 403 even with a valid auth token.

Why Tinybird (vs ClickHouse / Snowflake / BigQuery)

Five datasources, 33+ pipes, and 8 read-only Claude tools all read through Tinybird. The choice over alternatives is deliberate:

The full Tinybird auth model (admin tokens, scoped JWT tokens, the x-tb-token header convention, what happens when a token leaks) is documented at /documentation/architecture/tinybird-auth.

The Claude AI agents — read vs write paths

Admaxxer ships two Claude agents, both backed by claude-sonnet-4-6 with prompt caching on the system block + tools array. They're built on the same internal /api/v1/* surface as the UI, so anything a human can see, the agent can see — and anything the agent does, an audit log can replay.

Read-only Analytics Chat (⌘J global drawer)

Eight read-only tools, all PIPE_ALLOWLIST-gated: get_revenue_summary, get_cohort_ltv, get_mer, get_mmm_contribution, get_forecast, get_incrementality, get_capi_match, query_metric. Every tool resolves to one Tinybird pipe; nothing writes anywhere. The drawer opens with +J from any page; conversation state lives in chatSessions (per-workspace, prompt-cache-aware).

Destructive-gated Ads Operator

Six tools: four read-only (list_campaigns, get_campaign_insights, get_account_insights, query_metrics) and two destructive-gated (update_campaign, pause_all_low_roas) that require explicit confirmed: true before execution. The agent never auto-confirms; the user must say "yes, pause" (or click the in-chat confirm button) before the Meta or Google API call fires. Every action is mirrored to ad_sync_logs for audit. create_campaign was removed in R1 to keep the destructive surface minimal (GL#295).

Privacy, compliance, and the PII strip

Admaxxer's privacy posture is built on three principles:

The full privacy policy is at /privacy; the security posture (SOC 2 Type 1 in progress, AES-256-GCM at rest, TLS 1.3 in transit) is at /security. CSP setup for the pixel is documented at /documentation/troubleshoot/csp.

Comparison — vs Triple Whale, vs Datafast

The same data architecture question, answered three ways:

DimensionTriple WhaleDatafastAdmaxxer
Source classes ~12 (Shopify, Meta, Google, TikTok, Klaviyo, Recharge, Skio, Postscript, Attentive, Northbeam, Hyros, Looker) ~5 (pixel, Stripe, Paddle, LS, Shopify) 5 (pixel, 7 revenue providers, 3 ad APIs, Klaviyo, GSC)
Ingest model Mostly batch ETL (15–60min) Pixel real-time + webhook real-time Pixel real-time + webhook real-time + ad APIs polled
Storage Snowflake (batch-warehouse) ClickHouse (self-hosted) Tinybird (managed ClickHouse)
Identity scheme Shopify customer + Klaviyo profile + Meta Match visitor_id + external_user_id visitor_id + external_user_id + email_hash
ML enrichment Willy AI (proprietary), MMM (Robyn-style) None Forecast (OLS), MMM (OLS + adstock), Incrementality (z-test), Claude agents (BYOK)
Query layer Internal API; no public SQL Internal API; pixel SDK PIPE_ALLOWLIST + /api/v1/* + Claude tool calls
Privacy posture Third-party cookies via Shopify; no DNT respect First-party only; respects DNT First-party only; respects DNT; PII strip server-side
Cookieless No (Shopify customer ID required) Yes (visitor_id is anonymous) Yes (visitor_id is anonymous; 24h default rotation)
BYOK AI No (Willy is locked) No (no AI agent) Yes (Anthropic, OpenAI, Google, etc.)

See also

Data ontology — the 10 entities · Revenue data flow · Multi-currency display · Tinybird auth model · BYOK Anthropic · Metric glossary · Conversion funnels · Methodology (Q1 2026)