How Data Works in Admaxxer — End-to-end DTC Data Flow

A walkthrough of the full Admaxxer data stack — how a Meta ad click on Tuesday becomes a cohort-LTV chart by Friday, with every transformation step in between.

TL;DR: Admaxxer ingests five source classes (first-party pixel, revenue webhooks, ad-platform APIs, Klaviyo, Google Search Console) through three ingest layers (browser POSTs, Express webhook handlers, managed job-queue poll workers). Hot, append-only event data lives in our first-party analytics warehouse (columnar, ~50ms p95 reads). Workspace, user, and billing state lives in our primary database. Every dashboard card, API response, and Claude agent answer reads through a curated allowlist — nothing queries the warehouse without going through a vetted query. The two Claude agents share the same internal /api/v1/* surface as the UI: anything a human can see, the agent can see; anything destructive (pause, scale, launch) requires explicit confirmed: true from the user.

The data flow at a glance

Five source classes, three ingest layers, two storage tiers, one query layer, three presentation surfaces:

SOURCES                       INGEST                          STORAGE              QUERY          PRESENTATION
-------                       ------                          -------              -----          ------------
1. First-party pixel    -->   POST /api/event                 analytics warehouse  allowlisted    /dashboard/*
   (visit, goal, paymt)       (browser, server SDK)           on-site events       queries        /api/v1/*
                                                                                    (33+ queries)  Claude agents
2. Custom Pixel +       -->   Shopify checkout webhook   -->  analytics warehouse
   Shopify orders             POST /api/event (server)        revenue stream

3. Revenue webhooks     -->   Express webhook handlers   -->  analytics warehouse
   (Stripe, Paddle,           server/webhookHandlers.ts       revenue stream
    LS, Polar, Dodo,
    Woo, Shopify)

4. Ad-platform APIs     -->   managed job-queue workers  -->  analytics warehouse
   (Meta, Google,             server/queues/                  ad-insights stream
    TikTok)                   your sync history (15min cache)

5. Klaviyo + GSC        -->   job-queue poll + OAuth flow -->  analytics warehouse
                                                              email + search streams

Workspace + billing     -->   our typed query layer      -->  primary database     (direct ORM    same surfaces
state                                                         (workspace tables)   reads)

Two storage tiers, two access patterns. The analytics warehouse answers analytical questions ("revenue last 30 days by ad set"); the primary database answers transactional questions ("does this user have an active subscription?"). The split is deliberate — running cohort joins against an OLTP database at DTC scale doesn't perform; running session lookups against a columnar store doesn't make sense either.

The three ingest layers in detail

1. The first-party pixel — POST /api/event

The Admaxxer pixel (https://admaxxer.com/js/script.js or the script.plus.js variant) is a ~6KB asynchronous script. On every pageview it fires a single POST /api/event with the page URL, referrer, sticky visitor_id cookie, sticky first-touch UTMs, viewport size, and language. track('goal_name', metadata) fires a second event for any merchant-defined goal. The Custom Pixel build for Shopify catches checkout_completed in the Shopify Web Pixel sandbox and fires a payment event with the order subtotal, currency, and Shopify GID.

The receiving Express handler validates the payload, hashes incoming IP for geo-IP only (never stored raw), strips PII patterns (email, phone, credit-card-shape numbers) from goal metadata, and writes an append-only row to the on-site event stream (for visits/goals) or the revenue stream (for revenue). Deduplication keying on (workspace_id, website_id, provider, external_payment_id) guarantees idempotency — a Shopify order that arrives via Custom Pixel and via the orders/create webhook lands once, not twice.

2. Revenue webhooks — provider-signed payment events

Seven revenue providers ship signed webhooks to Admaxxer (Stripe, Paddle, Lemon Squeezy, Polar, Dodo, WooCommerce, Shopify). Each provider has its own handler in server/webhookHandlers.ts — signature verification first (HMAC-SHA256 with the provider's secret), then payload normalization to the canonical revenue-stream shape: amount_cents, currency, provider, external_payment_id, customer_email_hash, occurred_at. The same deduplication key applies; reorder-tolerant by design.

Webhooks beat the pixel for trust (signed by the provider, not the browser) and beat the daily polling fallback for latency (sub-second vs. T+24h). When all three sources are configured, the deduplication priority is webhook > pixel > poll. See /documentation/data/revenue-data-flow for the full priority table and the dedup keys per provider.

3. Ad-platform APIs — our job queue-orchestrated polling

Meta Marketing API, Google Ads API, and TikTok Marketing API are pulled, not pushed. server/queues/ defines seven background workers backed by managed job queue: meta-insights, google-insights, tiktok-insights, token-rotation, klaviyo-sync, gsc-sync, session-cleanup. Each worker dequeues a per-account job, hits the platform API, transforms the response into the canonical ad-insights shape (one row per (date, account_id, campaign_id, adset_id, ad_id)), and writes to our analytics warehouse.

Polling cadence is conservative: Meta insights pull every 30 min (well under the documented 200 calls/hour/user-token rate limit), Google Ads insights pull every 60 min (well under the 15k ops/day default). Token rotation runs nightly — Meta long-lived tokens are refreshed 7 days before expiry (GL#258), Google Ads refresh tokens are validated on every dequeued job. Ad-platform API responses are also mirrored into your sync history in our primary database with a 15-minute TTL for the connection-health UI on /dashboard.

First-party vs third-party data — latency and trust differ

Admaxxer treats the two source classes differently because they have different reliability characteristics:

First-party data

Third-party data

This is why /dashboard/analytics renders pixel revenue and platform-reported revenue side-by-side — you'll see a 5–15% gap most days. The pixel is exact-but-blocked-by-iOS-tracking-prevention; the platform is modeled-but-deduplicated-against-CAPI. Neither is "wrong"; the gap is the iOS attribution artifact.

Cohort and user identity

The most important question in DTC analytics is "is this the same person who clicked the Meta ad two weeks ago?" Admaxxer answers it through three identifiers, in order of trust:

visitor_id — the anonymous primary key

A 24-character random ID generated client-side on the first pageview, stored in a first-party cookie on the merchant's own domain (never admaxxer.com). Default rotation is 24 hours on the client to be GDPR-friendly; merchants on the Pro plan can switch to a 365-day cookie. This survives ITP, but does not survive cookie clearing or device switching.

external_user_id — the cross-device anchor

Set via the admx.identify(user_id) JS API after login, or via the server-side /api/event with an external_user_id in the payload. Once set, the visitor's anonymous events get back-filled with the ID and future events from any device with the same external_user_id (same login) join into the same cohort. This is how mid-funnel signup is stitched: pre-login pageviews (anonymous visitor_id only) are reconciled to post-login activity by a single identify() call at signup completion.

customer_email_hash — the webhook anchor

Revenue webhooks rarely carry the merchant's visitor_id — Stripe doesn't know it, Shopify only knows it if Custom Pixel fired first. To still attribute the payment, the webhook handler hashes the customer email (SHA-256, lowercased, trimmed) and joins it against any identify() event that carried the same email hash. Match rate is typically 70–85% for DTC brands; the unmatched 15–30% land as direct/unattributed revenue (see attribution discrepancies).

The cohort join itself is an analytics-warehouse operation: the revenue stream JOIN the on-site event stream ON (workspace_id, visitor_id) — or (workspace_id, external_user_id) when the cookie has rotated — or (workspace_id, customer_email_hash) as the third-tier fallback. The JOIN runs at read-time; nothing in storage is denormalized.

Where each metric is computed

Every dashboard tile and Claude agent tool call resolves to one or more analytics pipelines. The full list lives in the metric glossary; here's the high-signal subset:

MetricSource streamAnalytics queryDefinition
RevenueRevenue streamthe revenue summary reportSum of amount_cents, deduplicated
VisitsOn-site event streamthe visits summary reportDistinct visitor_id by day
MER (Marketing Efficiency Ratio)boththe revenue rollupRevenue ÷ ad spend, blended across channels
Cohort LTVRevenue streamthe cohort-LTV reportFirst-purchase grouped, 7/30/90/180-day windows
ForecastRevenue streamthe forecast reportOLS + weekly seasonality
MMM (channel contribution)boththe marketing-mix reportOLS + geometric adstock
Incrementalityboththe incrementality reportTwo-proportion z-test on paid-vs-organic cohorts
CAPI Match Rateboththe match-quality report(pixel + CAPI) ÷ pixel events for purchase
Ad-level LTVboththe 7-day ad-LTV report / _30d / _90dPer-ad first-purchase + N-day window
Conversion funnelOn-site event stream + revenue streamthe funnel-steps report2–5 step ordered cohort with drop-off
GoalsOn-site event streamthe goals summary viewCustom + reserved __admx_* goal counts

Report names are illustrative — the canonical list lives in the public glossary at /documentation/metric-glossary. A server-side allowlist is a hard gate: any report not on the allowlist returns 403 even with a valid auth token.

Why a columnar analytics warehouse (vs Snowflake / BigQuery / OLTP-only)

Five source streams, 33+ first-party analytics queries, and 8 read-only Claude tools all read through our columnar analytics warehouse. The choice over alternatives is deliberate:

The full analytics-warehouse auth model (scoped read tokens, the header convention, what happens when a token leaks) is documented at /documentation/architecture/analytics-auth.

The Claude AI agents — read vs write paths

Admaxxer ships two Claude agents, both backed by claude-sonnet-4-6 with prompt caching on the system block + tools array. They're built on the same internal /api/v1/* surface as the UI, so anything a human can see, the agent can see — and anything the agent does, an audit log can replay.

Read-only Analytics Chat (⌘J global drawer)

Eight read-only tools, all PIPE_ALLOWLIST-gated: get_revenue_summary, get_cohort_ltv, get_mer, get_mmm_contribution, get_forecast, get_incrementality, get_capi_match, query_metric. Every tool resolves to one analytics pipeline; nothing writes anywhere. The drawer opens with +J from any page; conversation state lives in chatSessions (per-workspace, prompt-cache-aware).

Destructive-gated Ads Operator

Six tools: four read-only (list_campaigns, get_campaign_insights, get_account_insights, query_metrics) and two destructive-gated (update_campaign, pause_all_low_roas) that require explicit confirmed: true before execution. The agent never auto-confirms; the user must say "yes, pause" (or click the in-chat confirm button) before the Meta or Google API call fires. Every action is mirrored to your sync history for audit. create_campaign was removed in R1 to keep the destructive surface minimal (GL#295).

Privacy, compliance, and the PII strip

Admaxxer's privacy posture is built on three principles:

The full privacy policy is at /privacy; the security posture (SOC 2 Type 1 in progress, AES-256-GCM at rest, TLS 1.3 in transit) is at /security. CSP setup for the pixel is documented at /documentation/troubleshoot/csp.

Comparison — vs Triple Whale, vs Datafast

The same data architecture question, answered three ways:

DimensionTriple WhaleDatafastAdmaxxer
Source classes ~12 (Shopify, Meta, Google, TikTok, Klaviyo, Recharge, Skio, Postscript, Attentive, Northbeam, Hyros, Looker) ~5 (pixel, Stripe, Paddle, LS, Shopify) 5 (pixel, 7 revenue providers, 3 ad APIs, Klaviyo, GSC)
Ingest model Mostly batch ETL (15–60min) Pixel real-time + webhook real-time Pixel real-time + webhook real-time + ad APIs polled
Storage Snowflake (batch-warehouse) Columnar analytics store First-party columnar analytics warehouse
Identity scheme Shopify customer + Klaviyo profile + Meta Match visitor_id + external_user_id visitor_id + external_user_id + email_hash
ML enrichment Willy AI (proprietary), MMM (Robyn-style) None Forecast (OLS), MMM (OLS + adstock), Incrementality (z-test), Claude agents (BYOK)
Query layer Internal API; no public SQL Internal API; pixel SDK PIPE_ALLOWLIST + /api/v1/* + Claude tool calls
Privacy posture Third-party cookies via Shopify; no DNT respect First-party only; respects DNT First-party only; respects DNT; PII strip server-side
Cookieless No (Shopify customer ID required) Yes (visitor_id is anonymous) Yes (visitor_id is anonymous; 24h default rotation)
BYOK AI No (Willy is locked) No (no AI agent) Yes (Anthropic, OpenAI, Google, etc.)

See also

Data ontology — the 10 entities · Revenue data flow · Multi-currency display · our analytics warehouse auth model · BYOK Anthropic · Metric glossary · Conversion funnels · Methodology (Q1 2026)