How Data Works in Admaxxer — End-to-end DTC Data Flow

A walkthrough of the full Admaxxer data stack — how a Meta ad click on Tuesday becomes a cohort-LTV chart by Friday, with every transformation step in between.

TL;DR: Admaxxer ingests five source classes (first-party pixel, revenue webhooks, ad-platform APIs, Klaviyo, Google Search Console) through three ingest layers (browser POSTs, Express webhook handlers, managed job-queue poll workers). Hot, append-only event data lives in our first-party analytics warehouse (columnar, ~50ms p95 reads). Workspace, user, and billing state lives in our primary database. Every dashboard card, API response, and Claude agent answer reads through a curated allowlist — nothing queries the warehouse without going through a vetted query. The two Claude agents share the same internal /api/v1/* surface as the UI: anything a human can see, the agent can see; anything destructive (pause, scale, launch) requires explicit confirmed: true from the user.

The data flow at a glance

Five source classes, three ingest layers, two storage tiers, one query layer, three presentation surfaces:

SOURCES                       INGEST                          STORAGE              QUERY          PRESENTATION
-------                       ------                          -------              -----          ------------
1. First-party pixel    -->   POST /api/event                 analytics warehouse  allowlisted    /dashboard/*
   (visit, goal, paymt)       (browser, server SDK)           on-site events       queries        /api/v1/*
                                                                                    (33+ queries)  Claude agents
2. Custom Pixel +       -->   Shopify checkout webhook   -->  analytics warehouse
   Shopify orders             POST /api/event (server)        revenue stream

3. Revenue webhooks     -->   Express webhook handlers   -->  analytics warehouse
   (Stripe, Paddle,           server/webhookHandlers.ts       revenue stream
    LS, Polar, Dodo,
    Woo, Shopify)

4. Ad-platform APIs     -->   managed job-queue workers  -->  analytics warehouse
   (Meta, Google,             server/queues/                  ad-insights stream
    TikTok)                   your sync history (15min cache)

5. Klaviyo + GSC        -->   job-queue poll + OAuth flow -->  analytics warehouse
                                                              email + search streams

Workspace + billing     -->   our typed query layer      -->  primary database     (direct ORM    same surfaces
state                                                         (workspace tables)   reads)

Two storage tiers, two access patterns. The analytics warehouse answers analytical questions ("revenue last 30 days by ad set"); the primary database answers transactional questions ("does this user have an active subscription?"). The split is deliberate — running cohort joins against an OLTP database at DTC scale doesn't perform; running session lookups against a columnar store doesn't make sense either.

The three ingest layers in detail

1. The first-party pixel — `POST /api/event`

The Admaxxer pixel (https://admaxxer.com/js/script.js or the script.plus.js variant) is a ~6KB asynchronous script. On every pageview it fires a single POST /api/event with the page URL, referrer, sticky visitor_id cookie, sticky first-touch UTMs, viewport size, and language. track('goal_name', metadata) fires a second event for any merchant-defined goal. The Custom Pixel build for Shopify catches checkout_completed in the Shopify Web Pixel sandbox and fires a payment event with the order subtotal, currency, and Shopify GID.

The receiving Express handler validates the payload, hashes incoming IP for geo-IP only (never stored raw), strips PII patterns (email, phone, credit-card-shape numbers) from goal metadata, and writes an append-only row to the on-site event stream (for visits/goals) or the revenue stream (for revenue). Deduplication keying on (workspace_id, website_id, provider, external_payment_id) guarantees idempotency — a Shopify order that arrives via Custom Pixel and via the orders/create webhook lands once, not twice.

2. Revenue webhooks — provider-signed payment events

Seven revenue providers ship signed webhooks to Admaxxer (Stripe, Paddle, Lemon Squeezy, Polar, Dodo, WooCommerce, Shopify). Each provider has its own handler in server/webhookHandlers.ts — signature verification first (HMAC-SHA256 with the provider's secret), then payload normalization to the canonical revenue-stream shape: amount_cents, currency, provider, external_payment_id, customer_email_hash, occurred_at. The same deduplication key applies; reorder-tolerant by design.

Webhooks beat the pixel for trust (signed by the provider, not the browser) and beat the daily polling fallback for latency (sub-second vs. T+24h). When all three sources are configured, the deduplication priority is webhook > pixel > poll. See /documentation/data/revenue-data-flow for the full priority table and the dedup keys per provider.

3. Ad-platform APIs — our job queue-orchestrated polling

Meta Marketing API, Google Ads API, and TikTok Marketing API are pulled, not pushed. server/queues/ defines seven background workers backed by managed job queue: meta-insights, google-insights, tiktok-insights, token-rotation, klaviyo-sync, gsc-sync, session-cleanup. Each worker dequeues a per-account job, hits the platform API, transforms the response into the canonical ad-insights shape (one row per (date, account_id, campaign_id, adset_id, ad_id)), and writes to our analytics warehouse.

Polling cadence is conservative: Meta insights pull every 30 min (well under the documented 200 calls/hour/user-token rate limit), Google Ads insights pull every 60 min (well under the 15k ops/day default). Token rotation runs nightly — Meta long-lived tokens are refreshed 7 days before expiry (GL#258), Google Ads refresh tokens are validated on every dequeued job. Ad-platform API responses are also mirrored into your sync history in our primary database with a 15-minute TTL for the connection-health UI on /dashboard.

First-party vs third-party data — latency and trust differ

Admaxxer treats the two source classes differently because they have different reliability characteristics:

First-party data

Sources: Pixel events (visits, goals, payments via Custom Pixel), revenue webhooks (Stripe, Shopify, Paddle, etc.).
Latency: Real-time. Pixel events land in our analytics warehouse within ~500ms; webhook events within ~1s after the provider signs and dispatches.
Trust: Exact. The merchant owns the source; no sampling, no modeling, no third-party rate limit.
Volume: Highest. A typical DTC brand fires ~100k pixel events / month per $10k of monthly revenue.

Third-party data

Sources: Meta Marketing API, Google Ads API, TikTok Marketing API, Klaviyo, Google Search Console.
Latency: 15–60 minutes lag. Polling cadence is the floor; rate limits push it higher under heavy concurrent workspace load.
Trust: Modeled or sampled. Meta CAPI deduplicates against pixel server-side. Google Search Console is sampled; aggregated metrics are accurate but per-query / per-page rows can drop sub-threshold values.
Volume: Lower. Aggregated by date+entity from the source; one row per (day, ad_id) is typical.

This is why /dashboard/analytics renders pixel revenue and platform-reported revenue side-by-side — you'll see a 5–15% gap most days. The pixel is exact-but-blocked-by-iOS-tracking-prevention; the platform is modeled-but-deduplicated-against-CAPI. Neither is "wrong"; the gap is the iOS attribution artifact.

Cohort and user identity

The most important question in DTC analytics is "is this the same person who clicked the Meta ad two weeks ago?" Admaxxer answers it through three identifiers, in order of trust:

visitor_id — the anonymous primary key

A 24-character random ID generated client-side on the first pageview, stored in a first-party cookie on the merchant's own domain (never admaxxer.com). Default rotation is 24 hours on the client to be GDPR-friendly; merchants on the Pro plan can switch to a 365-day cookie. This survives ITP, but does not survive cookie clearing or device switching.

external_user_id — the cross-device anchor

Set via the admx.identify(user_id) JS API after login, or via the server-side /api/event with an external_user_id in the payload. Once set, the visitor's anonymous events get back-filled with the ID and future events from any device with the same external_user_id (same login) join into the same cohort. This is how mid-funnel signup is stitched: pre-login pageviews (anonymous visitor_id only) are reconciled to post-login activity by a single identify() call at signup completion.

customer_email_hash — the webhook anchor

Revenue webhooks rarely carry the merchant's visitor_id — Stripe doesn't know it, Shopify only knows it if Custom Pixel fired first. To still attribute the payment, the webhook handler hashes the customer email (SHA-256, lowercased, trimmed) and joins it against any identify() event that carried the same email hash. Match rate is typically 70–85% for DTC brands; the unmatched 15–30% land as direct/unattributed revenue (see attribution discrepancies).

The cohort join itself is an analytics-warehouse operation: the revenue stream JOIN the on-site event stream ON (workspace_id, visitor_id) — or (workspace_id, external_user_id) when the cookie has rotated — or (workspace_id, customer_email_hash) as the third-tier fallback. The JOIN runs at read-time; nothing in storage is denormalized.

Where each metric is computed

Every dashboard tile and Claude agent tool call resolves to one or more analytics pipelines. The full list lives in the metric glossary; here's the high-signal subset:

Metric	Source stream	Analytics query	Definition
Revenue	Revenue stream	`the revenue summary report`	Sum of `amount_cents`, deduplicated
Visits	On-site event stream	`the visits summary report`	Distinct `visitor_id` by day
MER (Marketing Efficiency Ratio)	both	`the revenue rollup`	Revenue ÷ ad spend, blended across channels
Cohort LTV	Revenue stream	`the cohort-LTV report`	First-purchase grouped, 7/30/90/180-day windows
Forecast	Revenue stream	`the forecast report`	OLS + weekly seasonality
MMM (channel contribution)	both	`the marketing-mix report`	OLS + geometric adstock
Incrementality	both	`the incrementality report`	Two-proportion z-test on paid-vs-organic cohorts
CAPI Match Rate	both	`the match-quality report`	(pixel + CAPI) ÷ pixel events for purchase
Ad-level LTV	both	`the 7-day ad-LTV report` / `_30d` / `_90d`	Per-ad first-purchase + N-day window
Conversion funnel	On-site event stream + revenue stream	`the funnel-steps report`	2–5 step ordered cohort with drop-off
Goals	On-site event stream	`the goals summary view`	Custom + reserved `__admx_*` goal counts

Report names are illustrative — the canonical list lives in the public glossary at /documentation/metric-glossary. A server-side allowlist is a hard gate: any report not on the allowlist returns 403 even with a valid auth token.

Why a columnar analytics warehouse (vs Snowflake / BigQuery / OLTP-only)

Five source streams, 33+ first-party analytics queries, and 8 read-only Claude tools all read through our columnar analytics warehouse. The choice over alternatives is deliberate:

Realtime, low-latency reads: a columnar engine with incrementally-rebuilt materialized rollups serves the ~50ms p95 reads a dashboard tile needs, behind a simple HTTP query layer with an auth model scoped to a single query + filter — no heavyweight driver in the merchant's request path.
vs Snowflake: Snowflake's pricing is per-second compute — great for batch, expensive for the ~50ms p95 reads a dashboard tile needs. Snowflake also doesn't ship realtime materialized views; you'd build them in dbt + Airflow.
vs BigQuery: Similar story. BigQuery's strength is petabyte-scale queries; DTC analytics doesn't need that. BQ's per-query latency floor is ~2–5s; our warehouse's is ~50ms.
vs the primary database alone: the primary database can answer a 90-day cohort LTV query — for a single workspace, on a small data volume. At DTC scale (10M+ events / workspace), the columnar engine is the difference between a 200ms response and a 30s scan.

The full analytics-warehouse auth model (scoped read tokens, the header convention, what happens when a token leaks) is documented at /documentation/architecture/analytics-auth.

The Claude AI agents — read vs write paths

Admaxxer ships two Claude agents, both backed by claude-sonnet-4-6 with prompt caching on the system block + tools array. They're built on the same internal /api/v1/* surface as the UI, so anything a human can see, the agent can see — and anything the agent does, an audit log can replay.

Read-only Analytics Chat (⌘J global drawer)

Eight read-only tools, all PIPE_ALLOWLIST-gated: get_revenue_summary, get_cohort_ltv, get_mer, get_mmm_contribution, get_forecast, get_incrementality, get_capi_match, query_metric. Every tool resolves to one analytics pipeline; nothing writes anywhere. The drawer opens with ⌘+J from any page; conversation state lives in chatSessions (per-workspace, prompt-cache-aware).

Destructive-gated Ads Operator

Six tools: four read-only (list_campaigns, get_campaign_insights, get_account_insights, query_metrics) and two destructive-gated (update_campaign, pause_all_low_roas) that require explicit confirmed: true before execution. The agent never auto-confirms; the user must say "yes, pause" (or click the in-chat confirm button) before the Meta or Google API call fires. Every action is mirrored to your sync history for audit. create_campaign was removed in R1 to keep the destructive surface minimal (GL#295).

Privacy, compliance, and the PII strip

Admaxxer's privacy posture is built on three principles:

First-party only. No third-party cookies. No cross-domain tracking. No Facebook Pixel embedded in our pixel. Every visitor cookie is set on the merchant's own domain, scoped to that domain.
Consent gate. The pixel respects data-admx-respect-dnt on the script tag. When set, the pixel reads navigator.doNotTrack and the IAB TCF cookie; if the visitor hasn't consented, no event fires — not even an anonymous one.
Server-side PII strip. Goal metadata is regex-scanned at ingest for email-shape, phone-shape, and credit-card-shape strings. Matched values are replaced with [REDACTED] before the row hits our analytics warehouse. The strip is server-side because client-side strips can be bypassed.

The full privacy policy is at /privacy; the security posture (SOC 2 Type 1 in progress, AES-256-GCM at rest, TLS 1.3 in transit) is at /security. CSP setup for the pixel is documented at /documentation/troubleshoot/csp.

Comparison — vs Triple Whale, vs Datafast

The same data architecture question, answered three ways:

Dimension	Triple Whale	Datafast	Admaxxer
Source classes	~12 (Shopify, Meta, Google, TikTok, Klaviyo, Recharge, Skio, Postscript, Attentive, Northbeam, Hyros, Looker)	~5 (pixel, Stripe, Paddle, LS, Shopify)	5 (pixel, 7 revenue providers, 3 ad APIs, Klaviyo, GSC)
Ingest model	Mostly batch ETL (15–60min)	Pixel real-time + webhook real-time	Pixel real-time + webhook real-time + ad APIs polled
Storage	Snowflake (batch-warehouse)	Columnar analytics store	First-party columnar analytics warehouse
Identity scheme	Shopify customer + Klaviyo profile + Meta Match	visitor_id + external_user_id	visitor_id + external_user_id + email_hash
ML enrichment	Willy AI (proprietary), MMM (Robyn-style)	None	Forecast (OLS), MMM (OLS + adstock), Incrementality (z-test), Claude agents (BYOK)
Query layer	Internal API; no public SQL	Internal API; pixel SDK	PIPE_ALLOWLIST + `/api/v1/*` + Claude tool calls
Privacy posture	Third-party cookies via Shopify; no DNT respect	First-party only; respects DNT	First-party only; respects DNT; PII strip server-side
Cookieless	No (Shopify customer ID required)	Yes (visitor_id is anonymous)	Yes (visitor_id is anonymous; 24h default rotation)
BYOK AI	No (Willy is locked)	No (no AI agent)	Yes (Anthropic, OpenAI, Google, etc.)