
Incrementality testing: the only way to measure what your ads actually caused

MTA tells you what happened. Incrementality tells you what YOU caused. Pause Meta in California for two weeks, build a synthetic control from a donor pool of other states, and Admaxxer measures the causal lift with a permutation p-value. Free on every plan; Northbeam ships the same surface at enterprise pricing.


MTA vs incrementality — the operator dispute

Spend an hour scrolling r/marketing, r/PPC, or r/dtcecommerce and you'll see the same argument every week: "Does Meta REALLY drive 4x ROAS, or is most of that cannibalised brand search and would have happened anyway?" Both sides quote MTA numbers. Both sides lose. The honest answer can only come from incrementality — pause the channel, measure the gap.

MTA tells you what happened: which channels appeared in successful paths. It's correlation. Incrementality tells you what you caused: if you removed the channel, would the conversions still happen? It's causation. The two are related but distinct, and the gap is where most DTC operators burn money.

Northbeam ships incrementality at enterprise pricing. TripleWhale operators on Reddit consistently note that TW does NOT ship native geo-lift — workarounds are spreadsheets or TW Sonar's incrementality lite. Admaxxer ships geo-lift + time-holdout free on every plan.

How geo-lift works

You can't randomise an ad campaign across the population — no household-level RCTs in DTC. So we use the next-best thing: synthetic control (Abadie, Diamond, Hainmueller 2010). Pick a treatment region (e.g. California), pick a donor pool (3-8 other states), and fit a weighted average of donor states to match California's pre-period revenue trajectory.

That weighted average IS the synthetic California — what California's revenue WOULD have been without the intervention. During the treatment window, you measure: treatment actual − synthetic California = causal lift. The donor weights are fit ONLY on pre-period data — so the synthetic carries no information about the intervention itself.
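The arithmetic above can be sketched in a few lines of Python. This is an illustration only, not Admaxxer's implementation: the function names `synthetic_series` and `causal_lift`, the two-donor pool, and the daily revenue figures are all hypothetical. The last line reuses the two totals from the sample response in the curl example to show that `lift_pct` is the absolute lift divided by synthetic revenue.

```python
from typing import Dict, List


def synthetic_series(
    donor_revenue: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted average of donor revenue, one value per day."""
    n_days = len(next(iter(donor_revenue.values())))
    return [
        sum(weights[region] * donor_revenue[region][day] for region in weights)
        for day in range(n_days)
    ]


def causal_lift(treatment: List[float], synthetic: List[float]) -> Dict[str, float]:
    """Lift = treatment actual minus synthetic, summed over the test window."""
    actual_total = sum(treatment)
    synthetic_total = sum(synthetic)
    lift_absolute = actual_total - synthetic_total
    return {
        "lift_absolute": lift_absolute,
        "lift_pct": lift_absolute / synthetic_total,
    }


# Hypothetical weights (already fit on pre-period data) and test-window revenue.
weights = {"US-TX": 0.5, "US-NY": 0.5}
donors = {"US-TX": [100.0, 120.0, 110.0], "US-NY": [200.0, 180.0, 190.0]}
treatment = [165.0, 160.0, 170.0]

synthetic = synthetic_series(donors, weights)
result = causal_lift(treatment, synthetic)
print(synthetic, result)

# Totals from the sample response: 158420 actual vs 133840 synthetic.
print(round(causal_lift([158420.0], [133840.0])["lift_pct"], 3))
```

Because the weights are inputs here, nothing in this calculation looks at test-window treatment data when building the counterfactual, which is the property the paragraph above relies on.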

We compute a permutation p-value with placebo tests: treat each donor AS IF it had been the treatment region, fit a synthetic for it from the remaining donors, and compute the placebo lift. The rank of the observed lift's magnitude within the placebo distribution gives the two-sided p-value. If p < 0.05, the observed lift is unlikely under the null hypothesis of no effect, and you can claim its direction.
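A minimal sketch of the ranking step, assuming the placebo lifts have already been computed. The convention used here (count the observed lift as one member of the distribution, compare by absolute value) is one common choice and an assumption on our part; the runner's exact convention is not specified above. `permutation_p_value` and the placebo values are hypothetical.

```python
from typing import List


def permutation_p_value(observed_lift: float, placebo_lifts: List[float]) -> float:
    """Two-sided rank p-value.

    Share of lifts (placebos plus the observed one) whose magnitude is at
    least as large as the observed lift's magnitude.
    """
    at_least_as_extreme = sum(
        1 for lift in placebo_lifts if abs(lift) >= abs(observed_lift)
    )
    return (1 + at_least_as_extreme) / (1 + len(placebo_lifts))


# Hypothetical placebo lifts, one per donor treated as a fake treatment region.
placebo_lifts = [0.02, -0.05, 0.01, -0.03, 0.04, 0.20, -0.01, 0.03]
p = permutation_p_value(0.184, placebo_lifts)
print(p)  # one placebo (0.20) is at least as extreme, so p = 2/9
```

Under this convention the smallest attainable value is 1/(N+1) for N placebos, so the size of the placebo set bounds how small p can get.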

Implementation note: v1 ships an unconstrained OLS solve normalised to sum-to-1 and non-negative-clipped. v1.5 swaps in a small QP solver for the rigorous fit (sum-to-1 + non-negative simultaneously).
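The v1 approach can be sketched as follows. This is not the shipped code: it assumes the order is OLS first, then clip negatives to zero, then normalise to sum to 1 (the note does not spell out the order), and it solves the normal equations in pure Python to stay dependency-free. `solve_linear`, `fit_donor_weights`, and the revenue figures are hypothetical.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_donor_weights(donors: List[List[float]], treatment: List[float]) -> List[float]:
    """donors[k][t] is donor k's pre-period revenue on day t."""
    k = len(donors)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    gram = [[dot(donors[i], donors[j]) for j in range(k)] for i in range(k)]
    rhs = [dot(donors[i], treatment) for i in range(k)]
    raw = solve_linear(gram, rhs)            # unconstrained OLS
    clipped = [max(w, 0.0) for w in raw]     # non-negative clip
    total = sum(clipped)
    if total == 0:
        raise ValueError("all weights were clipped to zero")
    return [w / total for w in clipped]      # normalise to sum to 1


# Hypothetical pre-period revenue: treatment is exactly 0.75*A + 0.25*B.
donor_a = [100.0, 120.0, 110.0, 130.0]
donor_b = [200.0, 180.0, 190.0, 170.0]
pre_treatment = [125.0, 135.0, 130.0, 140.0]
fitted = fit_donor_weights([donor_a, donor_b], pre_treatment)
print(fitted)
```

Clipping after the solve is not the same as solving under the constraints: when OLS returns a negative weight, the clipped-and-rescaled answer is generally not the constrained optimum, which is the gap the v1.5 QP solver is meant to close.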

When to run a test

Run a test when: you suspect a channel's MTA-claimed ROAS is inflated (Meta is the canonical case); you're considering scaling spend by 2x+ on one channel; you have >=30 days of revenue history per region and >=14 days of test budget; you can pause the channel in a meaningful donor-poolable region; you want to defend a channel's ROAS to your CFO or board.

Skip the test when: you have fewer than 30 days of pre-period revenue; your store is geographically concentrated (no clean donor pool); the channel makes up <5% of your spend; you're in a peak season (BFCM); you can't realistically pause the channel for 14+ days operationally.
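The skip criteria above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `HoldoutPlan`, its field names, and `reasons_to_skip` are hypothetical and not part of the Admaxxer API; the thresholds (30 days of history, 14 days of test window, 5% of spend, at least 3 donor regions) are the ones quoted in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HoldoutPlan:
    pre_period_days: int         # days of revenue history per region
    test_days: int               # days the channel can stay paused
    channel_spend_share: float   # channel's share of total spend, 0..1
    donor_regions: int           # usable control regions
    overlaps_peak_season: bool   # e.g. the window includes BFCM


def reasons_to_skip(plan: HoldoutPlan) -> List[str]:
    """Return every reason the plan fails the checklist; empty means go."""
    reasons = []
    if plan.pre_period_days < 30:
        reasons.append("fewer than 30 days of pre-period revenue")
    if plan.test_days < 14:
        reasons.append("cannot pause the channel for 14+ days")
    if plan.channel_spend_share < 0.05:
        reasons.append("channel is under 5% of spend")
    if plan.donor_regions < 3:
        reasons.append("no clean donor pool (fewer than 3 regions)")
    if plan.overlaps_peak_season:
        reasons.append("test window overlaps a peak season")
    return reasons


ready = HoldoutPlan(60, 28, 0.35, 4, False)
not_ready = HoldoutPlan(21, 10, 0.03, 2, True)
print(reasons_to_skip(ready))
print(reasons_to_skip(not_ready))
```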

Incrementality across DTC tools

| Tool | Ships geo-lift | Notes |
| --- | --- | --- |
| Admaxxer | Yes | Synthetic-control geo-lift + time-holdout, permutation p-value, 95% bootstrap CI, treatment vs synthetic time-series in /marketing-acquisition. Free on every plan. |
| Northbeam | Yes | Ships incrementality testing as part of Probabilistic 1.0. Enterprise pricing (~$10k+/yr). |
| TripleWhale | No | TW operators on Reddit consistently note that TW does NOT ship native geo-lift. Workarounds = spreadsheets or TW Sonar incrementality lite. |
| Polar Analytics | Limited | Ships a 'Holdout' module but documentation is thin. Mostly time-based holdouts; geo-lift requires manual setup. |
| Hyros | No | No native incrementality. Hyros's pitch is server-side identity stitching for first-click measurement. |
| Datafast | No | UTM-only attribution; no incrementality surface. |

Run your first holdout — 4 steps

  1. Pick a hypothesis you actually want to test. Pick ONE channel, ONE region (or one workspace-wide pause window), and ONE outcome metric. Phrase it as a falsifiable claim, e.g. 'Meta drives 40% of last-click revenue, but if I paused Meta in California, would California revenue actually drop 40%?'
  2. Open /settings/incrementality and configure the holdout. For geo-lift: pick treatment regions (e.g. ['US-CA']), donor pool (3-8 control regions), start + end dates (>=14 days, ideally 28+), and the channel to pause. For time-holdout: skip regions, pick a workspace-wide pause window. Save as draft.
  3. Pause the channel in the treatment region for the test window. Manually go to Meta Ads Manager (or Google Ads / TikTok Ads), set the channel's geo targeting to exclude your treatment region, and let it run for the full window. Don't peek at intermediate results.
  4. Read the result on /marketing-acquisition's IncrementalityCard. Once status flips to 'completed', the IncrementalityCard shows: lift % (color-coded green/rose by significance), permutation p-value chip, 95% CI bar, treatment vs synthetic-control time-series. If p < 0.05 AND lift > 0, the channel is causally driving revenue.

Curl example

Once your geo-lift completes, the full result envelope (lift %, p-value, 95% CI, donor weights, treatment vs synthetic revenue) is available at /api/v1/incrementality/holdouts/:id/result. No SDK required — copy, paste, swap $TOKEN and $TEST_ID.

# Read the result envelope for a completed geo-lift test.
# Replace $TOKEN with a workspace API key from /settings/api.
# Replace $TEST_ID with the test ID from /settings/incrementality.
curl -H "Authorization: Bearer $TOKEN" \
  "https://admaxxer.com/api/v1/incrementality/holdouts/$TEST_ID/result" \
  | jq

# Sample response (truncated):
# {
#   "status": "completed",
#   "result": {
#     "lift_pct": 0.184,
#     "lift_absolute": 24580,
#     "p_value": 0.038,
#     "ci_95": { "lower": 0.041, "upper": 0.327 },
#     "treatment_total_revenue": 158420,
#     "synthetic_total_revenue": 133840,
#     "donor_pool_used": ["US-TX","US-NY","US-FL","US-IL"],
#     "donor_weights": { "US-TX": 0.41, "US-NY": 0.27, "US-FL": 0.18, "US-IL": 0.14 },
#     "treatment_window": { "start": "2026-04-01", "end": "2026-04-28" },
#     "pre_period":       { "start": "2026-02-01", "end": "2026-03-31" }
#   }
# }
#
# If the test is still running, status='running' and result=null. If the
# pre-period is too short, status='completed' with reason='insufficient_pre_period_data'.
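
The same read can be done from Python with only the standard library. This is a sketch, not an official client: `fetch_result` and `summarize` are hypothetical helper names, the endpoint and field names are taken from the curl example above, and the placement of `reason` at the top level of the envelope is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://admaxxer.com/api/v1/incrementality/holdouts"


def fetch_result(test_id: str, token: str) -> Dict[str, Any]:
    """GET the result envelope for one holdout (same call as the curl above)."""
    req = urllib.request.Request(
        f"{BASE_URL}/{test_id}/result",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize(envelope: Dict[str, Any]) -> str:
    """One-line reading of the envelope; handles the result=null cases."""
    result = envelope.get("result")
    if result is None:
        # Assumption: 'reason' sits next to 'status' at the top level.
        return (
            f"no result yet (status={envelope.get('status')}, "
            f"reason={envelope.get('reason')})"
        )
    ci = result["ci_95"]
    verdict = "significant" if result["p_value"] < 0.05 else "not significant"
    return (
        f"lift {result['lift_pct']:.1%} "
        f"(95% CI {ci['lower']:.1%} to {ci['upper']:.1%}), "
        f"p={result['p_value']:.3f}, {verdict} at 0.05"
    )


# Subset of the sample response shown above, parsed offline.
sample = json.loads(
    '{"status": "completed", "result": {"lift_pct": 0.184, "p_value": 0.038,'
    ' "ci_95": {"lower": 0.041, "upper": 0.327}}}'
)
print(summarize(sample))
print(summarize({"status": "running", "result": None}))
```

Call `summarize(fetch_result(test_id, token))` with a real test ID and API key to run it against a live workspace.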

FAQ

What's the difference between MTA and incrementality?
MTA (last-click, first-click, time-decay, Markov) tells you what HAPPENED: which channels appeared in successful paths, in what proportion. It can't tell you what you CAUSED. Incrementality is the only way to measure causal contribution: pause the channel, measure the gap.
What's the synthetic-control method?
Abadie, Diamond, and Hainmueller (2010) introduced the synthetic-control method for causal inference when you can't randomise. Pick a treatment unit (e.g. California), pick a donor pool (3-8 other states), fit a weighted average of donor states to match California's pre-period revenue trajectory. That weighted average IS the synthetic California — what California's revenue WOULD have been without the intervention.
How long does a geo-lift test need to run?
Minimum 14 days; recommended 28+ days for stronger statistical power. Pre-period (used for fitting the synthetic) should be at least 30 days; the runner uses the 60 days before the test start by default.
How does the p-value work?
Two-sided permutation test against the donor placebo distribution. For each donor region, treat it AS IF it had been the treatment region. Compute the synthetic-control fit and the resulting placebo lift. The observed lift's rank in the placebo distribution gives the two-sided p-value. p < 0.05 = the observed lift is unlikely under the null hypothesis of 'no causal effect'.
What if my treatment region doesn't have enough revenue history?
The runner returns a result envelope with result: null and reason: 'insufficient_pre_period_data' (or similar). The pre-period needs >=30 days of complete revenue data; if your store is younger than that or has gaps, the synthetic-control fit fails.
Can I run an incrementality test without Shopify connected?
The default revenue source is Shopify. If Shopify isn't connected but the pixel is, the runner falls back to pixel-derived revenue (visitor_payments aggregated by IP-geo). Pixel-only is noisier but still source-additive.
What about multi-region geo-lift (treat several regions at once)?
Supported. Pass multiple treatment regions in treatmentRegions. The runner aggregates their revenue, fits a single synthetic from the donor pool, and computes a single lift estimate.
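As an illustration of the aggregation step only (the runner's internals are not documented here), summing daily revenue across the treatment regions produces the single series that the synthetic is then fit against. `aggregate_treatment` and the revenue figures are hypothetical.

```python
from typing import Dict, List


def aggregate_treatment(
    revenue_by_region: Dict[str, List[float]], treatment_regions: List[str]
) -> List[float]:
    """Sum daily revenue across all treatment regions into one series."""
    series = [revenue_by_region[region] for region in treatment_regions]
    return [sum(day_values) for day_values in zip(*series)]


# Hypothetical daily revenue; US-TX stays in the donor pool and is not summed.
revenue = {
    "US-CA": [100.0, 110.0],
    "US-WA": [40.0, 35.0],
    "US-TX": [80.0, 90.0],
}
combined = aggregate_treatment(revenue, ["US-CA", "US-WA"])
print(combined)
```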
Can the Claude AI agent run an incrementality test?
The agent has read access via query_metrics to the holdout-result envelope. It can summarize a completed test but can't create new holdouts or change channel pause status — that's an operator decision with real ad-spend consequences.
Geo-lift vs time-holdout: what's the difference?
Both are causal incrementality tests, but they manipulate different dimensions. Geo-lift varies treatment by region: pause Meta in California (treatment), leave it on in Texas/New York/Florida (donor pool), and use the synthetic-control method to estimate California's counterfactual revenue from a weighted blend of donors. Time-holdout pauses the channel everywhere for a defined window: pause Meta workspace-wide for 14 days, then compare actual revenue during the pause window against the synthetic baseline derived from the pre-pause period. Geo-lift is cheaper to run (only the treatment region pays the pause cost, so you don't sacrifice total revenue) but requires multiple distinguishable regions with enough revenue history for a clean donor pool. Time-holdout is operationally simpler (no geo targeting changes in Meta Ads Manager) but costs more revenue (the entire workspace pauses) and is more vulnerable to time-period confounders (BFCM week vs. a normal week). Admaxxer ships both surfaces; pick geo-lift first if you have multiple revenue-poolable regions, time-holdout if you're in a single market.