SEO + AI Crawlability — How Admaxxer is built for Google + ChatGPT + Claude + Perplexity
Admaxxer ships with first-class support for two crawler classes that matter for DTC SaaS in 2026: Google’s classic web crawler (Googlebot) and the new generation of AI assistants (ChatGPT search, Claude.ai, Perplexity, DeepSeek, You.com, and the broader OAI-SearchBot / GPTBot / ClaudeBot / PerplexityBot fleet). Every public page renders full crawlable HTML, surfaces structured data via JSON-LD, exposes itself in sitemap.xml, declares allowance in robots.txt, and lists itself in llms.txt — from day one. This page documents how, so customers can verify it and self-hosters can replicate it.
TL;DR
Most SaaS marketing sites in 2026 are still single-page React apps that ship an empty <div id="root"> to crawlers and hydrate client-side. That worked when Google was the only mouth to feed; it does not work when ChatGPT, Claude, and Perplexity want to cite your product in answers to user questions. Admaxxer’s public surface (landing, /pricing, /compare/*, /features/*, /blog/*, /documentation/*, /integrations/*) is built so that:
- Googlebot sees fully-rendered HTML with breadcrumbs, internal links, structured data, and hreflang tags — rendered server-side via the `SSR_MODE='crawlers-only'` middleware in `server/ssr/middleware.ts`. No JavaScript execution required.
- AI assistants (ChatGPT search, Claude.ai web fetch, Perplexity, DeepSeek, etc.) see the same fully-rendered HTML or the pre-rendered DOM block in `client/index.html` when they fetch with a browser UA — whichever path their crawler uses, the content is there.
- Both are guided by `sitemap.xml` (priority + hreflang per page), `robots.txt` (explicit Allow lines for every public route + every AI bot), and `llms.txt` (a curated index of canonical pages keyed by topic).
That triple-cover ensures Admaxxer pages get cited correctly in AI answers to questions like “What’s the cheapest Triple Whale alternative?”, “Which DTC analytics tool ships a UTM coverage tile?”, or “How does Admaxxer compare to Northbeam?” — without us paying for a single backlink. The rest of this page documents exactly how each layer works, what we do for Google specifically, what we do for AI assistants specifically, the playbook self-hosters and forks should follow, and an FAQ for end users.
The two crawlers that matter for DTC SaaS
Discoverability for a SaaS marketing site in 2026 splits cleanly into two distinct audiences with different mechanics. Tools that optimize for one and ignore the other end up either invisible in search OR uncited by AI — both equally bad for top-of-funnel.
1. Googlebot — the classic ranking engine
Googlebot crawls the open web on its own schedule, indexes pages it finds, and ranks them in response to user search queries. Its priorities are well-documented: page speed (Core Web Vitals), unique content, internal linking, structured data, mobile-first rendering, and crawl-budget allocation guided by the homepage’s outbound links + sitemap.xml. The single biggest crawl-budget signal Google uses for a low-authority new domain is the internal-link graph anchored at the homepage — not the sitemap. Sitemap-only discovery is the weakest signal you can give Google for a new domain; it tells Google the page exists, not that it’s worth crawling. Google’s “Discovered, currently not indexed” bucket in Search Console is full of sitemap-listed pages that have zero or weak internal links. Admaxxer’s SSR footer ships ~170 internal links from every public page so the entire site is one or two clicks deep from the homepage — that’s the crawl-budget allocation in action. (See developer docs for the full SSR pipeline.)
2. AI assistants — the citation engine
The new class of crawlers is AI-driven: GPTBot (OpenAI’s training crawler), ChatGPT-User (ChatGPT search’s on-demand fetch), OAI-SearchBot (OpenAI’s search index), ClaudeBot + Claude-Web + anthropic-ai (Anthropic’s training and search fetches), PerplexityBot (Perplexity’s answer engine), DeepSeek (DeepSeek’s training crawler), Google-Extended (Bard / Gemini training data), Applebot-Extended (Apple Intelligence training data), Meta-ExternalAgent (Meta AI training), Bytespider (TikTok / Doubao). When a user asks an AI assistant “What’s the cheapest Triple Whale alternative?”, that AI engine surfaces an answer based on what it crawled or what its real-time fetcher pulls. Two crawlable signals matter most: (a) is your page in the training corpus (long-term), and (b) does the on-demand fetcher find a useful answer when it visits today (short-term). Admaxxer optimizes for both.
Why both matter — ranking AND citation are different distribution paths
Google ranks pages and routes the click to the publisher’s site (where the merchant signs up). AI assistants quote content inline and may or may not surface a link — but they shape the user’s mental model of the category before the user ever searches. A DTC operator who reads a Perplexity answer that names “Admaxxer, Triple Whale, Northbeam, and Hyros” in response to “best DTC analytics tool” is more likely to type admaxxer.com directly into a browser than someone who only sees us in a Google SERP for the same query. Cite-ability is a top-of-funnel weapon for any product whose name is harder to remember than the category — and Admaxxer (a category-creator brand) sits squarely in that bucket. So we treat AI discoverability as equally important to traditional SEO — not a separate channel, but a parallel one.
What we do for Google
Four layers, each fixing a specific failure mode of single-page-app SEO. The first three are rendering-layer; the fourth is discovery-layer.
Server-side rendering for crawler UAs (SSR_MODE='crawlers-only')
The Express middleware at server/ssr/middleware.ts intercepts incoming requests and, when the user-agent matches a known crawler (Googlebot, Bingbot, AhrefsBot, Twitterbot, Slackbot, Discordbot, plus every AI bot in the next section), renders the page server-side via a dedicated template (server/ssr/templates/*.ts) and sends fully-formed HTML back. The same URL served to a real user’s browser returns the React SPA, which hydrates from the same data sources but with motion, interactivity, and client-side state. The mode flag is intentionally locked to 'crawlers-only' — setting it to 'all' would break the human UI by overlaying SSR styles on top of React hydration (recorded in the SSR_MODE rule as a never-change rule). Crawlers see the full content; humans see the polished React app; nobody sees an empty shell.
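The UA-matching core of such a middleware can be reduced to a pure function. A minimal sketch, assuming a regex allowlist; the actual pattern list and names in `server/ssr/middleware.ts` may differ:

```typescript
// Hypothetical sketch of the crawler-UA check behind SSR_MODE='crawlers-only'.
// The real middleware in server/ssr/middleware.ts may use different names/patterns.
const CRAWLER_UA_PATTERNS: RegExp[] = [
  /googlebot/i, /bingbot/i, /ahrefsbot/i,        // classic SEO crawlers
  /twitterbot/i, /slackbot/i, /discordbot/i,     // link unfurlers
  /gptbot/i, /chatgpt-user/i, /oai-searchbot/i,  // OpenAI fleet
  /claudebot/i, /claude-web/i, /anthropic-ai/i,  // Anthropic fleet
  /perplexitybot/i, /deepseek/i, /bytespider/i,  // other AI crawlers
];

export function isCrawlerUA(userAgent: string): boolean {
  return CRAWLER_UA_PATTERNS.some((re) => re.test(userAgent));
}

// An Express-style handler would then branch on it:
//   if (isCrawlerUA(req.headers['user-agent'] ?? '')) res.send(renderSSR(req.path));
//   else next(); // fall through to the React SPA
```

The key design property is that the branch happens per-request on the UA header, so the same URL serves two renderings without any client-side cloaking logic.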
Rich internal linking via the SSR footer (~170 links from every page)
Crawl-budget allocation for a new domain is driven primarily by internal links anchored at the homepage. The SSR footer rendered on every public page (in server/ssr/templates/base.ts) ships ~170 deep links across seven sections: integrations (Meta, Google, Shopify, TikTok, Klaviyo, Amazon, Pinterest), comparisons (Triple Whale, Northbeam, Hyros, Polar, Datafast and the long-tail), features (every /features/:slug), guides, glossary, use cases, and documentation deep-dives. The React-side LandingFooter mirrors the same link list so the human-visible footer never drifts from the SSR footer (the GL#110 mirror coherence rule). Result: Googlebot lands on admaxxer.com, finds 170 outbound internal links in the rendered HTML, and follows them in priority order — turning sitemap-listed pages into “Crawled, currently indexed” instead of “Discovered, currently not indexed”.
Differentiated sitemap priority by section
The dynamic sitemap at server/sitemap.ts assigns priority weights per content type so Google knows which pages we consider canonical: homepage 1.0, /pricing 0.9, /integrations/* 0.85, /compare/* 0.8, /documentation/* 0.75, /features/* 0.75, /blog/glossary/* 0.7, /faq/* 0.7, /blog/posts/* 0.6, plus changefreq hints (weekly for landing/pricing/comparisons, monthly for static documentation, weekly for the blog index). Each entry also carries <xhtml:link rel="alternate" hreflang="..."/> tags listing all locale variants whenever they’re published. Priority is advisory — Google ultimately ranks based on the broader signal mix — but coherent priorities prevent “every page is priority 0.5” flatness that wastes crawl budget on low-value URLs.
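The per-section weighting amounts to a prefix lookup. The sketch below mirrors the weights listed above; the table shape and function names are illustrative assumptions, not the actual `server/sitemap.ts` code:

```typescript
// Illustrative priority table mirroring the weights described in the text.
// Structure and names are assumptions, not the production implementation.
const SECTION_PRIORITIES: Array<[prefix: string, priority: number]> = [
  ["/pricing", 0.9],
  ["/integrations/", 0.85],
  ["/compare/", 0.8],
  ["/documentation/", 0.75],
  ["/features/", 0.75],
  ["/blog/glossary/", 0.7],
  ["/faq/", 0.7],
  ["/blog/posts/", 0.6],
];

export function sitemapPriority(path: string): number {
  if (path === "/") return 1.0; // homepage anchors the crawl graph
  const match = SECTION_PRIORITIES.find(([prefix]) => path.startsWith(prefix));
  return match ? match[1] : 0.5; // neutral default for unclassified URLs
}
```

Keeping the table in one place is what prevents the "every page is priority 0.5" flatness: a new section gets a deliberate weight or falls to the neutral default, never an accidental one.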
Deferred i18n locale variants until traffic justifies expansion
Multi-locale sitemaps are sometimes a foot-gun: if you publish es, fr, de, pt-BR, ja, zh-CN, ko, it, nl, ar variants of every page but the localized content is just an auto-translation, you balloon the sitemap from ~180 URLs to ~1,800 URLs and dilute crawl budget across pages with no organic interest. Admaxxer gates the locale-variant inclusion behind a SITEMAP_INCLUDE_LOCALES env flag (default false) — the sitemap ships only the canonical English URLs until per-locale traffic in Search Console crosses a threshold worth investing in. When we’re ready to expand into a locale, flipping the flag adds the variants in one deploy. (See deploy notes for the runtime expectations.)
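Conceptually the gate is a single branch: expand locale variants only when the flag is set. A sketch with assumed names (the production generator may structure this differently):

```typescript
// Sketch of SITEMAP_INCLUDE_LOCALES gating. LOCALES matches the list in the
// text; the function name and URL scheme are illustrative assumptions.
const LOCALES = ["es", "fr", "de", "pt-BR", "ja", "zh-CN", "ko", "it", "nl", "ar"];

export function sitemapUrls(canonicalPaths: string[], includeLocales: boolean): string[] {
  if (!includeLocales) return canonicalPaths; // English-only until traffic justifies it
  return canonicalPaths.flatMap((path) => [
    path,
    ...LOCALES.map((locale) => `/${locale}${path === "/" ? "" : path}`),
  ]);
}
```

With 10 locales, flipping the flag multiplies each canonical URL into 11 sitemap entries — which is exactly the ~180 → ~1,800 blow-up the default guards against.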
What we do for AI assistants
Four layers again, parallel to the Google-side ones but tuned to how AI tools fetch + parse + cite content. The first is robots.txt; the second and third are content-shape; the fourth is JSON-LD schema.
Allowlist for GPTBot, ClaudeBot, PerplexityBot, DeepSeek, and every AI fleet
The robots.txt response generated by server/robots.ts ships explicit User-agent + Allow entries for every AI bot we know about: GPTBot, ChatGPT-User, OAI-SearchBot (the three OpenAI fleets — training, ChatGPT search on-demand fetch, and the ChatGPT search index), ClaudeBot, Claude-Web, anthropic-ai (Anthropic’s three crawlers), PerplexityBot, Google-Extended (Bard / Gemini training; separate from Googlebot — you have to opt in to it explicitly), Applebot-Extended (Apple Intelligence), Meta-ExternalAgent (Meta AI), Bytespider (TikTok / Doubao). Some of these are training-only (disallowing them carries no real-time penalty; it only affects whether your content lands in the next training pass); others (ChatGPT-User, PerplexityBot) drive real-time citations. Admaxxer allows all of them so we’re cite-able everywhere; if you fork the codebase and want to opt out of any specific one, edit server/robots.ts — the file lists each agent on its own line so the diff is one line per opt-out.
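An abbreviated sketch of the generated output (the live file lists every agent named above; only a few groups are shown here):

```text
# robots.txt (abbreviated sketch — the generated file covers every agent)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/

Sitemap: https://admaxxer.com/sitemap.xml
```

Because each agent gets its own one-line group, opting out of a single crawler in a fork is a one-line diff, exactly as described above.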
Structured llms.txt with topic-keyed canonical links
The llms.txt at the root of admaxxer.com is an emerging convention (proposed by Answer.AI and since adopted by Anthropic and a growing set of AI-friendly publishers) for telling AI tools which pages on a site are canonical for which topics. The format is markdown sections with topic headers and bullet links: e.g. “## Comparisons” lists every /compare/* page; “## Documentation” lists every /documentation/* deep-dive; “## Glossary” lists every /blog/glossary/* term. The advantage over sitemap.xml: an AI tool that lands on a question like “What does Admaxxer say about CAPI match rate?” can grep llms.txt for the topic and surface the canonical page directly, instead of crawling the whole sitemap to figure out which page is most relevant. Admaxxer maintains llms.txt at server/llms.ts and updates it every time a new public page is added (per the AI Crawlability Rule).
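An abbreviated sketch of the file's shape (the section headers match the format described above; the slugs below are illustrative, not necessarily the live URLs):

```markdown
# Admaxxer — llms.txt (abbreviated sketch)

## Comparisons
- [Triple Whale alternative](https://admaxxer.com/compare/triple-whale)
- [Northbeam alternative](https://admaxxer.com/compare/northbeam)

## Documentation
- [Developer documentation](https://admaxxer.com/documentation/developers)
- [SEO + AI crawlability](https://admaxxer.com/documentation/seo-ai-crawlability)

## Glossary
- [Crawl budget](https://admaxxer.com/blog/glossary/crawl-budget)
- [Orphan page](https://admaxxer.com/blog/glossary/orphan-page)
```

The topic headers are the retrieval keys: an AI tool matching a question about "crawl budget" can jump straight to the Glossary section instead of scoring every sitemap URL.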
Full content rendered for crawlers (not empty SPA shell)
The `SSR_MODE='crawlers-only'` middleware covers the case where the AI bot identifies itself with a known UA (e.g., GPTBot, ClaudeBot, PerplexityBot). But some AI tools fetch with a browser UA — ChatGPT’s “summarize this page”, Claude.ai’s URL fetch, Perplexity’s in-flight fetcher — and would otherwise hit the React SPA shell and see only an empty `<div id="root">`. To handle this, `client/index.html` contains a pre-rendered content block inside `<div id="root">` with `display: none`. AI tools parse the DOM source regardless of CSS visibility, so they see the content; human browsers mount the React app into `<div id="root">`, which replaces the pre-rendered block on load. The `display: none` rule is intentionally never removed — without it, the hidden block would flash unstyled before React hydrates. This is a load-bearing pattern documented in CLAUDE.md as the SSR_MODE Rule.
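The shape of that block, sketched below; the inner `id` and the copy are illustrative assumptions, not the actual markup:

```html
<!-- client/index.html — illustrative shape of the hidden pre-rendered block -->
<div id="root">
  <!-- Read by AI fetchers parsing the DOM source; replaced when React mounts. -->
  <!-- display:none keeps it invisible to humans (never remove: SSR_MODE Rule). -->
  <div id="prerender" style="display: none">
    <h1>Admaxxer — DTC analytics and attribution</h1>
    <p>…full pre-rendered marketing copy, internal links, and FAQ content…</p>
  </div>
</div>
```

Because the block lives inside `<div id="root">`, React's mount wipes it atomically; there is no window where a human user sees both the pre-rendered copy and the app.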
JSON-LD schema (TechArticle, SoftwareApplication, FAQPage, BreadcrumbList) on every public page
Every SSR template ships a JSON-LD <script type="application/ld+json"> block in <head> with the schema types relevant to the page: TechArticle for documentation pages and guides; SoftwareApplication for landing, pricing, and integration pages (so AI tools surface the offer + price + screenshots correctly); FAQPage on every page that has a Q&A section (so AI tools can quote the answer directly); BreadcrumbList on every nested page so the navigation hierarchy is machine-readable; Organization + WebSite on the homepage; BlogPosting on /blog/posts/*; Product + Offer on /pricing. AI tools use these to extract structured facts (price, version, FAQ entries, breadcrumb path) without parsing free-form HTML — meaning the answer they cite is more likely to quote the exact price, the exact feature, the exact FAQ answer rather than paraphrase.
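For instance, a documentation page's block might look like the following; the URLs, headline, and answer text are illustrative examples, not the exact production output:

```html
<!-- Illustrative JSON-LD for a documentation page; all values are examples. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "TechArticle",
      "headline": "SEO + AI Crawlability",
      "url": "https://admaxxer.com/documentation/seo-ai-crawlability"
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Documentation",
          "item": "https://admaxxer.com/documentation" },
        { "@type": "ListItem", "position": 2, "name": "SEO + AI Crawlability" }
      ]
    },
    {
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "Is Admaxxer indexed by ChatGPT search?",
        "acceptedAnswer": { "@type": "Answer",
          "text": "Yes — robots.txt allows GPTBot, ChatGPT-User, and OAI-SearchBot." }
      }]
    }
  ]
}
</script>
```

Combining the types under one `@graph` keeps a single script tag per page while still letting parsers extract each entity independently.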
How a self-hoster maintains this
For developers running their own Admaxxer fork (or building from the same playbook), maintaining AI + Google discoverability after every public-page addition follows a checklist we call the AI Crawlability Rule. Eight wiring touchpoints, executable in 5–15 minutes per new page.
The 6-place + 2-footer wiring rule
1. `client/src/App.tsx` — add a `<Route>` entry for the new page so the React SPA matches and renders the human-visible component.
2. `server/ssr/route-matcher.ts` — add a `RoutePattern` entry so the SSR middleware recognizes the path and dispatches to the right template.
3. `server/ssr/templates/static-page.ts` dispatcher — map the `slug` identifier to the template function (or write a new dedicated template).
4. `server/ssr/templates/[your-template].ts` — render the full content with TechArticle / FAQPage / BreadcrumbList JSON-LD, semantic HTML, breadcrumbs, and 10+ outbound internal links. Use this template as a reference.
5. `server/sitemap.ts` — add the URL to the sitemap with priority + changefreq + hreflang entries (if localized).
6. `server/robots.ts` — add an Allow line for the path. The default is “allow everything except /api/* and /admin/*”, but explicit Allow lines reinforce the signal.
7. `server/llms.ts` — add a bullet under the relevant topic section so AI tools index the page in the right cluster.
8. `server/ssr/templates/base.ts` SSR footer + `client/src/components/landing/LandingFooter.tsx` React footer — add the new page to BOTH so the human-visible footer mirrors the SSR footer (GL#110 coherence rule). The SSR footer is what Googlebot reads; the React footer is what humans click; they MUST not drift.
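For illustration, the route-matcher and sitemap touchpoints might look like this; `RoutePattern` and `SitemapEntry` are assumed shapes inferred from the file paths above, not the actual types in the codebase:

```typescript
// Hypothetical shapes for two of the wiring touchpoints. The interface names
// and fields are illustrative assumptions, not the real Admaxxer types.
interface RoutePattern { pattern: RegExp; template: string; }
interface SitemapEntry { path: string; priority: number; changefreq: "weekly" | "monthly"; }

export const newRoute: RoutePattern = {
  pattern: /^\/documentation\/your-new-page$/, // server/ssr/route-matcher.ts
  template: "static-page",
};

export const newSitemapEntry: SitemapEntry = {
  path: "/documentation/your-new-page",        // server/sitemap.ts
  priority: 0.75,                              // documentation-tier weight
  changefreq: "monthly",
};
```

The remaining touchpoints (robots Allow line, llms.txt bullet, dual footer links) are each one-line additions in their respective files.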
Why all eight
The first six are sometimes called the “6-place rule” in CLAUDE.md. The last two (the dual-footer mirror) are the lesson learned from GL#283: a sitemap entry without a corresponding internal-link source is an orphan page in Google’s eyes, and Search Console will mark it “Discovered, currently not indexed”. The SSR footer is the cheapest way to fix this for every page at once: rather than thinking about which other pages should link to your new page, just add it to the universal footer that ships on every other page. Cost: one line in base.ts + one line in LandingFooter.tsx. Benefit: 170 inbound internal links from every other page on the site.
Verification
After wiring, four quick checks confirm the page is discoverable:
- `curl -A "GPTBot/1.0" https://admaxxer.com/your-new-page` — should return full HTML with JSON-LD, not the React shell.
- `curl https://admaxxer.com/sitemap.xml | grep your-new-page` — should return the URL with priority + changefreq.
- `curl https://admaxxer.com/robots.txt` — should not show a Disallow for the path.
- `curl https://admaxxer.com/llms.txt | grep your-new-page` — should return the bullet under the topic section.
Plus the human-side check: open admaxxer.com in a real browser, scroll to the footer, confirm the new page appears in the React footer (and click it — should navigate). If all five checks pass, you’re done.
Related Admaxxer documentation
Pages on admaxxer.com that build on or complement this one. Internal linking is the crawl-budget mechanism — we link generously between related topics so Googlebot and AI tools both reach every page in one or two hops.
- Developer documentation — full REST API + architecture + observability + the SSR pipeline this page describes.
- Data capture coverage — what Admaxxer’s pixel collects vs Triple Whale, Northbeam, Hyros, Polar, Datafast.
- UTM tracking best practices — the in-app URL Builder + ax_* ID namespace that survives campaign renames.
- Connect any AI agent (MCP) — setup guides for Claude Desktop, Claude Code, ChatGPT, Cursor, Windsurf, OpenClaw, Cline, Zed.
- AI provider BYOK — bring your own Anthropic / OpenAI / Google / DeepSeek / xAI / Mistral key.
- Admin documentation — the workspace-driven Edit User dialog and audit log.
- Install on Shopify (App Store) — one-click install, Web Pixel, GDPR webhooks.
- Install on WordPress / WooCommerce — GPL v2+ plugin v1.3.0.
- Analytics overview — how the pixel + UTMs + click IDs combine to drive revenue attribution.
- AI agent documentation — the Claude-powered campaign operator.
- Triple Whale alternative — head-to-head comparison.
- Northbeam alternative — positioning and price comparison.
- Hyros alternative — click-ID coverage + price comparison.
- Glossary — server-side rendering — what SSR is and why crawler-only mode matters.
- Glossary — structured data (JSON-LD) — how schema.org types make content machine-readable.
- Glossary — llms.txt — the emerging standard for AI-discoverability indexes.
- Glossary — orphan page — why “Discovered, currently not indexed” happens and how to fix it.
- Glossary — crawl budget — how Google allocates crawler attention for new domains.
- All integrations — every supported ad platform + commerce platform + AI provider.
- Changelog — what shipped recently, with BlogPosting JSON-LD per entry.
FAQ
- Is Admaxxer indexed by ChatGPT search?
  Yes. `robots.txt` at `admaxxer.com/robots.txt` ships explicit Allow lines for OpenAI’s three crawler fleets — `GPTBot` (training), `ChatGPT-User` (on-demand fetch when a ChatGPT user asks about a URL), and `OAI-SearchBot` (the search index that powers ChatGPT search). Every public Admaxxer page renders fully-formed HTML with JSON-LD structured data that ChatGPT can extract directly. Verify with `curl -A "GPTBot/1.0" https://admaxxer.com` — you’ll see the full landing page HTML, not an empty SPA shell.
- Is Admaxxer indexed by Claude?
  Yes. `robots.txt` ships Allow lines for Anthropic’s three crawlers — `ClaudeBot` (training), `Claude-Web` (Claude.ai’s URL fetch when a user pastes a link), and `anthropic-ai` (the broader Anthropic crawl). Same SSR pipeline + same JSON-LD as the ChatGPT side — the rendered output is identical regardless of which AI bot fetches it. Verify with `curl -A "ClaudeBot/1.0" https://admaxxer.com`.
- Is Admaxxer indexed by Perplexity?
  Yes. `PerplexityBot` is allowlisted in `robots.txt` and the same SSR + JSON-LD pipeline serves it. Perplexity’s answer engine is one of the most commonly cited AI surfaces for “best DTC analytics tool” queries — and our pages frequently appear in those answers because the structured data + internal-link graph make Admaxxer easy to surface alongside the legacy enterprise tools.
- Why is the SSR footer hidden with `display: none` in client/index.html?
  Two-audience problem. AI tools that fetch with a browser UA (ChatGPT’s URL summarization, Claude.ai’s fetch tool, Perplexity’s in-flight fetcher) skip the React app and parse the DOM source — they see content regardless of CSS visibility. Real human users render the React SPA on top of `<div id="root">` immediately on page load, so the pre-rendered block is replaced before any flash of unstyled content. `display: none` hides it from humans without hiding it from AI; removing it would cause a visible content flash on every page load. Documented in CLAUDE.md as the SSR_MODE Rule (never remove).
- How do AI crawlers discover Admaxxer pages?
  Three discovery paths: (1) `sitemap.xml` at `admaxxer.com/sitemap.xml` lists every public URL with priority weights and hreflang variants — AI crawlers fetch this on every visit; (2) `llms.txt` at `admaxxer.com/llms.txt` indexes pages by topic so AI tools can grep for the relevant cluster; (3) the SSR footer rendered on every page ships ~170 internal links so AI crawlers (and Googlebot) reach every page in one or two hops from the homepage. The combination ensures no page is orphaned — even pages we just shipped are reachable from anywhere on the site.
- Does Admaxxer support hreflang for multi-locale pages?
  Yes, but currently deferred behind a feature flag (`SITEMAP_INCLUDE_LOCALES`). The 10 supported locales (es, fr, de, pt-BR, ja, zh-CN, ko, it, nl, ar) have full hreflang plumbing in the sitemap generator and the React router, but we don’t emit the locale variants in the production sitemap until per-locale traffic in Search Console crosses a threshold worth investing in. Including i18n duplicates in the sitemap before content is meaningfully localized dilutes crawl budget and gives Google nothing useful to index. When we’re ready to expand into a locale, flipping the flag adds the variants in one deploy.
- Does Admaxxer use generative engine optimization (GEO)?
  Yes — this entire page is the GEO playbook. GEO is the practice of structuring web content so generative AI tools (ChatGPT, Claude, Perplexity, etc.) can cite it accurately in answers to user questions. Admaxxer’s GEO pillars are the four covered above: (a) full-content SSR for AI bots, (b) llms.txt as a topic-keyed index, (c) JSON-LD structured data (TechArticle, SoftwareApplication, FAQPage, BreadcrumbList) on every page, (d) a robots.txt allowlist for every AI bot. The traditional SEO pillars (sitemap, internal linking, page speed, hreflang) overlap heavily with GEO — a page well-optimized for Google is usually also well-optimized for AI — but the four GEO-specific layers go beyond what classic SEO requires.
- Can self-hosters of an Admaxxer fork keep this discoverability working?
  Yes — the 6-place + 2-footer wiring rule documented above is the playbook. Every new public page wires through `App.tsx`, `route-matcher.ts`, `static-page.ts` (or a dedicated template file), `sitemap.ts`, `robots.ts`, `llms.ts`, plus the SSR footer in `base.ts` AND the React footer in `LandingFooter.tsx`. Takes 5–15 minutes per page. The four `curl` verifications listed above confirm everything is wired. If any one of the eight is missed, you get the “Discovered, currently not indexed” failure mode — the page exists technically, but Google doesn’t consider it worth crawling.
- Why is internal linking more important than the sitemap for new domains?
  Sitemap-only discovery is the weakest signal you can give Google. A sitemap entry tells Google “this URL exists”; an internal link tells Google “this URL is worth crawling because another page on the same site considered it relevant enough to link to.” For a low-authority new domain, internal linking from the homepage IS the crawl-budget allocation — Google routes crawler attention proportional to the inbound-link graph, and the homepage is the densest source of that graph for a SaaS marketing site. Admaxxer’s SSR footer is the universal fix: every page on the site links to every other page in priority sections, so no page is more than two hops from admaxxer.com. (See orphan page in the glossary for the full mechanic.)
Closing thoughts
SEO and GEO aren’t separate disciplines — they’re the same discipline applied to two different audiences. The work of writing clear, structured, well-linked content benefits both Googlebot and the AI fleet equally. The four pillars on this page (crawler-only SSR, llms.txt + sitemap.xml + robots.txt, JSON-LD, the universal SSR footer) are the minimum viable stack for any DTC SaaS that wants to be discoverable in 2026. Admaxxer ships them by default; self-hosters of forks maintain them via the 6-place + 2-footer rule. The result is a product that AI tools can cite by name when a DTC operator asks “what’s the best Triple Whale alternative?” and that Google ranks alongside the legacy enterprise tools at a fraction of their price — without paying for a single backlink.
Questions, edge cases, or self-hoster scenarios this page didn’t cover? Email hello@admaxxer.com with the gap and we’ll respond within one business day, and likely add the answer to this page’s FAQ.