How a declassified-document archive, engineered for the AI retrieval layer, earned ~10% of its traffic from ChatGPT, Claude & Perplexity — in one month.
The Pentagon dropped 162 declassified UAP files into a flat, unbrowsable directory during a live news cycle. I shipped a better archive the same day — then engineered it as a machine-readable, primary-source corpus built specifically for the way AI assistants retrieve and cite sources.
Modern AI answers are retrieval-augmented: when someone asks Perplexity or ChatGPT a question, the system runs a live search, fetches a few pages at inference time, and writes its answer grounded in them. That fetch step is the lever. GEO is, in effect, SEO for the retrieval layer of LLMs — plus formatting your content the way a model likes to quote it.
So I built the site as a machine-readable primary-source archive and optimized every layer for both human search and LLM retrieval. Six things stacked together:
~95 programmatically generated long-tail pages (FAQ / wiki / compare) plus hundreds of auto-generated entity pages — one per incident, document, video, agency, year, US state. Each is generated from the structured catalog and grounded in a specific declassified document, targeting a real question someone asks an AI.
Every answer is 134–167 words, definition-first, with a statistic + a verbatim quotation + a primary-source citation. A dedicated pull-quote per page gives the model a clean string to lift. (Straight from the Princeton GEO research.)
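The answer shape described above lends itself to a build-time lint. What follows is a minimal sketch, not the site's actual checker: the function name `check_answer_shape` and the heuristics (any digit counts as a statistic, any quoted run of ten or more characters counts as a quotation) are assumptions made for illustration. Only the 134–167 word window comes from the text, and "definition-first" is left to human review.

```python
import re

# Word-count window taken from the text; everything else is illustrative.
MIN_WORDS, MAX_WORDS = 134, 167


def check_answer_shape(answer: str, citation_url: str) -> dict:
    """Return pass/fail flags for one answer block (hypothetical helper)."""
    words = answer.split()
    return {
        # Length inside the stated window.
        "word_count_ok": MIN_WORDS <= len(words) <= MAX_WORDS,
        # Crude proxy for "contains a statistic": at least one digit.
        "has_statistic": bool(re.search(r"\d", answer)),
        # Crude proxy for "contains a verbatim quotation":
        # ten or more characters between straight or curly double quotes.
        "has_quotation": bool(re.search(r'"[^"]{10,}"|“[^”]{10,}”', answer)),
        # The citation must at least look like an absolute URL.
        "has_citation": citation_url.startswith(("http://", "https://")),
    }


def failed_checks(answer: str, citation_url: str) -> list:
    """List the names of the checks that did not pass."""
    result = check_answer_shape(answer, citation_url)
    return [name for name, ok in result.items() if not ok]
```

Running `failed_checks` over every generated page at build time would flag answers that drift outside the template before they ship.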
The same JSON-LD vocabulary (FAQPage, NewsArticle, Dataset, Quotation, GeoCoordinates…) repeated across every page, so crawlers extract facts unambiguously instead of guessing from prose.
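As a rough illustration of the structured-data layer, here is one way a page generator could emit a schema.org FAQPage block. The function names and the choice to attach the source through the `citation` property are assumptions for this sketch; the real site's schema mix (NewsArticle, Dataset, Quotation and so on) is not reproduced here.

```python
import json


def faq_jsonld(question: str, answer: str, source_url: str) -> str:
    """Serialize one question/answer pair as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {
                    "@type": "Answer",
                    "text": answer,
                    # Assumption: point the answer at its primary source.
                    "citation": source_url,
                },
            }
        ],
    }
    # Escape "<" so the JSON can sit safely inside a <script> element.
    return json.dumps(data, indent=2).replace("<", "\\u003c")


def jsonld_script_tag(jsonld: str) -> str:
    """Wrap serialized JSON-LD in the script element that pages embed."""
    return f'<script type="application/ld+json">{jsonld}</script>'
```

Generating the block from the same catalog record that produces the visible answer keeps the markup and the prose from drifting apart.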
An llms.txt index and a build-time llms-full.txt corpus (the whole site in one file), plus a robots.txt that explicitly allow-lists ~30 AI crawlers by name.
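A build step for these two files might look like the sketch below. The crawler list is a short illustrative subset of publicly documented user-agent tokens, not the ~30 names the site allow-lists, and the llms.txt layout follows the proposed convention (a title, a one-line summary, then a link list) as I understand it. Function names are hypothetical.

```python
# Illustrative subset only; the real allow-list is longer.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "CCBot",
]


def build_robots_txt(sitemap_url: str, crawlers=AI_CRAWLERS) -> str:
    """Emit a robots.txt that names each AI crawler and allows the whole site."""
    lines = ["User-agent: *", "Allow: /", ""]
    for user_agent in crawlers:
        lines += [f"User-agent: {user_agent}", "Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def build_llms_txt(title: str, summary: str, pages) -> str:
    """Emit an llms.txt index from (name, url, description) tuples."""
    lines = [f"# {title}", "", f"> {summary}", "", "## Pages", ""]
    lines += [f"- [{name}]({url}): {desc}" for name, url, desc in pages]
    return "\n".join(lines) + "\n"
```

Naming each crawler is redundant with the wildcard rule in strict robots.txt terms; the explicit entries mainly make the intent unambiguous to anyone, or anything, reading the file.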
A programmatic sitemap with fresh lastmod and an IndexNow ping on every deploy — so we were indexed in minutes, during the news cycle, not days later.
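The deploy hook could be sketched roughly as follows. This is an assumption-laden outline rather than the site's code: the function names are invented, the key file is assumed to live at `https://<host>/<key>.txt`, and the request is only built here, not sent, so the example has no network side effects.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_sitemap(urls, lastmod: date) -> str:
    """Render a sitemap where every URL carries the given lastmod date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_indexnow_request(host: str, key: str, urls) -> urllib.request.Request:
    """Build (but do not send) an IndexNow submission for the changed URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the verification key file sits at the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

In a deploy script the request would be passed to `urllib.request.urlopen` after the new build is live, so the submitted URLs already resolve when the search engine fetches them.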
Primary-source grounding on every claim (links back to war.gov), a RAG chatbot grounded only in the corpus, and trend timing — shipping into spiking demand before any authoritative competitor existed.
I can't cleanly attribute the 10% to any single lever, and the big confound is that this was a viral news event — topical authority did a lot of work a normal client won't have. What does transfer is the on-page system. The way to prove it is to baseline first, then A/B the structured-data and content-shape changes and monitor which pages actually get cited.
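For the baseline step, one simple starting point is to classify referrers in the access log and count AI-assistant visits per landing page. The sketch below assumes hits arrive as `(referrer_url, landing_path)` pairs and uses a small, illustrative host list; both the data shape and the function names are hypothetical, and referrer data undercounts visits that arrive without a referrer header.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative host-to-assistant mapping; extend as needed.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
}


def classify_referrer(referrer: str):
    """Return the assistant name for a referrer URL, or None if not AI."""
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRER_HOSTS.get(host)


def ai_referral_report(hits):
    """Return (AI share of all hits, Counter of AI hits per landing page)."""
    hits = list(hits)
    by_page = Counter()
    ai_total = 0
    for referrer, path in hits:
        if classify_referrer(referrer) is not None:
            ai_total += 1
            by_page[path] += 1
    share = ai_total / len(hits) if hits else 0.0
    return share, by_page
```

Capturing this report before and after each structured-data or content-shape change gives the per-page comparison the A/B step needs.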
To pressure-test the result, I ran the site's target queries through live web search and AI synthesis — the same retrieve-then-summarise substrate that ChatGPT search, Perplexity, and Google's AI Overviews run on. A sharp pattern emerged: we win on differentiated-angle and brand queries, and lose on high-authority factual terms.
The queries we own are recommendation-style ("interactive map of…") — the kind that actually sends qualified visitors — and they're won on a differentiated angle, not by competing head-on with the primary source and the news cycle. Closing the gap on factual terms is an off-site authority problem — earning links and mentions on sites the models already trust — which is the clearest place to invest next.
Method · target queries run through live web search + AI synthesis, June 2026 — a proxy for what AI assistants retrieve and cite. Directional, not a controlled measurement of any single engine.
llms.txt / llms-full.txt: a proposed convention; shipped because near-zero cost & directionally right, not because I proved it's read.
Distribution as a GEO input: I seeded Reddit, Hacker News, and Twitter. That matters for GEO too: human distribution drives real traffic and inbound mentions, which makes crawlers discover and trust you faster. GEO isn't only on-page; being talked about on sites the models already trust is itself a citation lever.
llms.txt is a bet, not a proven channel. Knowing the difference between established and emerging tactics is part of doing this credibly.
Search is shifting from ten blue links to AI assistants that answer directly and cite a handful of sources. That's a new, under-contested surface where most small businesses are completely invisible, and being the cited answer is worth more than ranking #4 on Google.
It's an early-mover moment, the way SEO was in ~2010. A shop that can reliably get small businesses cited by AI is selling into a market that mostly doesn't yet know it has the problem. The wedge writes itself: a free "Is your business invisible to AI?" audit that shows an owner ChatGPT recommending a competitor for their core question — that gap is visceral, and it sells.
GEO is SEO for the retrieval layer — same discipline, new surface.
The winners make a business the cleanest, most trustworthy thing for a model to quote — and prove it with citation tracking.