Generative Engine Optimization · Case Study

UAP.WATCH

How a declassified-document archive, engineered for the AI retrieval layer, earned ~10% of its traffic from ChatGPT, Claude & Perplexity — in one month.

Prepared by Bharat Jaju · Build Solo · <24h to v1 · Stack Next.js 16 · Vercel


00 — At a glance

~10%

Traffic from AI assistants (ChatGPT · Claude · Perplexity)

~95

Programmatic SEO (pSEO) pages (+ hundreds of entity pages)

~30

AI crawlers explicitly allow-listed

134–167

Word "citable-passage" spec, per page

schema.org JSON-LD types on every page

<24h

Release → live, indexed inside the news cycle

The Pentagon dropped 162 declassified UAP files into a flat, unbrowsable directory during a live news cycle. I shipped a better archive the same day — then engineered it as a machine-readable, primary-source corpus built specifically for the way AI assistants retrieve and cite sources.

01 — What I did

SEO ranks you in a list. GEO makes you the answer.

Modern AI answers are retrieval-augmented: when someone asks Perplexity or ChatGPT a question, the system runs a live search, fetches a few pages at inference time, and writes its answer grounded in them. That fetch step is the lever. GEO is, in effect, SEO for the retrieval layer of LLMs — plus formatting your content the way a model likes to quote it.

So I built the site as a machine-readable primary-source archive and optimized every layer for both human search and LLM retrieval. Five things stacked together:

Wide programmatic surface

~95 programmatically generated long-tail pages (FAQ / wiki / compare) plus hundreds of auto-generated entity pages — one per incident, document, video, agency, year, US state. Each is generated from the structured catalog and grounded in a specific declassified document, targeting a real question someone asks an AI.
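A minimal sketch of how entity pages could be derived from a structured catalog. The `CatalogEntry` shape, the field names, and the URL layout are assumptions for illustration, not the site's actual schema.

```typescript
// Hypothetical catalog shape; all field names are assumptions for illustration.
type EntityKind = "incident" | "document" | "video" | "agency" | "year" | "state";

interface CatalogEntry {
  kind: EntityKind;
  title: string;
  sourceUrl: string; // link to the primary-source document backing the page
}

interface PageSpec {
  path: string;
  title: string;
  sourceUrl: string;
}

// Lowercase the text and collapse every run of non-alphanumerics into one hyphen.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// One page per catalog entry. Entries with no primary source are skipped,
// so every generated page stays tied to a document.
function buildEntityPages(catalog: CatalogEntry[]): PageSpec[] {
  return catalog
    .filter((entry) => entry.sourceUrl.trim().length > 0)
    .map((entry) => ({
      path: `/${entry.kind}/${slugify(entry.title)}`,
      title: entry.title,
      sourceUrl: entry.sourceUrl,
    }));
}

// Usage with made-up sample data:
const samplePages = buildEntityPages([
  { kind: "incident", title: "Sample Incident: Alpha", sourceUrl: "https://example.com/doc-1.pdf" },
  { kind: "agency", title: "Sample Agency", sourceUrl: "" }, // no source, so skipped
]);
```

In a Next.js App Router project, a list like this would typically feed `generateStaticParams`, so each path is rendered at build time.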

The "citable-passage" shape

Every answer is 134–167 words, definition-first, with a statistic + a verbatim quotation + a primary-source citation. A dedicated pull-quote per page gives the model a clean string to lift. (The statistic, quotation, and citation elements follow the Princeton GEO research.)
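The passage spec above can be enforced mechanically at build time. The sketch below is one possible lint, assuming crude proxies: any digit counts as a statistic, a quoted span of 10+ characters counts as a quotation, and any URL counts as a citation. The "definition-first" rule is not checked here, and the function name is hypothetical.

```typescript
interface PassageCheck {
  wordCount: number;
  withinLength: boolean;
  hasStatistic: boolean;
  hasQuotation: boolean;
  hasCitation: boolean;
  ok: boolean;
}

// Checks a passage against the 134–167 word spec using simple heuristics.
function checkCitablePassage(text: string, minWords = 134, maxWords = 167): PassageCheck {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const wordCount = words.length;
  const withinLength = wordCount >= minWords && wordCount <= maxWords;
  // Crude proxy: at least one digit somewhere in the passage.
  const hasStatistic = /\d/.test(text);
  // A straight-quoted or curly-quoted span of at least 10 characters.
  const hasQuotation = /"[^"]{10,}"|\u201C[^\u201D]{10,}\u201D/.test(text);
  // Any http(s) URL counts as a citation in this sketch.
  const hasCitation = /https?:\/\/\S+/.test(text);
  return {
    wordCount,
    withinLength,
    hasStatistic,
    hasQuotation,
    hasCitation,
    ok: withinLength && hasStatistic && hasQuotation && hasCitation,
  };
}
```

A check like this fits in the page-generation step, failing the build for any page whose answer drifts out of spec.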

Structured data everywhere

The same JSON-LD vocabulary (FAQPage, NewsArticle, Dataset, Quotation, GeoCoordinates…) repeated across every page, so crawlers extract facts unambiguously instead of guessing from prose.
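As an illustration of the structured-data layer, here is a minimal sketch that builds a schema.org `FAQPage` JSON-LD document from question/answer pairs. The `Faq` interface and function names are assumptions; only the `@context`, `@type`, `mainEntity`, `name`, and `acceptedAnswer` keys come from the schema.org vocabulary.

```typescript
interface Faq {
  question: string;
  answer: string;
}

// Serializes a schema.org FAQPage. "<" is escaped so that answer text
// cannot close the surrounding script tag when embedded in HTML.
function buildFaqJsonLd(faqs: Faq[]): string {
  const doc = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((faq) => ({
      "@type": "Question",
      name: faq.question,
      acceptedAnswer: { "@type": "Answer", text: faq.answer },
    })),
  };
  return JSON.stringify(doc).replace(/</g, "\\u003c");
}

// Wraps the JSON in the script tag that crawlers look for.
function jsonLdScriptTag(json: string): string {
  return `<script type="application/ld+json">${json}</script>`;
}
```

The same pattern extends to the other types named above (NewsArticle, Dataset, Quotation), each with its own required properties.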

Machine-ingestion endpoints

An llms.txt index and a build-time llms-full.txt corpus (the whole site in one file), plus a robots.txt that explicitly allow-lists ~30 AI crawlers by name.
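A minimal sketch of generating a robots.txt that allow-lists crawlers by name. The user-agent tokens below are a small illustrative subset of publicly documented AI crawlers, not the site's actual list of ~30, and the function name is hypothetical.

```typescript
// Illustrative subset only; the real allow-list is not reproduced here.
const AI_CRAWLERS: string[] = [
  "GPTBot",
  "OAI-SearchBot",
  "ChatGPT-User",
  "ClaudeBot",
  "PerplexityBot",
];

// Emits one explicit "Allow: /" group per named crawler, then a
// catch-all group, then the sitemap location.
function buildRobotsTxt(userAgents: string[], sitemapUrl: string): string {
  const groups = userAgents.map((ua) => `User-agent: ${ua}\nAllow: /`);
  groups.push("User-agent: *\nAllow: /");
  return groups.join("\n\n") + `\n\nSitemap: ${sitemapUrl}\n`;
}
```

Naming each crawler is redundant when everything is already allowed, but it makes the intent explicit and gives a single place to revoke access per bot.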

Fast, push-based discovery

A programmatic sitemap with fresh lastmod and an IndexNow ping on every deploy — so we were indexed in minutes, during the news cycle, not days later.
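The push-based discovery step could look roughly like the sketch below: a sitemap with per-page `lastmod`, plus the JSON body that the IndexNow protocol expects (`host`, `key`, `keyLocation`, `urlList`). The host, key value, and function names are placeholders, and the key file is assumed to sit at the site root.

```typescript
interface SitemapPage {
  path: string;
  lastModified: Date;
}

interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// Escape the characters that are unsafe inside XML text nodes.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds a sitemap with an ISO 8601 lastmod per URL.
function buildSitemapXml(host: string, pages: SitemapPage[]): string {
  const urls = pages.map((page) => {
    const loc = escapeXml(`https://${host}${page.path}`);
    return `  <url><loc>${loc}</loc><lastmod>${page.lastModified.toISOString()}</lastmod></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

// Builds the IndexNow request body; the key file is assumed to be
// hosted at https://<host>/<key>.txt.
function buildIndexNowPayload(host: string, key: string, paths: string[]): IndexNowPayload {
  const origin = `https://${host}`;
  return {
    host,
    key,
    keyLocation: `${origin}/${key}.txt`,
    urlList: paths.map((path) => origin + path),
  };
}
```

A deploy hook would then POST the payload as JSON to an IndexNow endpoint such as `https://api.indexnow.org/indexnow`; that network call is left out so the sketch stays side-effect free.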

Wrapped around all of it

Primary-source grounding on every claim (links back to war.gov), a RAG chatbot grounded only in the corpus, and trend timing — shipping into spiking demand before any authoritative competitor existed.

02 — Why it worked

A combination, not one magic trick.

Owned a fresh, low-competition topic. A brand-new release with spiking demand and almost no authoritative sources — I became one of the most complete within 24 hours.
Was the cleanest thing to quote. Our pages were pre-formatted into the self-contained, sourced, statistic-bearing passages models prefer. We didn't just rank — we were liftable.
Was trivially machine-ingestible. Server-rendered HTML (content in the first response, not behind JS), consistent JSON-LD, explicit crawler allow-list, one-file corpus. No reason for a crawler to skip or misparse us.
Covered the long tail. Hundreds of specific question- and entity-pages meant that for a huge range of natural-language queries, some page of ours was the best match.
Got discovered fast, trusted quickly. IndexNow + sitemap freshness got us indexed during the cycle; primary-source grounding made us safe to cite.
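The "content in the first response" point is easy to regression-test. A minimal sketch, assuming the raw HTML of a page has already been fetched without executing JavaScript: it checks that a key phrase appears in the visible markup rather than only inside a script, and that a JSON-LD block is present. The function names are hypothetical, and the regex-based tag stripping is a rough approximation, not an HTML parser.

```typescript
// Approximates the text a non-JS crawler sees: drops script/style blocks,
// strips remaining tags, and normalizes whitespace and case.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// True when the phrase is present in the markup itself, not only in JS.
function isInInitialHtml(html: string, phrase: string): boolean {
  const needle = phrase.replace(/\s+/g, " ").trim().toLowerCase();
  return needle.length > 0 && visibleText(html).indexOf(needle) !== -1;
}

// True when the response carries at least one JSON-LD script tag.
function hasJsonLd(html: string): boolean {
  return /<script\b[^>]*type=["']application\/ld\+json["'][^>]*>/i.test(html);
}
```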

Honest attribution

I can't cleanly attribute the 10% to any single lever, and the big confound is that this was a viral news event — topical authority did a lot of work a normal client won't have. What does transfer is the on-page system. The way to prove it is to baseline first, then A/B the structured-data and content-shape changes and monitor which pages actually get cited.
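One way the baseline-then-A/B step might be set up, as a sketch only: pages are assigned to a variant by a deterministic hash of their path (so assignment is stable across deploys), and citation rates are compared per variant. The hash choice (FNV-1a), the `PageObservation` shape, and the 50/50 split are assumptions; this computes a raw difference and does not test for statistical significance.

```typescript
type Variant = "control" | "treatment";

interface PageObservation {
  path: string;
  cited: boolean; // whether the page was seen cited during the test window
}

// FNV-1a 32-bit hash: small and deterministic, not cryptographic.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Stable assignment: the same path always lands in the same variant.
function assignVariant(pagePath: string): Variant {
  return fnv1a(pagePath) % 2 === 0 ? "control" : "treatment";
}

// Share of pages that were cited; 0 for an empty group.
function citationRate(pages: PageObservation[]): number {
  if (pages.length === 0) return 0;
  return pages.filter((page) => page.cited).length / pages.length;
}

// Splits observations by variant and reports the raw rate difference.
function compareVariants(observations: PageObservation[]) {
  const control = citationRate(observations.filter((o) => assignVariant(o.path) === "control"));
  const treatment = citationRate(observations.filter((o) => assignVariant(o.path) === "treatment"));
  return { control, treatment, difference: treatment - control };
}
```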

03 — The proof

What we actually get cited for.

To pressure-test the result, I ran the site's target queries through live web search and AI synthesis — the same retrieve-then-summarise substrate that ChatGPT search, Perplexity, and Google's AI Overviews run on. A sharp pattern emerged: we win on differentiated-angle and brand queries, and lose on high-authority factual terms.

Cited · ranked #1

"interactive map of declassified Pentagon UAP incidents" — the #1 result, and the first source named in the AI's recommendation list, above competing mirror sites.
"UAP.WATCH interactive map" — #1, with the AI answer lifting our own page copy near-verbatim.
"uap watch" (branded) — present and accurately summarised.

Not cited · authority wins

Factual head terms — how many files were released, what PURSUE is, the GOFAST resolution, the green-fireball case, the "Eye of Sauron" orb.
Here war.gov, Wikipedia, the FBI Vault, and major news (TIME · CNN · NBC · CBS) dominate — even where we publish a dedicated page.

The pattern

The queries we own are recommendation-style ("interactive map of…") — the kind that actually sends qualified visitors — and they're won on a differentiated angle, not by competing head-on with the primary source and the news cycle. Closing the gap on factual terms is an off-site authority problem — earning links and mentions on sites the models already trust — which is the clearest place to invest next.

Method · target queries run through live web search + AI synthesis, June 2026 — a proxy for what AI assistants retrieve and cite. Directional, not a controlled measurement of any single engine.
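The tallying behind a check like this can be sketched as follows, assuming each query run has been recorded by hand or by script as an ordered list of cited domains. The `QueryRun` shape and function names are hypothetical, and no real results are embedded here.

```typescript
interface QueryRun {
  query: string;
  citedDomains: string[]; // in the order the answer cited them
}

// 1-based position of the domain among the cited sources, or null if absent.
function citationRank(run: QueryRun, domain: string): number | null {
  const target = domain.toLowerCase();
  const index = run.citedDomains.map((d) => d.toLowerCase()).indexOf(target);
  return index === -1 ? null : index + 1;
}

// Splits the runs into queries where the domain was cited (with rank)
// and queries where it was not.
function summarizeRuns(runs: QueryRun[], domain: string) {
  const cited: { query: string; rank: number }[] = [];
  const notCited: string[] = [];
  for (const run of runs) {
    const rank = citationRank(run, domain);
    if (rank === null) {
      notCited.push(run.query);
    } else {
      cited.push({ query: run.query, rank });
    }
  }
  return { cited, notCited };
}
```

Re-running the same query set on a schedule and diffing the two lists would turn this one-off check into a trend line.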

04 — The lever menu

What I tried — in order of confidence.

Durable · high-confidence

Citable-passage content shape (stat + quote + citation)
JSON-LD structured data across all page types
Programmatic long-tail pages grounded in a real dataset
Build-time sitemap + IndexNow push on every deploy
Explicit AI-crawler allow-listing

Experimental · cheap bets

llms.txt / llms-full.txt — a proposed convention; shipped because near-zero cost & directionally right, not because I proved it's read
RAG chatbot grounded only in the corpus — part feature, part credibility / dwell-time signal
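For the llms.txt item above, a minimal sketch of a build-time generator. It follows the layout in the llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links); the interfaces and function name are assumptions, and nothing here shows that any assistant actually reads the file.

```typescript
interface LlmsLink {
  title: string;
  url: string;
  note?: string; // optional one-line description of the linked page
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Renders an llms.txt index: "# name", "> summary", then one
// "## heading" block of "- [title](url): note" lines per section.
function buildLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}
```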

Deliberately avoided

Mass-dumping templated pages. I chose drip, source-grounded generation instead: pages are generated from the catalog (questions, topics, incidents, agencies, US states, comparisons), and each is tied to a primary-source declassified document, not templated slop. They shipped in batches of 15–20, spread across days and timed to news moments, to stay clear of Google's scaled-content-abuse spam policy. Scale the surface, not the slop.
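The drip schedule is simple arithmetic. A minimal sketch, assuming a fixed batch size and a fixed gap in days between batches; the real releases were timed to news moments, which a fixed gap does not model, and the function name is hypothetical.

```typescript
interface Batch {
  publishDate: string; // YYYY-MM-DD (UTC)
  pages: string[];
}

// Chunks the page list into batches and spaces them gapDays apart.
// The final batch may be smaller than batchSize.
function scheduleBatches(pages: string[], batchSize: number, start: Date, gapDays: number): Batch[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  const batches: Batch[] = [];
  for (let i = 0; i < pages.length; i += batchSize) {
    const batchIndex = i / batchSize;
    const date = new Date(start.getTime() + batchIndex * gapDays * msPerDay);
    batches.push({
      publishDate: date.toISOString().slice(0, 10),
      pages: pages.slice(i, i + batchSize),
    });
  }
  return batches;
}
```

For example, 40 pages at a batch size of 15 with a 2-day gap yields three batches of 15, 15, and 10 pages over five days.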

Distribution as a GEO input: I seeded Reddit, Hacker News, and Twitter. That matters for GEO too — human distribution drives real traffic and inbound mentions, which makes crawlers discover and trust you faster. GEO isn't only on-page; being talked about on sites the models already trust is itself a citation lever.

05 — What I'd do better

Where the next iteration goes.

Measurement from day one. No A/B on structured-data or content-shape changes, no tracking of which pages got cited. I'd instrument that before touching content.
Direct citation monitoring. I tracked referral traffic but never queried the assistants to see whether/how they actually quoted us, and for which questions.
Off-site authority. Nearly all signals were first-party. I'd chase third-party mentions and links on sites models already trust — an under-pulled lever.
Name the speculative parts. llms.txt is a bet, not a proven channel. Knowing the difference between established and emerging tactics is part of doing this credibly.
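For the measurement items above, a minimal sketch of computing the AI-assistant share of traffic from referrer strings. The hostname-to-assistant mapping is an assumption for illustration and will drift over time, and visits that arrive with no referrer at all are invisible to this method, so it gives a lower bound at best.

```typescript
// Assumed referrer hostnames; verify against real analytics before relying on them.
const AI_REFERRER_HOSTS = new Map<string, string>([
  ["chatgpt.com", "ChatGPT"],
  ["chat.openai.com", "ChatGPT"],
  ["claude.ai", "Claude"],
  ["perplexity.ai", "Perplexity"],
]);

// Extracts the lowercase hostname from a referrer URL, without "www.".
function hostOf(referrer: string): string | null {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(referrer.trim());
  return match ? match[1].toLowerCase().replace(/^www\./, "") : null;
}

// Returns the assistant name for a referrer, or null if it is not a known AI host.
function classifyReferrer(referrer: string): string | null {
  const host = hostOf(referrer);
  if (host === null) return null;
  return AI_REFERRER_HOSTS.get(host) ?? null;
}

// Fraction of visits whose referrer maps to a known AI assistant.
function aiShare(referrers: string[]): number {
  if (referrers.length === 0) return 0;
  const aiVisits = referrers.filter((r) => classifyReferrer(r) !== null).length;
  return aiVisits / referrers.length;
}
```

Grouping the same classification by landing page would show which pages the assistants actually send visitors to, which is the per-page signal the case study says was missing.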

06 — Why this matters

The next search land-grab is already open.

Search is shifting from ten blue links to AI assistants that answer directly and cite a handful of sources. That's a new, under-contested surface where most small businesses are completely invisible — and being the cited answer is worth more than ranking #4 on Google.

It's an early-mover moment, the way SEO was in ~2010. A shop that can reliably get small businesses cited by AI is selling into a market that mostly doesn't yet know it has the problem. The wedge writes itself: a free "Is your business invisible to AI?" audit that shows an owner ChatGPT recommending a competitor for their core question — that gap is visceral, and it sells.

GEO is SEO for the retrieval layer — same discipline, new surface.

The winners make a business the cleanest, most trustworthy thing for a model to quote — and prove it with citation tracking.