UNCLASSIFIED // FIELD REPORT // GEO-OPS REF UAP-WATCH
Generative Engine Optimization · Case Study

UAP.WATCH

How a declassified-document archive, engineered for the AI retrieval layer, earned ~10% of its traffic from ChatGPT, Claude & Perplexity — in one month.

00 — At a glance
  • ~10%: traffic from AI assistants (ChatGPT · Claude · Perplexity)
  • ~95: programmatic pSEO pages (+ hundreds of entity pages)
  • ~30: AI crawlers explicitly allow-listed
  • 134–167 words: "citable-passage" spec, per page
  • 7: schema.org JSON-LD types on every page
  • <24h: release → live, indexed inside the news cycle

The Pentagon dropped 162 declassified UAP files into a flat, unbrowsable directory during a live news cycle. I shipped a better archive the same day — then engineered it as a machine-readable, primary-source corpus built specifically for the way AI assistants retrieve and cite sources.

01 — What I did

SEO ranks you in a list. GEO makes you the answer.

Many modern AI answers are retrieval-augmented: when someone asks Perplexity or ChatGPT a question that triggers a web search, the system runs a live search, fetches a few pages at inference time, and writes its answer grounded in them. That fetch step is the lever. GEO is, in effect, SEO for the retrieval layer of LLMs, plus formatting your content the way a model likes to quote it.
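
The retrieve-then-ground loop can be pictured with a toy example. This is only a sketch: real assistants use web-search indexes and learned rankers, not the word-overlap score used here, and the pages, URLs, and function names below are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query: str, passage: str) -> int:
    """Toy relevance score: distinct query tokens that appear in the passage."""
    return len(tokenize(query) & tokenize(passage))

def retrieve(query: str, pages: list[dict], k: int = 2) -> list[dict]:
    """Return up to k pages with a non-zero score, best match first."""
    scored = [(overlap(query, page["text"]), page) for page in pages]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [page for _, page in scored[:k]]

def build_prompt(query: str, sources: list[dict]) -> str:
    """Assemble a grounded prompt: numbered sources first, then the question."""
    lines = ["Answer using only the numbered sources and cite them."]
    for i, page in enumerate(sources, start=1):
        lines.append(f"[{i}] {page['url']}: {page['text']}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)

# Hypothetical pages standing in for fetched search results.
PAGES = [
    {"url": "https://example.org/recipes",
     "text": "Ten quick weeknight pasta recipes."},
    {"url": "https://example.org/faq",
     "text": "How many declassified files were released, and where to read them."},
    {"url": "https://example.org/map",
     "text": "Interactive map of declassified UAP incidents by US state."},
]

QUERY = "interactive map of declassified UAP incidents"
SOURCES = retrieve(QUERY, PAGES)
PROMPT = build_prompt(QUERY, SOURCES)
print(PROMPT)
```

Only the pages that survive the fetch step reach the prompt, which is why the rest of this case study concentrates on being fetchable and quotable.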

So I built the site as a machine-readable primary-source archive and optimized every layer for both human search and LLM retrieval. Five things stacked together:

A

Wide programmatic surface

~95 programmatically generated long-tail pages (FAQ / wiki / compare) plus hundreds of auto-generated entity pages — one per incident, document, video, agency, year, US state. Each is generated from the structured catalog and grounded in a specific declassified document, targeting a real question someone asks an AI.
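
A minimal sketch of generating entity pages from a structured catalog. The `CatalogEntry` fields, URL paths, and sample entries are assumptions made for illustration and do not reflect the site's actual schema.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row (hypothetical fields)."""
    doc_id: str
    title: str
    agency: str
    year: int
    state: str
    source_url: str  # primary-source link that grounds every page

def slugify(text: str) -> str:
    """Turn free text into a URL-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def entity_pages(catalog: list[CatalogEntry]) -> dict[str, list[str]]:
    """Map each page path to the doc_ids that ground it."""
    pages: dict[str, list[str]] = defaultdict(list)
    for entry in catalog:
        if not entry.source_url:
            continue  # no primary source, no page
        pages[f"/document/{slugify(entry.doc_id)}"].append(entry.doc_id)
        pages[f"/agency/{slugify(entry.agency)}"].append(entry.doc_id)
        pages[f"/year/{entry.year}"].append(entry.doc_id)
        pages[f"/state/{slugify(entry.state)}"].append(entry.doc_id)
    return dict(pages)

# Hypothetical catalog rows; titles and URLs are placeholders.
CATALOG = [
    CatalogEntry("doc-001", "Example Incident Report A", "Example Agency",
                 1950, "New Mexico", "https://example.org/source/doc-001"),
    CatalogEntry("doc-002", "Example Incident Report B", "Example Agency",
                 1952, "Texas", "https://example.org/source/doc-002"),
    CatalogEntry("doc-003", "Example Video Summary", "Other Agency",
                 1952, "New Mexico", "https://example.org/source/doc-003"),
]

PAGES_BY_PATH = entity_pages(CATALOG)
for path, doc_ids in sorted(PAGES_BY_PATH.items()):
    print(path, doc_ids)
```

Three catalog rows already fan out into nine pages here, which is how a modest dataset produces a wide long-tail surface while every page stays tied to at least one source document.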

B

The "citable-passage" shape

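Every answer is 134–167 words and definition-first, and carries a statistic, a verbatim quotation, and a primary-source citation. A dedicated pull-quote per page gives the model a clean string to lift. (The statistic, quotation, and citation elements follow the Princeton-led GEO research, which found that adding them raised a source's visibility in generative-engine answers.)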
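
The spec is mechanical enough to lint at build time. The checker below is a rough sketch with assumed heuristics: any digit counts as a statistic, any quoted run of 10+ characters counts as a quotation, and any URL counts as a citation. It does not test the definition-first ordering.

```python
import re

MIN_WORDS, MAX_WORDS = 134, 167  # the word range from the spec above

def check_passage(text: str) -> dict[str, bool]:
    """Return one pass/fail flag per element of the citable-passage shape."""
    words = text.split()
    return {
        "length_ok": MIN_WORDS <= len(words) <= MAX_WORDS,
        # Crude heuristic: a year or page number would also match.
        "has_statistic": bool(re.search(r"\d", text)),
        # Straight or curly double quotes around at least 10 characters.
        "has_quotation": bool(re.search(r'"[^"]{10,}"|“[^”]{10,}”', text)),
        "has_citation": bool(re.search(r"https?://\S+", text)),
    }

def failures(text: str) -> list[str]:
    """Names of the checks a passage fails, for build logs."""
    return [name for name, ok in check_passage(text).items() if not ok]

# Hypothetical passage used only to exercise the checker.
print(failures("Too short, no evidence."))
```

Running the check in the page generator means a passage that drifts out of shape fails the build instead of shipping.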

C

Structured data everywhere

The same JSON-LD vocabulary (FAQPage, NewsArticle, Dataset, Quotation, GeoCoordinates…) repeated across every page, so crawlers extract facts unambiguously instead of guessing from prose.
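
As an illustration, one FAQPage block could be emitted like this. The `FAQPage`, `Question`, `Answer`, and `citation` names are schema.org vocabulary; the helper names, the example source URL, and the choice to attach `citation` to the answer are assumptions, not the site's confirmed markup.

```python
import json

def faq_jsonld(question: str, answer: str, source_url: str) -> dict:
    """Build a schema.org FAQPage object with a single question."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [{
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {
                "@type": "Answer",
                "text": answer,
                # Assumption: link the primary source via CreativeWork.citation.
                "citation": source_url,
            },
        }],
    }

def as_script_tag(data: dict) -> str:
    """Serialize to a JSON-LD script tag, escaping '</' so text cannot close the tag."""
    body = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

# The source URL is a placeholder, not a real document link.
EXAMPLE_TAG = as_script_tag(faq_jsonld(
    "How many declassified UAP files were released?",
    "The release contained 162 declassified UAP files.",
    "https://example.org/source/release-index",
))
print(EXAMPLE_TAG)
```

Generating the block from the same catalog record as the visible copy keeps the structured data and the prose from drifting apart.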

D

Machine-ingestion endpoints

An llms.txt index and a build-time llms-full.txt corpus (the whole site in one file), plus a robots.txt that explicitly allow-lists ~30 AI crawlers by name.
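
A sketch of producing both files at build time. The crawler names are a small illustrative subset of publicly documented user-agent tokens, not the site's actual ~30-entry list, and the llms.txt layout follows the proposed convention (title, blockquote summary, link list) as an assumption rather than a verified standard.

```python
# Illustrative subset of AI crawler user-agent tokens (not the full list).
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot",
]

def build_robots_txt(crawlers: list[str], sitemap_url: str) -> str:
    """One explicit Allow group per named crawler, then a catch-all and the sitemap."""
    blocks = [f"User-agent: {name}\nAllow: /" for name in crawlers]
    blocks.append("User-agent: *\nAllow: /")
    blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"

def build_llms_txt(title: str, summary: str, links: list[tuple[str, str]]) -> str:
    """Markdown index: H1 title, blockquote summary, then a list of page links."""
    lines = [f"# {title}", "", f"> {summary}", "", "## Pages", ""]
    lines += [f"- [{name}]({url})" for name, url in links]
    return "\n".join(lines) + "\n"

# Hypothetical site values for the demo.
ROBOTS = build_robots_txt(AI_CRAWLERS, "https://example.org/sitemap.xml")
LLMS = build_llms_txt(
    "Example Archive",
    "A primary-source archive of declassified documents.",
    [("Interactive map", "https://example.org/map"),
     ("FAQ", "https://example.org/faq")],
)
print(ROBOTS)
print(LLMS)
```

Naming each crawler is redundant with the catch-all group in strict robots.txt terms; the explicit groups exist to make the intent unambiguous to anyone auditing the file.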

E

Fast, push-based discovery

A programmatic sitemap with fresh lastmod and an IndexNow ping on every deploy — so we were indexed in minutes, during the news cycle, not days later.
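
A deploy hook along these lines could cover both halves. This sketch assumes the shared IndexNow endpoint and a key file hosted at the site root; the host, key, and URLs are hypothetical. The request is only constructed here, so a deploy step would still need to pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_sitemap(pages: dict[str, date]) -> str:
    """Render a sitemap with one <url> per page and its last-modified date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in sorted(pages.items()):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    xml_body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + xml_body

def indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) the IndexNow POST announcing changed URLs."""
    payload = {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",  # assumes key file at site root
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Hypothetical pages and dates for the demo.
CHANGED = {
    "https://example.org/map": date(2025, 1, 2),
    "https://example.org/faq": date(2025, 1, 1),
}
SITEMAP_XML = build_sitemap(CHANGED)
REQUEST = indexnow_request("example.org", "hypothetical-key", sorted(CHANGED))
print(SITEMAP_XML)
```

Taking `lastmod` from the real modification date of each page, rather than stamping every URL with the deploy time, keeps the freshness signal honest.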

Wrapped around all of it

Primary-source grounding on every claim (links back to war.gov), a RAG chatbot grounded only in the corpus, and trend timing — shipping into spiking demand before any authoritative competitor existed.

02 — Why it worked

A combination, not one magic trick.

  1. Owned a fresh, low-competition topic. A brand-new release with spiking demand and almost no authoritative sources: the site became one of the most complete within 24 hours.
  2. Was the cleanest thing to quote. Our pages were pre-formatted into the self-contained, sourced, statistic-bearing passages models prefer. We didn't just rank — we were liftable.
  3. Was trivially machine-ingestible. Server-rendered HTML (content in the first response, not behind JS), consistent JSON-LD, explicit crawler allow-list, one-file corpus. No reason for a crawler to skip or misparse us.
  4. Covered the long tail. Hundreds of specific question- and entity-pages meant that for a huge range of natural-language queries, some page of ours was the best match.
  5. Got discovered fast, trusted quickly. IndexNow + sitemap freshness got us indexed during the cycle; primary-source grounding made us safe to cite.
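
Point 3 in the list above is easy to spot-check: parse the raw HTML of the first response, with no JavaScript executed, and confirm the passage and the JSON-LD are already there. The sketch below uses only the standard library; the class name, function name, and sample markup are hypothetical, and it handles only top-level JSON-LD objects.

```python
import json
from html.parser import HTMLParser

class RawHtmlAudit(HTMLParser):
    """Collect visible text and JSON-LD blocks from unrendered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts = []
        self.jsonld_blocks = []
        self._skip_tag = None   # "script" or "style" while inside one
        self._is_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_tag = tag
            self._is_jsonld = (tag == "script"
                               and dict(attrs).get("type") == "application/ld+json")
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._skip_tag:
            if self._is_jsonld:
                self.jsonld_blocks.append("".join(self._buffer))
            self._skip_tag = None
            self._is_jsonld = False

    def handle_data(self, data):
        if self._skip_tag:
            self._buffer.append(data)       # script/style content is not visible text
        elif data.strip():
            self.text_parts.append(data.strip())

def audit(raw_html: str, must_contain: str) -> dict:
    """Report whether the passage and JSON-LD exist before any JS runs."""
    parser = RawHtmlAudit()
    parser.feed(raw_html)
    parser.close()
    types = []
    for block in parser.jsonld_blocks:
        data = json.loads(block)
        if isinstance(data, dict):
            types.append(data.get("@type"))
    return {
        "passage_in_first_response": must_contain in " ".join(parser.text_parts),
        "jsonld_types": types,
    }

# Hypothetical markup: a server-rendered page versus a client-rendered shell.
SERVER_RENDERED = (
    "<html><body><h1>Example</h1><p>The archive lists 162 files.</p>"
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "FAQPage"}</script>'
    "<script>document.title = 'x';</script></body></html>"
)
CLIENT_SHELL = '<html><body><div id="root"></div><script>render()</script></body></html>'

print(audit(SERVER_RENDERED, "162 files"))
print(audit(CLIENT_SHELL, "162 files"))
```
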
Honest attribution

I can't cleanly attribute the ~10% to any single lever, and the big confound is that this was a viral news event: topical authority did a lot of work a normal client won't have. What does transfer is the on-page system. The way to prove it is to baseline first, then A/B the structured-data and content-shape changes and monitor which pages actually get cited.
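
One way to read such an A/B is a two-proportion z-test on citation rates, treating each page as "cited at least once" or "not cited" during the monitoring window. This is a sketch under that assumption; the counts below are invented for illustration and are not results from the site.

```python
import math

def two_proportion_z(cited_a: int, total_a: int,
                     cited_b: int, total_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in citation rate, B minus A."""
    rate_a = cited_a / total_a
    rate_b = cited_b / total_b
    pooled = (cited_a + cited_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0  # both arms all-cited or all-uncited: no evidence either way
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: control pages (A) versus restructured pages (B).
Z_SCORE, P_VALUE = two_proportion_z(cited_a=10, total_a=100,
                                    cited_b=30, total_b=100)
print(f"z = {Z_SCORE:.2f}, p = {P_VALUE:.4f}")
```

The test only says whether the two arms differ. It cannot separate the on-page change from a news spike, so the arms would need to be drawn from comparable pages over the same window.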

03 — The proof

What we actually get cited for.

To pressure-test the result, I ran the site's target queries through live web search and AI synthesis, the same kind of retrieve-then-summarise pipeline that ChatGPT search, Perplexity, and Google's AI Overviews rely on. A sharp pattern emerged: we win on differentiated-angle and brand queries, and lose on high-authority factual terms.

Cited · ranked #1
  • "interactive map of declassified Pentagon UAP incidents" — the #1 result, and the first source named in the AI's recommendation list, above competing mirror sites.
  • "UAP.WATCH interactive map" — #1, with the AI answer lifting our own page copy near-verbatim.
  • "uap watch" (branded) — present and accurately summarised.
Not cited · authority wins
  • Factual head terms — how many files were released, what PURSUE is, the GOFAST resolution, the green-fireball case, the "Eye of Sauron" orb.
  • Here war.gov, Wikipedia, the FBI Vault, and major news (TIME · CNN · NBC · CBS) dominate — even where we publish a dedicated page.
The pattern

The queries we own are recommendation-style ("interactive map of…") — the kind that actually sends qualified visitors — and they're won on a differentiated angle, not by competing head-on with the primary source and the news cycle. Closing the gap on factual terms is an off-site authority problem — earning links and mentions on sites the models already trust — which is the clearest place to invest next.

Method · target queries run through live web search + AI synthesis, June 2026 — a proxy for what AI assistants retrieve and cite. Directional, not a controlled measurement of any single engine.

04 — The lever menu

What I tried — in order of confidence.

Durable · high-confidence
  • Citable-passage content shape (stat + quote + citation)
  • JSON-LD structured data across all page types
  • Programmatic long-tail pages grounded in a real dataset
  • Build-time sitemap + IndexNow push on every deploy
  • Explicit AI-crawler allow-listing
Experimental · cheap bets
  • llms.txt / llms-full.txt — a proposed convention; shipped because near-zero cost & directionally right, not because I proved it's read
  • RAG chatbot grounded only in the corpus — part feature, part credibility / dwell-time signal
Deliberately avoided
  • Mass-dumping templated pages. Chose drip + source-grounded generation instead: pages are generated from the catalog (questions, topics, incidents, agencies, US states, comparisons), and each is tied to a primary-source declassified document rather than templated filler. They shipped in batches of 15–20, spread across days and timed to news moments, to stay clear of Google's scaled-content-abuse spam policy. Scale the surface, not the slop.
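
The drip itself is simple to schedule. The sketch below splits a page list into fixed-size batches on spaced release days; the paths, start date, and gap are hypothetical, and it does not model the "timed to news moments" part, which was a manual call.

```python
from datetime import date, timedelta

def drip_schedule(pages: list[str], start: date,
                  batch_size: int = 20, gap_days: int = 1) -> list[tuple[date, list[str]]]:
    """Split pages into batches and assign each batch a release day."""
    if batch_size < 1 or gap_days < 1:
        raise ValueError("batch_size and gap_days must be positive")
    schedule = []
    for offset in range(0, len(pages), batch_size):
        release_day = start + timedelta(days=(offset // batch_size) * gap_days)
        schedule.append((release_day, pages[offset:offset + batch_size]))
    return schedule

# Hypothetical page paths and start date.
ALL_PAGES = [f"/faq/question-{n}" for n in range(95)]
SCHEDULE = drip_schedule(ALL_PAGES, date(2025, 1, 1), batch_size=20, gap_days=2)
for release_day, batch in SCHEDULE:
    print(release_day.isoformat(), len(batch))
```
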

Distribution as a GEO input: I seeded Reddit, Hacker News, and Twitter. That matters for GEO too — human distribution drives real traffic and inbound mentions, which makes crawlers discover and trust you faster. GEO isn't only on-page; being talked about on sites the models already trust is itself a citation lever.

05 — What I'd do better

Where the next iteration goes.

06 — Why this matters

The next search land-grab is already open.

Search is shifting from ten blue links to AI assistants that answer directly and cite a handful of sources. That's a new, under-contested surface where most small businesses are completely invisible — and being the cited answer is worth more than ranking #4 on Google.

It's an early-mover moment, the way SEO was in ~2010. A shop that can reliably get small businesses cited by AI is selling into a market that mostly doesn't yet know it has the problem. The wedge writes itself: a free "Is your business invisible to AI?" audit that shows an owner ChatGPT recommending a competitor for their core question — that gap is visceral, and it sells.

GEO is SEO for the retrieval layer — same discipline, new surface.

The winners make a business the cleanest, most trustworthy thing for a model to quote — and prove it with citation tracking.