How to Get Visibility on ChatGPT in 2026

May 29, 2026 5:41:25 AM

There is no shortage of opinions on how to "rank in ChatGPT." There is, finally, real data. Over the last twelve months, Ahrefs, Princeton, Cyrus Shepard's Zyppy, AirOps, Authoritas and 5W have all published first-party studies analysing hundreds of millions of citation events, billions of prompts, and tens of thousands of brand mentions across ChatGPT, Google AI Overviews, Perplexity, Claude and Gemini.

The picture that emerges is sharper — and weirder — than most "GEO" advice you'll read. ChatGPT does not pick sources the way Google ranks pages. It runs a two-stage retrieval-and-citation pipeline that throws away most of what it reads, leans hard on a small set of "trusted" domains, and rewards content that is structurally legible to a machine before it rewards content that is well-written.

This guide walks through what the data says, where the evidence is mixed, and the moves that hold up across every credible study we found. Every external statistic links back to its primary source.

How ChatGPT actually decides what to cite

Most marketers picture ChatGPT search as "Google with a chat interface." It isn't. According to the Ahrefs analysis of 1.4 million ChatGPT 5.2 prompts, every answer is the output of a two-stage funnel:

Retrieval. ChatGPT fans the user's question out into multiple sub-queries, runs them through a search index (a blend of Bing's index and OpenAI's own crawled corpus), and pulls back dozens of candidate URLs. Each candidate arrives with a title, an optional snippet, a URL and an ID number.
Citation. The model reads (or skims) a subset of those URLs and decides which to actually reference in its answer. Ahrefs found that only ~50% of retrieved pages get cited, and AirOps' analysis of 548,534 pages across 15,000 prompts puts the number even lower — only about 15% of retrieved pages make it into the final answer.

That gap matters. It means the most important moment in your visibility isn't whether ChatGPT reads you. It's whether you survive the second filter — the moment where the model is staring at a dozen candidates and choosing five.

The Ahrefs team mapped a few things that influence that survival rate:

URL and title alignment with sub-queries. Pages whose URLs and titles cleanly matched ChatGPT's narrower sub-questions outperformed pages that only matched the broad original prompt. Descriptive URL slugs were cited 89.78% of the time they were retrieved, compared with 81.11% for less descriptive URLs.
Source type. General web pages were cited at the highest rate. Reddit was retrieved heavily but cited just 1.93% of the time — meaning ChatGPT is mining Reddit for context and consensus but rarely giving Reddit the link.
Snippet and date fields don't behave as you'd expect. Counter-intuitively, non-cited URLs in the Ahrefs dataset had populated snippets and publication dates more often than cited URLs. The Ahrefs authors concluded this was a compositional artefact — driven mostly by Reddit, which inflates the non-cited bucket — rather than evidence that snippets hurt you.
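
To make the URL and title alignment point concrete, here is a minimal sketch of how you might score a page against a narrower sub-query. The token-overlap measure, the stopword list and the example URLs are all our own assumptions for illustration. This is not how ChatGPT or Ahrefs scores pages.

```python
import re
from urllib.parse import urlparse

# Hypothetical stopword list, kept deliberately small for the example.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "in", "and", "or", "is", "what", "how"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with common stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def alignment(sub_query: str, url: str, title: str) -> float:
    """Share of sub-query tokens that appear in the URL path or the title."""
    query = tokens(sub_query)
    if not query:
        return 0.0
    page = tokens(urlparse(url).path) | tokens(title)
    return len(query & page) / len(query)


descriptive = alignment(
    "best crm for series b startups",
    "https://example.com/blog/best-crm-series-b-startups",
    "Best CRM for Series B Startups",
)
opaque = alignment(
    "best crm for series b startups",
    "https://example.com/p?id=4821",
    "Our Platform",
)
print(descriptive, opaque)  # 1.0 0.0
```

A score like this is only a rough audit aid: it flags pages whose slug and title say nothing about the question they are meant to answer.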

The honest takeaway: ChatGPT's citation decision is shaped less by traditional ranking signals and more by how legible your page is to the retrieval layer in the first three seconds it spends on you.

OpenAI's own crawler tells you what they want

OpenAI publishes a list of its crawlers and explicit guidance for publishers. The bot you care about is OAI-SearchBot, the crawler that powers ChatGPT search results. It is separate from GPTBot (which feeds model training) and ChatGPT-User (which fetches a page on demand when a user's request in ChatGPT or a custom GPT calls for it).

OpenAI's publisher FAQ is unusually direct about what they're looking for:

"Inclusion is algorithmic, and the best way to improve your chances is to ensure your site is crawlable, your product data is structured and up-to-date."

In other words: the basic hygiene matters. If your robots.txt is silently blocking OAI-SearchBot, none of the downstream tactics in this guide can help you. Check it first. (Ahrefs' ~140M-website analysis found that AI bot block rates have risen sharply over the past year — and a meaningful share of those blocks are unintentional.)
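
A quick way to run that check is with Python's standard-library robots.txt parser. The sketch below is illustrative: the sample robots.txt and the example URL are made up, and the standard library's matching rules may differ in edge cases from how OpenAI's crawlers actually interpret the file.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI user agents named in this guide.
OPENAI_AGENTS = ["OAI-SearchBot", "GPTBot", "ChatGPT-User"]


def openai_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report whether each OpenAI agent may fetch `url` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in OPENAI_AGENTS}


# Hypothetical robots.txt: blocks the training crawler, leaves search open.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

report = openai_access(sample, "https://example.com/blog/post")
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

In this made-up file, GPTBot is blocked while OAI-SearchBot and ChatGPT-User fall through to the wildcard rule and stay allowed. To test a live site, paste its robots.txt contents in place of the sample.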

Beyond crawl access, OpenAI signals that recency, structured data and clear authorship matter. ChatGPT now appends utm_source=chatgpt.com to referral traffic, so you can — and should — segment AI-driven sessions in GA4 and start measuring before you optimise.
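
If you want to tag those sessions outside GA4, for example in raw server logs, the check is a one-liner on the landing URL. This is a minimal sketch; the function name and the example URLs are hypothetical, and it relies only on the utm_source=chatgpt.com parameter described above.

```python
from urllib.parse import parse_qs, urlparse


def is_chatgpt_referral(landing_url: str) -> bool:
    """True when the landing URL carries utm_source=chatgpt.com."""
    params = parse_qs(urlparse(landing_url).query)
    return "chatgpt.com" in params.get("utm_source", [])


print(is_chatgpt_referral("https://example.com/pricing?utm_source=chatgpt.com"))  # True
print(is_chatgpt_referral("https://example.com/pricing?utm_source=newsletter"))   # False
```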

The signals that actually correlate with ChatGPT citations

This is where the data gets interesting. Across the credible studies published in the last twelve months, the same handful of signals show up again and again, but with very different weights than in classic SEO.

Brand mentions are now the dominant off-site signal

The single largest study on this question is Ahrefs' analysis of 75,000 brands across ChatGPT, AI Mode and AI Overviews. The headline finding:

"Brand web mentions show the strongest correlation (0.664) with AI Overview brand visibility… backlinks correlate at 0.218 — a three-to-one gap that held across ChatGPT, Google AI Mode, and AI Overviews."

The same dataset found that YouTube mentions correlated with AI visibility at 0.737 — the strongest single predictor in the study. And brands in the top quartile of brand mentions earned up to 10x more AI mentions than brands in the next quartile.

Translation: a Forbes mention, a podcast appearance, a YouTube review, a Reddit thread where a stranger recommends you — these now matter more than a clean backlink profile. Backlinks still help (they're a proxy for "real brand"), but they are no longer the main lever.

Wikipedia and Reddit are doing the heavy lifting

5W's Citation Source Audit Q1 2026, based on roughly 600,000 citation events, found:

Wikipedia accounts for 13.15% and Reddit 11.97% of all U.S. ChatGPT citations — together, over a quarter of everything ChatGPT cites.
The Wall Street Journal, New York Times and Bloomberg do not appear in the top 20.
LinkedIn climbed from #11 to #5 in three months, now cited in 14.3% of ChatGPT Search responses.

Ahrefs separately found that roughly 67% of ChatGPT's top 1,000 most-cited pages are "dead citations" — Wikipedia entries, official homepages, app store listings — that brands cannot directly influence through outreach.

Two practical implications follow. First, a defensible Wikipedia presence (where your brand qualifies under notability rules) is now disproportionately valuable. Second, the Reddit picture is more nuanced than it looks: ChatGPT reads enormous amounts of Reddit during retrieval but rarely cites it, so the right Reddit strategy is to shape the consensus the model is absorbing, not to chase Reddit links for credit.

Freshness — but not the way most people pitch it

Ahrefs' analysis of 17 million AI citations found that AI-cited URLs are on average 25.7% fresher than the URLs that rank in Google's organic results — a median of 1,064 days versus 1,432 days. ChatGPT and Perplexity also order their in-text citations from newest to oldest, suggesting recency is an explicit ranking input.

But the same study makes the more important point: the median age of a cited page is still 2.9 years. AI doesn't reward "published yesterday." It rewards recently maintained authority — content that has built up signals over time and is then visibly updated.
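
The two headline numbers come from the same pair of medians, which is easy to verify. The snippet below only re-derives the figures quoted above; the variable names are ours.

```python
ai_cited_days = 1064  # median age of AI-cited URLs, from the Ahrefs figures above
organic_days = 1432   # median age of URLs ranking in Google organic results

# Relative freshness: how much younger the AI-cited median is.
fresher = (organic_days - ai_cited_days) / organic_days
# The same median expressed in years.
median_years = ai_cited_days / 365

print(f"{fresher:.1%}")       # 25.7%
print(f"{median_years:.1f}")  # 2.9
```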

In practice this means treating updates as a tiered system — light refresh (swap in current stats, tighten the lede), content update (rewrite sections where the SERP or facts have shifted), or full rewrite (when intent has changed). You're not chasing a freshness signal; you're protecting the authority signals you've already earned from decay.

Position-on-page matters more than total length

Cyrus Shepard's Zyppy analysis of thousands of ChatGPT citations (summarised in Search Engine Land) produced one of the most actionable findings of the year:

44.2% of all LLM citations come from the first 30% of a page (the intro, TL;DR and first major section).
The middle 30–70% contributes 31.1%.
The final 30% contributes just 24.7%.

Shepard nicknamed this the "ski-ramp" — attention is highest at the top and drops off sharply. ChatGPT chunks pages into passages before scoring them, and the early passages get a disproportionate share of the citation weight.

The implication is unambiguous: front-load the answer. Lede sentences should contain the entity, the answer, and the qualifier. Save the long build-ups for a different medium.
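
One way to audit this on your own pages is to check where the key answer sentence first appears. The sketch below reuses the 30% and 70% boundaries from the Zyppy breakdown above; measuring position by character offset is our simplifying assumption, not something the study prescribes.

```python
def position_band(page_text: str, answer: str) -> str:
    """Return which part of the page the answer first appears in."""
    if not page_text:
        return "missing"
    index = page_text.find(answer)
    if index == -1:
        return "missing"
    relative = index / len(page_text)  # 0.0 is the top of the page
    if relative < 0.30:
        return "first 30%"
    if relative < 0.70:
        return "middle 30-70%"
    return "final 30%"


answer = "Acme CRM is a sales tool for Series B teams."
front_loaded = answer + " " + "filler " * 50
buried = "filler " * 50 + answer

print(position_band(front_loaded, answer))  # first 30%
print(position_band(buried, answer))        # final 30%
```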

Structured data does, in fact, move the needle

Authoritas' analysis (reported across SE Ranking and AirOps studies) found:

Pages with FAQ schema are cited approximately 40% more often by ChatGPT.
71% of ChatGPT-cited pages include structured data of some kind.
Pages with three or more schema types had a 13% higher citation probability than pages with one or none.

This is one of the few areas where classic SEO hygiene maps almost directly to AI visibility. Article schema, FAQ schema, Organization schema, and product schema all do meaningful work — not because ChatGPT "reads" schema as a ranking signal, but because schema produces clean, extractable answer blocks that pass the legibility test in stage two.
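
For reference, this is roughly what an FAQ block looks like when generated programmatically. It is a minimal sketch that uses the schema.org FAQPage, Question and Answer types; the helper name and the sample question are hypothetical.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


markup = faq_jsonld([
    ("Which crawler powers ChatGPT search?", "OAI-SearchBot."),
])
print(markup)
```

The output goes inside a script tag of type application/ld+json on the page, and the questions and answers should match text that is visible on that page.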

What the academic research adds

The most cited academic paper in this space is still the Princeton, Georgia Tech and IIT Delhi GEO paper presented at KDD '24. The authors built GEO-bench — 10,000 queries across nine domains — and tested content modifications across multiple generative search engines. Their result:

"Targeted content modification strategies — particularly Statistics Addition, Citing Sources and Quotation Addition — can boost visibility in generative AI answers by up to 40%."

This is one of very few peer-reviewed, methodologically rigorous data points in this entire conversation. The three winning tactics (cite sources, add statistics, add direct quotes) are now baked into every credible GEO playbook for a reason: they are what the model is statistically more likely to pull out as an extractable, attributable chunk.

The signals that get oversold

Three things you'll see hyped that the data does not support as strongly as people claim:

llms.txt. Otterly's GEO experiment and several follow-up tests found no consistent measurable benefit to AI visibility from publishing an llms.txt file. Google has publicly stated they do not support it. Anecdotal evidence is mildly positive for brand disambiguation. Our read: llms.txt is cheap, takes 30 minutes, and is worth doing — but don't expect it to be the lever that moves your visibility number.

Backlinks as the primary lever. They still matter. They do not matter the way they used to. As above, the Ahrefs 75K-brand study put the brand-mention-to-backlink correlation ratio at roughly 3:1 for AI visibility. Time and budget that used to go to link building is, on the current evidence, better spent on earning unlinked brand mentions across the sources ChatGPT trusts.

Pure publishing velocity. A pile of new posts won't save you if they're not maintained. The pattern is consistent across every large publisher case study we've reviewed: a content catalogue scales fast, hits a traffic peak, then erodes once the back catalogue ages out of relevance. Velocity has to be paired with a maintenance system, or it eats its own returns.

What this actually means you should do

If you take the studies above seriously, the playbook for ChatGPT visibility looks like this. It is not the same playbook as classic SEO, and it isn't most "GEO" advice on LinkedIn either.

1. Make sure ChatGPT can actually crawl and read you

Check your robots.txt for OAI-SearchBot, GPTBot and ChatGPT-User. Ensure your important pages render server-side or have meaningful HTML for non-JS crawlers. Add Article, FAQ, Organization and Product schema where it fits naturally. Publish an llms.txt with brand disambiguation, audience definition, and links to your most authoritative pages — not because it's a magic bullet, but because it costs nothing and probably helps at the margin.
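
As a starting point, here is a small sketch that assembles an llms.txt with the three elements listed above. The layout (a title line, a quoted summary, then a linked list of pages) follows the publicly proposed llms.txt convention as we understand it, and every name and URL in the example is a placeholder.

```python
def build_llms_txt(
    name: str,
    summary: str,
    audience: str,
    pages: list[tuple[str, str, str]],
) -> str:
    """Assemble llms.txt content from brand details and (title, url, note) pages."""
    lines = [
        f"# {name}",
        "",
        f"> {summary}",       # brand disambiguation in one sentence
        "",
        f"Audience: {audience}",
        "",
        "## Key pages",
        "",
    ]
    lines += [f"- [{title}]({url}): {note}" for title, url, note in pages]
    return "\n".join(lines) + "\n"


content = build_llms_txt(
    name="Acme CRM",
    summary="Acme CRM is a sales CRM for B2B startups, unrelated to Acme Corp hardware.",
    audience="Revenue teams at Series A to Series C companies.",
    pages=[
        ("Pricing", "https://example.com/pricing", "Current plans and limits"),
        ("Docs", "https://example.com/docs", "Product documentation"),
    ],
)
print(content)
```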

2. Front-load every page for the ski-ramp

Given Zyppy's 44% finding, the first paragraph of every page that targets a citation-worthy query should contain: the entity, the direct answer in one sentence, and the qualifier (who it's for, when it applies, what the trade-off is). Headings should be questions; the sentence under each heading should be the answer. This is not a content style choice anymore; it is how the retrieval layer reads you.

3. Add the three things the Princeton study found to lift citations

Statistics. Source citations. Direct quotes. The KDD '24 paper found that strategies like these can lift generative engine visibility by up to 40%. Almost no other tactic has comparable peer-reviewed evidence behind it. If you do one optimisation pass on existing content this quarter, do this one.
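
To triage which existing pages need that pass, a crude count of the three elements is often enough. The regular expressions below are our own rough heuristics (any number counts as a statistic, any quoted span of 20 or more characters counts as a quote), so treat the output as a prompt for human review rather than a score.

```python
import re


def evidence_audit(text: str) -> dict[str, int]:
    """Count rough proxies for statistics, direct quotes and source links."""
    return {
        "statistics": len(re.findall(r"\d+(?:[.,]\d+)*%?", text)),
        "quotes": len(re.findall(r'"[^"]{20,}"|“[^”]{20,}”', text)),
        "source_links": len(re.findall(r"https?://\S+", text)),
    }


sample = (
    'Churn fell 12.5% across 340 accounts. '
    '"We cut onboarding time in half within a quarter," said the COO. '
    'Source: https://example.com/report'
)
audit = evidence_audit(sample)
print(audit)  # {'statistics': 2, 'quotes': 1, 'source_links': 1}
```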

4. Earn brand mentions in the places ChatGPT trusts

The 5W and Ahrefs data converge on the same short list of sources: Wikipedia (where appropriate and notable), Reddit (through participation and real consensus-shaping, not link drops), LinkedIn (cited in 14.3% of ChatGPT Search responses), YouTube (the single strongest visibility correlate), and high-authority editorial coverage. The mention does not have to be linked. In the Ahrefs data, brand mentions correlate with AI visibility far more strongly than backlinks do, and classic SEO tools largely don't track unlinked mentions. That gap is your opportunity.

5. Build a content maintenance system, not a publishing treadmill

Set a tiered refresh cadence (light refresh, content update, full rewrite) and assign every authority page a review interval so it gets revisited before it decays. The Ahrefs freshness data tells you that recently maintained authority beats both old-and-static and brand-new. Maintenance is not glamorous, but it is the highest-ROI work most B2B content teams aren't doing.
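
A maintenance system can start as two small rules: when a page is due for review, and which tier of work it gets. The sketch below is one way to encode that. The 90-day interval and the two boolean inputs are assumptions for illustration, and the tier names follow the freshness section earlier in this guide.

```python
from datetime import date, timedelta


def is_due(last_reviewed: date, interval_days: int, today: date) -> bool:
    """True when the page has gone at least `interval_days` without a review."""
    return today - last_reviewed >= timedelta(days=interval_days)


def refresh_tier(facts_shifted: bool, intent_changed: bool) -> str:
    """Pick the lightest tier of work that addresses what has changed."""
    if intent_changed:
        return "full rewrite"
    if facts_shifted:
        return "content update"
    return "light refresh"


today = date(2026, 5, 29)
print(is_due(date(2026, 1, 1), 90, today))                       # True
print(refresh_tier(facts_shifted=True, intent_changed=False))    # content update
```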

6. Measure with the assumption that the data is messy

Set up an "AI Search" channel in GA4 grouping ChatGPT, Perplexity, Claude and the others by referral domain. Treat per-prompt visibility data from tools like Profound, Otterly, Ahrefs Brand Radar and Semrush AI Visibility Index as directional — repeated sampling and trend lines are what's signal, not single-prompt snapshots. Pair topical prompts ("best CRM for Series B") with evaluation prompts ("evaluate [your brand] on [criterion]") so you're measuring perception, not just presence.
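
To turn repeated sampling into a trend line, log every prompt run with the week it was taken and whether your brand appeared, then aggregate. This is a minimal sketch with made-up data; the week labels and the input format are assumptions, not the export format of any of the tools named above.

```python
from collections import defaultdict


def mention_rate_by_week(samples: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of sampled answers per week in which the brand was mentioned."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for week, mentioned in samples:
        totals[week] += 1
        hits[week] += int(mentioned)
    return {week: hits[week] / totals[week] for week in sorted(totals)}


# Hypothetical log: (ISO week label, was the brand mentioned in that run?)
samples = [
    ("2026-W20", True),
    ("2026-W20", False),
    ("2026-W21", True),
    ("2026-W21", True),
]
rates = mention_rate_by_week(samples)
print(rates)  # {'2026-W20': 0.5, '2026-W21': 1.0}
```

With only a handful of runs per week the rates will swing widely, which is exactly why single-prompt snapshots should not drive decisions.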

The honest summary

The clean version of the story: ChatGPT is running a two-stage funnel, citing at most about half the pages it retrieves, weighing brand mentions over backlinks by about 3:1, pulling 44% of citations from the first 30% of a page, preferring recently maintained authority over either stale or brand-new content, leaning heavily on Wikipedia and Reddit for its source mix, and rewarding pages that include statistics, source citations and direct quotes by up to 40%.

The messy version: the field is moving fast, every major study has caveats, attribution is genuinely hard, and the practitioners who feel confident in their GEO strategy are still a small minority of the marketing function, which means there is still real room to win for teams that move on the evidence we do have.

The teams getting cited by ChatGPT in May 2026 aren't the ones with the prettiest content. They're the ones whose content the retrieval layer can read, whose brand the off-site web has agreed exists, and whose answer is in the first paragraph instead of buried four scrolls down.

That's the work. Most of it is unsexy. Almost all of it is doable this quarter.