How to Get Visibility on Google AI Overviews (and AI Mode)

May 29, 2026 5:51:07 AM

Google AI Overviews and AI Mode now sit on top of more searches than ever, both powered — since January 2026 — by Gemini 3, with Gemini 3.5 Flash rolling in at I/O 2026 as the new default. The mechanics have moved on. The advice has not. Most of what you'll read about "ranking in AI Overviews" is recycled SEO playbook with a new label.

This guide is built on something narrower: peer-reviewed papers, sentence-level reverse-engineering of Google's own citation URLs, Google's published documentation and patents, and the largest independent citation analyses we could find. Every external statistic is anchor-linked to the primary source.

The short version: AI Overviews are not a re-ranking of organic results. They are the output of a separate retrieval-and-synthesis pipeline that fans your query out into sub-questions, pulls a budget of roughly 2,000 words of "grounding" content across multiple sources, and stitches answers from sentence-level fragments. Ranking #1 helps. It is no longer enough — and in plenty of verticals, it isn't even necessary.

How AI Overviews actually generate an answer

Google has been more transparent about this than most people realise. Its official Search Central documentation confirms the query fan-out technique used by both AI Overviews and AI Mode:

"Both AI Overviews and AI Mode may use a 'query fan-out' technique — issuing multiple related searches across subtopics and data sources — to develop a response."

The mechanics line up almost exactly with Google's own generative summaries patent (US11769017B1), filed and granted in 2023. The patent describes the pipeline in four moves: receive the query, select a set of search result documents responsive to it, process the snippets of those documents through an LLM, and generate the summary. Independent researchers have since stress-tested this stack and added significant new detail.

In particular, Dan Petrovic's December 2025 grounding chunks study at DEJAN analysed 7,060 queries with three or more sources and 883,262 grounding snippets. The findings put hard numbers on what was previously hand-waving:

Each query has a roughly 2,000-word grounding budget total, distributed across sources by relevance rank. The budget is "remarkably consistent regardless of how many sources are used or how long the individual pages are."
The average grounding chunk Google extracts from a page is 15.5 words.
Being the #1-ranked source earns roughly 2× the grounding allocation of being #5.
Cosine similarity between the extracted snippet and the full source page averages 0.916, meaning the snippet sits very close to the full page in embedding space (a similarity score, not a literal 91.6% of the page's meaning).
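As a rough consistency check on these figures, here is a back-of-envelope calculation (ours, not taken from the study). It assumes the 15.5-word average applies to the same 883,262 snippets across the same 7,060 queries:

```python
# Back-of-envelope check using only the figures quoted above.
# Assumption: the 15.5-word average covers the same snippets and queries.
QUERIES = 7_060          # queries in the dataset
SNIPPETS = 883_262       # grounding snippets across those queries
AVG_CHUNK_WORDS = 15.5   # average words per grounding chunk

chunks_per_query = SNIPPETS / QUERIES
implied_budget_words = chunks_per_query * AVG_CHUNK_WORDS

print(f"chunks per query: {chunks_per_query:.1f}")              # 125.1
print(f"implied words per query: {implied_budget_words:.0f}")   # 1939
```

Roughly 125 chunks per query at about 15.5 words each comes to a little under 2,000 words, which is consistent with the budget figure.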

In other words: Google isn't reading your page. It's clipping chunks of roughly 15 words out of it, and giving those chunks a share of a tight word budget that depends on where the page ranks. Your job is to make sure the cleanest, most-quotable 15-word chunk for the sub-query Google is asking is sitting somewhere on your page where the model can find it.

Ranking still matters — just less, and unevenly

The biggest debate in the AI Overview literature in 2026 is whether organic rankings drive citations. Two of the strongest first-party studies look like they contradict each other. They don't, once you read them carefully.

Ahrefs analysed 863,000 SERPs and ~4 million AI Overview URLs in March 2026 and found that the overlap between AI Overview citations and the top-10 organic results had collapsed:

"Only 38% of pages cited in AI Overviews also rank in the top 10 — down from 76% in July 2025."

Meanwhile, BrightEdge's 16-month tracking study showed the opposite trend — overlap between AI Overview citations and organically ranking content rising from 32.3% in May 2024 to 54.5% in September 2025. And seoClarity's analysis of 432,000 keywords found that 97% of AI Overviews cite at least one source from the top 20 organic results, with position-1 pages appearing in AI Overviews more than half the time.

How can all three be true? Two reasons.

First, the question matters. "Do AI Overviews cite the top 10 for the original query?" (the Ahrefs framing) is a different question from "Does at least one citation per AI Overview come from the top 20?" (the seoClarity framing). The Ahrefs methodology measured source-by-source overlap after Gemini 3 dramatically expanded fan-out. The seoClarity methodology counted an AI Overview as overlapping if any single citation matched.
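To see how far the two framings can diverge on identical data, here is a minimal sketch. The queries, URLs and function names are invented for illustration, and for simplicity it uses one organic set per query rather than separate top-10 and top-20 lists:

```python
from typing import Dict, List, Set

def citation_level_overlap(citations: Dict[str, List[str]],
                           organic: Dict[str, Set[str]]) -> float:
    """Share of all cited URLs that also rank organically for the same query."""
    total = sum(len(urls) for urls in citations.values())
    hits = sum(1 for query, urls in citations.items()
               for url in urls if url in organic[query])
    return hits / total

def overview_level_overlap(citations: Dict[str, List[str]],
                           organic: Dict[str, Set[str]]) -> float:
    """Share of AI Overviews with at least one cited URL that ranks organically."""
    hits = sum(1 for query, urls in citations.items()
               if any(url in organic[query] for url in urls))
    return hits / len(citations)

# Hypothetical data: three overviews, four citations each, one ranking URL each.
CITATIONS = {
    "q1": ["a.com", "b.com", "c.com", "d.com"],
    "q2": ["e.com", "f.com", "g.com", "h.com"],
    "q3": ["i.com", "j.com", "k.com", "l.com"],
}
ORGANIC = {"q1": {"a.com"}, "q2": {"e.com"}, "q3": {"i.com"}}

print(citation_level_overlap(CITATIONS, ORGANIC))   # 0.25
print(overview_level_overlap(CITATIONS, ORGANIC))   # 1.0
```

The same toy dataset reads as 25% overlap under a per-citation metric and 100% under an any-citation-per-overview metric, so headline numbers from the two kinds of study are not directly comparable.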

Second, AI Overviews are not uniform. BrightEdge's vertical breakdown is the most useful data point in the whole field: organic-AIO overlap in healthcare is 75.3%, in education 72.6%, in insurance 68.6% — but in online retail just 22.9%. Citation behaviour varies enormously by industry. A B2B SaaS playbook does not transfer to retail.

The defensible read across all three studies: ranking is necessary-but-not-sufficient in YMYL-adjacent verticals (health, finance, education), and increasingly optional in commercial and informational categories where Google is leaning heavily on fan-out queries to pull citations from pages that don't rank for the original term at all.

Sentence-level citation: the most under-discussed AIO mechanic

The most overlooked finding of the year comes from Daniel Shashko's reverse-engineering of 42,971 AI Mode citations in March 2026.

Shashko noticed that every Google AI Mode and Gemini citation URL contains a hidden #:~:text= fragment — the text-fragment URL standard — encoding the exact sentence Google pulled from the source page. By decoding those fragments at scale, he produced the first large-scale analysis of Google's citation selection at the sentence level rather than the page or domain level.
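Decoding a text fragment needs nothing beyond a URL parser. The sketch below is our own illustration, not Shashko's tooling. It follows the text-fragment syntax (text=[prefix-,]start[,end][,-suffix]), and the example URL is hypothetical:

```python
from typing import Dict, List
from urllib.parse import unquote, urlsplit

FRAGMENT_DIRECTIVE = ":~:"

def extract_text_fragments(url: str) -> List[Dict[str, str]]:
    """Return the decoded text directives found in a URL's fragment."""
    fragment = urlsplit(url).fragment  # still percent-encoded at this point
    if FRAGMENT_DIRECTIVE not in fragment:
        return []
    directives = fragment.split(FRAGMENT_DIRECTIVE, 1)[1]
    results = []
    for directive in directives.split("&"):
        if not directive.startswith("text="):
            continue
        parsed: Dict[str, str] = {}
        ends: List[str] = []  # textStart, then optional textEnd
        # Split before decoding: literal commas and hyphens inside the
        # quoted text are percent-encoded, so they cannot be confused
        # with the separators.
        for part in directive[len("text="):].split(","):
            if part.endswith("-"):
                parsed["prefix"] = unquote(part[:-1])
            elif part.startswith("-"):
                parsed["suffix"] = unquote(part[1:])
            else:
                ends.append(unquote(part))
        if ends:
            parsed["start"] = ends[0]
        if len(ends) > 1:
            parsed["end"] = ends[1]
        results.append(parsed)
    return results

example = "https://example.com/sdr-guide#:~:text=SDR%20onboarding%20takes,about%2030%20days"
print(extract_text_fragments(example))
```

When a fragment carries both a start and an end, the cited span runs from the first phrase to the second, so recovering the full sentence still means looking the span up in the page text.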

This matters because it confirms what Petrovic's grounding chunks data implied: AI Mode is not citing your page. It is citing one specific sentence on your page. The unit of optimisation is the sentence, not the post. Pages that get cited tend to contain a high density of crisp, self-contained, factually specific sentences that can be lifted whole into an answer.

Kevin Indig's embedding-based analysis of 21,000+ citations reinforces the same pattern from the other direction: he used semantic embeddings to identify which exact sentence the model was pulling from, and found that the top 30 domains capture roughly 67% of all AI citations for a given topic. The pages that get cited repeatedly aren't the most specific — they're broad, category-level guides that answer dozens of related sub-questions at once, each with its own quotable sentence.
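The matching step in an analysis like this can be sketched in a few lines. This is not Indig's code: it swaps real semantic embeddings for a plain bag-of-words vector so it runs with the standard library only, and the page text and answer are invented. The cosine-similarity logic is the same either way:

```python
import math
import re
from collections import Counter
from typing import Tuple

def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_matching_sentence(answer: str, page_text: str) -> Tuple[str, float]:
    """Return the page sentence most similar to the AI answer, with its score."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    answer_vec = bag_of_words(answer)
    scored = [(s, cosine_similarity(answer_vec, bag_of_words(s))) for s in sentences]
    return max(scored, key=lambda pair: pair[1], default=("", 0.0))

PAGE = ("SDR onboarding usually takes about 30 days. "
        "Week one covers product training and call shadowing. "
        "Most teams track booked meetings as the first KPI.")
ANSWER = "Onboarding a new SDR takes about 30 days."

sentence, score = best_matching_sentence(ANSWER, PAGE)
print(sentence, round(score, 2))
```

With real embeddings, paraphrases that share no words would still score highly. The bag-of-words version only rewards literal overlap, which is why it is a stand-in and not a replacement.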

The practical implication: you want passage-level density. If your page targets "how to onboard a new SDR", you want individual sentences that can be lifted out as standalone answers to: how long does SDR onboarding take, what's in week one, what's the first KPI, what books should they read, when do they take their first call. Each one a sentence the model can clip.

What predicts citation: the signals that hold up across studies

1. Topical depth, not topical narrowness

The Princeton, Georgia Tech and IIT Delhi paper GEO: Generative Engine Optimization (KDD '24) — still the only peer-reviewed framework for this discipline — tested content modifications across 10 generative search engines using 10,000 queries. The findings:

"Targeted content modification strategies — particularly Statistics Addition, Citing Sources, and Quotation Addition — can boost visibility in generative AI answers by up to 40%."

This holds for AI Overviews specifically. Surfer's analysis of 57,253 URLs across 1,591 keywords compared the information density of cited versus non-cited pages and found the pattern was consistent: pages cited in AI Overviews cover a substantially larger share of the topic's key facts than non-cited pages do.

Combined with Indig's finding that broad category guides outperform narrow single-intent pages, the takeaway is clear. Depth beats focus. A long, well-structured pillar page that answers fifteen sub-questions about a topic will outperform fifteen narrow pages targeting one sub-question each — because the pillar has fifteen opportunities to win a sentence-level citation, while each narrow page has only one.

2. Structured data and schema do real work

Schema markup is one of the few areas where classic SEO hygiene maps directly onto AI Overview visibility. The Digital Applied study of 1,000 AI Overviews put it bluntly:

"Schema markup is the cheapest 2.3× lever. Body-level named sources add another 2.1×. Long-form (over 2,500 words) adds 1.6×."

Article, BreadcrumbList and HowTo schema all correlated positively with citation rate. The mechanism is the same one Petrovic's grounding-chunks data points to: schema produces clean, extractable answer blocks that pass the model's snippet-selection step.
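For teams that generate markup programmatically, here is a minimal sketch of emitting Article and BreadcrumbList JSON-LD. The property names are standard schema.org vocabulary. The helper names, headline, organisation and URLs are placeholders, and nothing here guarantees a citation; it only produces valid markup:

```python
import json
from typing import List, Tuple

def article_json_ld(headline: str, date_published: str,
                    author_name: str, url: str) -> dict:
    """Build a schema.org Article object (placeholder values supplied by caller)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date
        "author": {"@type": "Organization", "name": author_name},
        "mainEntityOfPage": url,
    }

def breadcrumb_json_ld(trail: List[Tuple[str, str]]) -> dict:
    """Build a schema.org BreadcrumbList from (name, url) pairs, in order."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }

def as_script_tag(data: dict) -> str:
    """Wrap a JSON-LD object in the script tag used to embed it in HTML."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")

article = article_json_ld(
    "How to onboard a new SDR", "2026-05-29",
    "Example Co", "https://example.com/sdr-onboarding",
)
breadcrumbs = breadcrumb_json_ld([
    ("Guides", "https://example.com/guides"),
    ("SDR onboarding", "https://example.com/sdr-onboarding"),
])
print(as_script_tag(article))
print(as_script_tag(breadcrumbs))
```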

The same study found AI Overviews include an average of 4.2 citations per answer, ranging from 2 to 9. Commercial-intent overviews skew lower (3.1 average), definitional and how-to overviews higher (5.6). The smaller the citation set, the harder the slot to win.

3. YouTube is doing more work in AIO than anyone expects

Ahrefs' March 2026 study contains a finding that surprised even them: among the AI Overview cited pages that didn't rank in Google's top 100 for the original query, 18.2% were YouTube URLs. Across the full dataset, YouTube made up 5.6% of all AI Overview citations.

YouTube is now the most-cited domain in AI Overviews, with its citation share up 34% over six months. And in Ahrefs' separate analysis of 75,000 brands, YouTube mentions in video titles, transcripts and descriptions correlated more strongly with AI Overview visibility (Spearman 0.737) than any other factor tested, including brand web mentions, backlinks and domain authority.

The mechanism is roughly: Google owns YouTube, treats transcripts as searchable text, and the fan-out process pulls related YouTube content even when no YouTube URL ranks for the original query. If your category has YouTube creators talking about it, your visibility in AIO is partially decided by what those creators say.
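It helps to know what a Spearman figure like 0.737 measures. Spearman correlation is the Pearson correlation of the ranks, so it captures whether brands with more mentions tend to sit higher in the visibility ordering, not how many extra citations each mention buys. Here is a minimal standard-library sketch with invented numbers (not Ahrefs' data or code):

```python
import math
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical brands: YouTube mention counts vs. AI Overview appearances.
youtube_mentions = [5, 40, 12, 300, 80]
aio_appearances = [1, 9, 4, 30, 7]
print(round(spearman(youtube_mentions, aio_appearances), 3))  # 0.9
```

Because only ranks matter, one brand with 300 mentions does not dominate the result the way it would in a raw Pearson correlation. As with any correlation, it says nothing on its own about which way causation runs.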

4. Brand mentions still outweigh backlinks

The same 75K-brand study found brand web mentions correlated with AI Overview visibility at 0.664, against just 0.218 for backlinks — roughly a three-to-one gap. Branded anchor text and branded search volume also outperformed traditional authority metrics. AI Mode, in particular, behaves as a consensus engine, rewarding brands that the web is already talking about. If your category has discourse about your competitors but not about you, AI Mode will mirror that gap.

5. Freshness matters at the margin — but less than for ChatGPT

Two findings here, both important to hold side by side.

Ahrefs' 17-million-citation freshness analysis found that AI Overviews are the least freshness-biased of all AI surfaces, with the average cited URL 16 days older than the average organic SERP result. Median cited-page age: roughly 3.9 years.

But the Digital Applied study qualified this — the median cited page age in their AIO dataset was 14 months, and recency mattered "less than expected" but was still a tiebreaker. The honest synthesis: AIO is not freshness-biased the way Google News rankings are. Stale evergreen pages can and do get cited. But maintaining your top pages is still high-ROI maintenance work, and YMYL and time-sensitive verticals tilt much more sharply toward fresh content.

What gets oversold in the AIO discourse

Three things you will see hyped where the data doesn't support the hype.

Pure word count. "Write 3,000-word pages and you'll get cited" is a misreading of the Digital Applied 1.6× finding. Long-form helps because long pages contain more quotable sentences and more topical sub-coverage — not because Google is rewarding length itself. A 1,500-word page with high fact density will outperform a 4,000-word page that pads to hit a word target.

llms.txt and other "AI hygiene" files. Google has publicly stated that it does not support llms.txt. It costs nothing to publish and might help at the margins for brand disambiguation, but it is not a meaningful lever for AIO citation. Treat it as low-priority hygiene.

FAQ schema as a magic bullet. FAQ schema helps — the Digital Applied study confirms it — but only because it produces well-structured answer blocks. If your FAQ answers are weak or generic, the schema can't save them. Schema is amplification, not creation.

What this means you should actually do

Six moves, all justified by the evidence above.

1. Optimise the sentence, not the page

Given Shashko's text-fragment finding and Petrovic's 15.5-word grounding chunks, every page targeting AIO visibility should be auditable at the sentence level. For each major sub-question, there should be one sentence on the page that answers it directly, contains the entity, and can be quoted in isolation. If you read the sentence aloud out of context and it still makes sense, you've done it right.
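A sentence-level audit can start as a crude script. The checks below are heuristics we are assuming for illustration (entity present, no dangling opening pronoun, a 30-word ceiling). None of them comes from the studies cited, and the thresholds are arbitrary starting points:

```python
import re
from typing import List

# Openers that usually point back to an earlier sentence.
DANGLING_OPENERS = {"it", "this", "that", "they", "these", "those", "he", "she"}

def audit_sentence(sentence: str, entity: str, max_words: int = 30) -> List[str]:
    """Return reasons a sentence may not stand alone (empty list = passes)."""
    words = re.findall(r"[\w'-]+", sentence)
    if not words:
        return ["empty sentence"]
    problems = []
    if entity.lower() not in sentence.lower():
        problems.append("entity missing")
    if words[0].lower() in DANGLING_OPENERS:
        problems.append("opens with a pronoun that needs earlier context")
    if len(words) > max_words:
        problems.append(f"too long ({len(words)} words)")
    return problems

print(audit_sentence("SDR onboarding typically takes 30 days.", "SDR"))
print(audit_sentence("It usually takes about a month.", "SDR"))
```

The first sentence passes and the second is flagged twice. A script like this only surfaces candidates for review; the read-it-aloud test still decides.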

2. Build pillar pages that answer twenty sub-questions, not twenty pages that answer one each

Indig's 21K-citation analysis and Google's own fan-out documentation both push you in the same direction: comprehensive pillar pages get cited across many fan-out queries, narrow pages get cited (if at all) for one. If you have a topic with 20 logical sub-queries, build a single pillar that answers all of them with clear headings, and let internal linking and supporting deep-dives carry the rest of your SEO.

3. Front-load every sub-question with a direct answer

Google's grounding pipeline extracts chunks of ~15 words. Every H2 or H3 on your page should be a question or a topic. The first sentence underneath should be the direct answer in plain prose — entity, answer, qualifier. Save the build-up for the second paragraph. This is the single highest-ROI structural change most B2B content teams haven't made yet.

4. Add the three things that the Princeton paper proved move citations

Statistics. Source citations. Direct quotes. The KDD '24 paper put the effect of these strategies at up to a 40% lift in visibility across generative engines. Almost no other tactic has comparable peer-reviewed evidence. If you do one optimisation pass on existing pages this quarter, do this one.

5. Take YouTube seriously

The 0.737 correlation isn't accidental, and it isn't only an artefact of Google owning YouTube: it shows up in ChatGPT visibility too. If your category has video creators talking about it, you want to be in those conversations, whether through your own channel, through partnerships, or by being the brand creators reach for when they need an example. Transcripts get indexed; descriptions get indexed; both feed AIO.

6. Build for vertical context, not generic best practice

The BrightEdge vertical data is the most underused finding in the whole discourse. If you're in health, education or insurance, organic rank is doing two-thirds or more of the work, so protect and grow your top-10 positions. If you're in retail, B2B SaaS or category-discovery content, organic rank is doing closer to a fifth (22.9% overlap in BrightEdge's online retail data; B2B SaaS and category-discovery content are not broken out in the figures above), so invest in pillar depth, YouTube, and brand mentions before you spend more on link building. The same playbook doesn't fit both.

The honest summary

AI Overviews and AI Mode are running a fan-out, grounding-budgeted, sentence-level citation pipeline. Pages get into AI answers by:

containing the cleanest 15-word answer to one of the sub-queries Google generated;
being part of a broad, deep page that answers many adjacent sub-queries;
presenting that answer with statistics, sources and quotes that pass the model's snippet-extraction step;
belonging to a brand the web is already discussing — including on YouTube;
and, in YMYL verticals, ranking organically in the top 10.

What stops working: chasing single-keyword rank, padding word counts, relying on backlinks alone, building flat catalogues of narrow posts, and assuming what works in one vertical works in another.

The good news for teams willing to take the data seriously: the field is still young enough that consistent, evidence-led work compounds. The teams winning AIO visibility in May 2026 aren't doing twelve different things. They're doing five of the right things, every quarter, in the right order.