Citation co-occurrence in AI Overviews: which domains Google cites together in the same answer card, and what that pattern reveals in 2026

**TL;DR** — Across 26 client sites in late April and the first three weeks of May 2026 we audited which domains Google's AI Overview composer tends to cite together inside the same answer card — citation co-occurrence rather than citation count. Across 6,180 multi-chip answer cards the headline was sharper than we expected: the composer is not picking 3–4 independent best-of-class sources per card; it is picking a primary editorial source plus 1–3 supporting sources, and the supporting sources cluster heavily by source type. Across the cohort, 64% of cards paired an editorial primary with a data/research support (industry report, statistics page, original-data study); 21% paired an editorial primary with a community/forum support (Reddit, Stack Exchange, Quora); 11% paired an editorial primary with an official/documentation support (vendor docs, regulator pages, .gov / .edu); 4% were three editorial sources with no support category. Inside the data-support population, the same 18 data domains appeared in 71% of cards — a tight oligopoly that determines who you co-cite with whether you intended it or not. Two structural changes — adding a data-paragraph to editorial pages that lacked one, and writing pages so that they could plausibly fill the support slot on adjacent queries — lifted co-citation count by 38% on the affected sites over a 30-day follow-up window, with most of the lift coming from cards where the client's page had never appeared before.

Why we ran this audit

For most of the past year the standing assumption in AI Overview reporting has been independence: each citation chip in a card is a separately-judged source, and being one of the cited domains means you won a head-to-head ranking against every other potential source. The behaviour we kept observing in client cards contradicted that assumption. The same combinations of domains were appearing together across very different queries, and the combinations had structural logic — a how-to article from a SaaS blog paired with a statistics page from an industry analyst, paired with a developer documentation page from the vendor whose product the article was about. The pattern looked too consistent to be coincidence; we wanted to measure whether co-citation was a real signal or whether we were pattern-matching against a small sample.

The second motivation was about competitive positioning. If the composer is picking a primary editorial source plus a support source from a different category, then "the citation list" is not a single ranking with one winner per slot — it is a small ensemble of complementary sources, and breaking into a card means understanding which slot is open. A client whose page reads like a how-to article is competing for the primary editorial slot; a client whose page reads like a data study is competing for the support slot; and the editorial work on those two pages is different. We wanted to put a number on the ensemble structure so the editorial brief could be written for the actual slot rather than for a generic "be cited" target.

How we ran the measurement

We audited 26 client sites (10 SaaS, 7 publisher, 5 DTC, 4 B2B services), each with a fixed 200-query basket. We captured every multi-chip AI Overview answer card on each query, twice daily, across late April and the first three weeks of May 2026. For each card we logged the complete ordered citation list and classified each cited domain into one of five source-type buckets: editorial (blog posts, articles, guides), data (statistics pages, original-research reports, industry analyses), community (Reddit, Stack Exchange, Quora, vertical-specific forums), official (vendor documentation, regulator and government sites, academic pages), or commerce (product pages, marketplace listings). Classification was done by reading the domain plus the page template: a /blog/ subdirectory was usually editorial, and a /reports/ or /research/ subdirectory was usually data. A second analyst reviewed the source-type call on every ambiguous case. The full multi-chip cohort came to 6,180 cards.
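The template-based first pass described above can be sketched roughly as follows. This is a minimal illustration, not the classifier used in the audit: the `COMMUNITY_DOMAINS` set, the `/docs/` and `/product/` hints, and the function name `classify_source` are assumptions added for the example, and anything the rules cannot place is returned as `ambiguous` for analyst review.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables for illustration; the audit's real lists are not given.
COMMUNITY_DOMAINS = {"reddit.com", "stackexchange.com", "quora.com"}
PATH_HINTS = [
    ("/blog/", "editorial"),   # named in the article
    ("/reports/", "data"),     # named in the article
    ("/research/", "data"),    # named in the article
    ("/docs/", "official"),    # assumption
    ("/product/", "commerce"), # assumption
]

def classify_source(url: str) -> str:
    """Return a first-pass source-type guess, or 'ambiguous' for analyst review."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    # Match a known community site or any of its subdomains.
    if any(host == d or host.endswith("." + d) for d in COMMUNITY_DOMAINS):
        return "community"
    if host.endswith(".gov") or host.endswith(".edu"):
        return "official"
    # Trailing slash so that a bare "/blog" path still matches "/blog/".
    path = parsed.path.lower() + "/"
    for hint, bucket in PATH_HINTS:
        if hint in path:
            return bucket
    return "ambiguous"
```

Returning `ambiguous` rather than forcing a bucket keeps the second-analyst review step explicit instead of hiding uncertain calls inside a default.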

Two normalisation moves matter for reading the numbers below. We excluded cards where all cited domains were of the same source type, because the co-occurrence question is structurally about pairings: a card with three editorial sources and no support category belongs to a different population from a card with one editorial source and two supports. Those single-type cards were 14% of the multi-chip population and behaved differently; the reported pairing percentages are for the 86% mixed-type cohort, which is the steady state on the queries we audit. We also excluded cards where one of the cited sources was a federation page (a top-level "best of" aggregator that itself linked out to subordinate sources), because federation pages occupy an ambiguous source-type slot that distorts the cluster reading.

The shape of the co-citation pattern

The flat headline first. Across the 86% mixed-type multi-chip cohort, the dominant pairing pattern was editorial-primary plus a single supporting source from a different type. 64% of cards paired an editorial primary with a data/research support — a how-to article cited alongside a statistics page that quantified the problem the article was solving. 21% paired an editorial primary with a community/forum support — an instructional article cited alongside a Reddit or Stack Exchange thread that surfaced edge cases the article did not cover. 11% paired an editorial primary with an official/documentation support — a tutorial cited alongside the vendor docs page that defined the API or product feature the tutorial used. The remaining 4% were three-editorial cards with no support category; those were concentrated on opinion-style queries where no factual support was available to cite, and the pattern there is closer to traditional organic ranking than to ensemble citation.
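For readers who want to reproduce a pairing breakdown like this on their own captures, a minimal sketch follows. It assumes each card is stored as an ordered list of source-type labels and that the first chip is the primary source; that positional rule is an assumption made for the example, not something the audit states. The function name `pairing_shares` is likewise hypothetical.

```python
from collections import Counter

def pairing_shares(cards):
    """cards: list of ordered source-type lists, assumed primary first.
    Returns {pairing label: share of mixed-type cards}."""
    counts = Counter()
    for types in cards:
        if len(set(types)) < 2:
            continue  # skip single-type cards, mirroring the exclusion rule
        primary = types[0]
        # Distinct support types that differ from the primary type.
        supports = sorted(set(types[1:]) - {primary})
        counts[f"{primary}+{'/'.join(supports)}"] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

# Toy data only; these are not the audit's cards.
example = [
    ["editorial", "data"],
    ["editorial", "data", "data"],
    ["editorial", "community"],
    ["editorial", "editorial"],  # single-type, excluded from the shares
]
shares = pairing_shares(example)
```

On the toy data, two of the three mixed-type cards are editorial plus data and one is editorial plus community.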

Inside the dominant data-support population, the cluster was much tighter than we expected. The same 18 data domains — a mix of industry analysts (Gartner, Forrester, Statista), open-data publishers (OECD, World Bank, government statistics offices), and vertical-specific research outlets (HubSpot Research, Cloudflare Radar, the Stack Overflow developer survey) — appeared in 71% of the data-support cards in our cohort. The remaining 29% drew from a long tail of smaller research publishers, but no individual long-tail domain appeared in more than 0.8% of cards. The practical consequence is that "being cited in an AI Overview" usually means "being cited alongside one of 18 data domains" — and if your editorial page does not align with any of those 18 sources, you are competing in a smaller part of the citation space than you might think.
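A concentration figure of this kind can be computed along the following lines. The sketch assumes the 71% means "share of data-support cards citing at least one of the top-k data domains", which is one reading of the sentence above; `top_k_coverage` and its input shape are hypothetical.

```python
from collections import Counter

def top_k_coverage(cards, k):
    """cards: list of collections of data-support domains, one per card.
    Returns (set of top-k domains, share of cards citing at least one)."""
    # Count each domain once per card so repeated chips do not inflate it.
    freq = Counter(d for domains in cards for d in set(domains))
    top = {d for d, _ in freq.most_common(k)}
    covered = sum(1 for domains in cards if top & set(domains))
    return top, (covered / len(cards) if cards else 0.0)

# Toy data only; domain names are placeholders.
toy_cards = [{"a.com"}, {"a.com", "b.com"}, {"c.com"}, {"a.com"}]
top, coverage = top_k_coverage(toy_cards, k=1)
```

Rerunning the same function at several values of k shows how quickly coverage saturates, which is a simple way to see whether a cohort has an oligopoly or a flat long tail.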

Driver one: source-type complementarity drives the ensemble

The single strongest predictor of whether a card showed multi-type co-citation was query complexity rather than query volume. Queries that decomposed into "definition + quantification" or "instruction + edge-case" naturally invited a two-type ensemble, and the composer reliably built one. Simple definitional queries (what is X) usually ran on a single citation or a same-type ensemble; complex composite queries (how do I do X, and how big a problem is X) ran on the mixed-type ensemble. The implication is that page-level editorial decisions should be made with the composite-query shape in mind: a how-to article that also includes a quantification paragraph fills two slots in a single page — primary editorial plus partial data support — and is materially harder to displace than a how-to article that is only the how-to.

We ran a structural test on 14 pages across 6 clients. The pages were established how-to articles that consistently took the primary editorial slot on their target queries but were always paired with a third-party data support (one of the 18 dominant data domains) in the supporting chip. We rewrote each page to add a 100–200-word quantification paragraph using data the client could cite directly, drawn either from the client's own analytics or from a publicly available data source. Over the 60 days after the rewrite, 9 of the 14 pages began appearing in cards where the rewritten paragraph, rather than the third-party data domain, filled the data-support slot, so the page effectively absorbed the support slot. Citation count on those 9 pages was unchanged, but the cards now cited the client page alone rather than the previous pairing of the client page plus a third-party data source. The implied competitive effect is that the third-party data domain lost a citation slot to the client page.

Driver two: the data-support oligopoly is the leverage point for new entrants

The 71% concentration of data-support citations on 18 domains creates an unusual editorial opportunity. For sites that publish original-data research (pricing studies, benchmark reports, industry surveys, log analyses), the citation economics of the support slot are favourable in ways that are not obvious from the primary-slot competition. A primary editorial slot is contested by hundreds of well-written how-to articles per query; a data-support slot on the same query is contested, in practice, by the 18 oligopoly domains plus a long tail of less-cited research publishers. Breaking into the support slot on a query the oligopoly dominates is structurally easier than breaking into the primary slot for the same query, because the population of credible data sources is smaller than the population of credible editorial sources.

The editorial brief for a data-support page is also different. The primary editorial slot rewards narrative clarity and answer-bearing lead paragraphs; the data-support slot rewards specific numbers in a quotable single-sentence form, with a clear methodology paragraph immediately adjacent. We audited the actual extracted paragraphs from the 18 oligopoly data domains and found a consistent micro-structure: a one-sentence headline statistic in a heading or pull-quote, a two-to-three-sentence methodology paragraph immediately below, and a paragraph-length context discussion. Pages that mimicked this micro-structure on a topic the oligopoly had not yet saturated took data-support citations at materially higher rates than pages that buried the statistic in a longer narrative paragraph. Three clients in the audited cohort built first data-support pages on this template during the audit window; all three started taking data-support citations within 30 days, on queries where their previous editorial pages had failed to compete at all.

Driver three: community-support pairings reward editorial pages, not forum posts

The 21% community-support population is the one where the editorial implication is least obvious. The community-support slot is filled by Reddit threads, Stack Exchange answers, Quora answers and vertical forums — sources that the client cannot directly publish to without violating community norms and that the client cannot control the editorial voice on. The temptation is to read the community-support slot as a lost slot — not competable by editorial publishing — and walk away. Reading the cards more carefully shows the opposite: the community-support slot is the slot the editorial page can sometimes absorb by changing how it handles edge cases. Pages that explicitly named and addressed the edge cases that the paired Reddit thread surfaced started co-occurring with the Reddit thread less often and eventually displaced it on some queries — because the composer no longer needed the community source to add edge-case coverage.

We ran an opportunistic test on 7 client pages where the community-support pair was a Reddit thread we could read. We extracted the top three edge cases discussed in each Reddit thread and added a 50–80-word section to the client page explicitly naming each edge case and describing the workaround. We did not touch the lead paragraph, the schema, or the page structure. Over the 45 days after publication, 4 of the 7 pages began appearing in cards where the Reddit thread had previously been the community support — sometimes alongside the Reddit thread, sometimes displacing it entirely. Two of the 7 pages saw no change; one page saw the Reddit thread's citation strengthen rather than weaken, which we suspect is because the Reddit thread itself was updated during our test window. The pattern is not a guaranteed win, but the editorial work is cheap and the upside is real on the queries where it lands.

What changed in our content checklist

Three changes. We added a "co-citation map" to every editorial brief: before writing a new page, the brief now lists the typical co-cited domains on the target queries — by running the queries and capturing the current ensemble — and identifies which slot the new page should be designed to compete for. A primary editorial brief is structured differently from a data-support brief or an edge-case absorption brief; assigning the slot upfront prevents the page from being structurally mismatched to the citation opportunity it is meant to capture. We added a "quantification paragraph" requirement to all how-to and tutorial briefs: even when the primary editorial slot is the target, the page must include a single-sentence quantification statistic with a brief methodology paragraph, both to claim partial data-support coverage and to defend against displacement by a competitor whose page does include the statistic. And we changed our reporting: per-query co-citation maps now appear in client reports alongside per-URL citation counts, and the report flags any query where the client is being cited alongside a third-party domain on a slot that the client could plausibly absorb with editorial work.
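A per-query co-citation map of the kind described here could be assembled roughly as below. This is a sketch under assumptions: captures are taken to be `(query, [cited domains])` tuples, and `co_citation_map` and `top_n` are hypothetical names rather than part of any tooling described in the audit.

```python
from collections import Counter, defaultdict

def co_citation_map(captures, top_n=5):
    """captures: iterable of (query, list of cited domains) tuples.
    Returns {query: the top_n most frequently cited domains for that query}."""
    per_query = defaultdict(Counter)
    for query, domains in captures:
        # Count each domain once per captured card.
        per_query[query].update(set(domains))
    return {
        q: [d for d, _ in counts.most_common(top_n)]
        for q, counts in per_query.items()
    }

# Toy captures; queries and domains are placeholders.
toy = [
    ("how to do x", ["a.com", "b.com"]),
    ("how to do x", ["a.com", "c.com"]),
    ("what is y", ["d.com"]),
]
cmap = co_citation_map(toy)
```

The brief writer can then compare each query's list against the source-type buckets to decide which slot the new page should target.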

We dropped one habit. Through 2025 we had been treating "the citation list" as a competitive ranking where the editorial response was to write a better single page than every other potential source on the query. The audit shows the citation list is an ensemble, and the editorial response on most queries is to write the page that complements the ensemble rather than to compete with every member of it. For most clients on most queries, the right page to write is not "the best how-to article on the topic" — it is "the how-to article that also fills the data-support slot," or "the how-to article that absorbs the community-support edge cases," and those are different briefs that produce different pages.

01. Map co-citation, not just citation. In the pairing breakdown drawn from 6,180 multi-chip cards, 96% of cards paired an editorial primary with a support of a different type: the citation list is structurally a 1+1 or 1+2 pairing, not a 4-way head-to-head ranking.
02. Add a quantification paragraph to every how-to and tutorial page. 9 of 14 audited pages absorbed the data-support slot from a third-party domain within 60 days of adding the paragraph, without losing the primary editorial slot.
03. Recognise the data-support oligopoly as a leverage point, not a moat. 71% of data-support citations land on 18 domains; the population of credible data sources is much smaller than the population of credible editorial sources, and breaking into the support slot is structurally easier than breaking into the primary slot.
04. For community-support pairings, audit the paired Reddit/Stack Exchange thread and absorb its edge cases into the editorial page. 4 of 7 audited pages displaced or co-cited with the community source within 45 days of adding a 50–80-word edge-case section.

Where this argument breaks

For single-citation cards the co-occurrence framing does not apply: about 19% of AI Overview cards on our query basket cited only one source, and on those queries the ensemble analysis collapses to a traditional one-winner ranking. For commercial-intent queries (product comparison, "best X for Y" queries) the ensemble pattern shifts toward editorial-plus-commerce pairings rather than editorial-plus-data; the support oligopoly there is the major marketplace listings and review aggregators rather than the 18 data domains. For Chinese-language AI search, 文心一言 (ERNIE Bot) and 元宝 (Tencent Yuanbao) build smaller ensembles, usually 1–2 citations rather than 3–4, and the source-type pairing logic is weaker; the oligopoly there is the major Chinese-language portals (百度知道 (Baidu Zhidao), 知乎 (Zhihu), 公众号 (WeChat official accounts)) rather than international data publishers, and the lessons here transfer only partially. For very new client sites (under 6 months indexed) the data-support strategy works on a longer timeline because the composer takes longer to surface the page as a candidate; the editorial work pays back, but the payback window stretches from 30 to 90 days. Our window was 60 days and the cohort was 26 sites; the per-vertical numbers should be read as point estimates. Outside those carve-outs the lesson holds: in 2026 the AI Overview citation list is structurally an ensemble, and editorial work that ignores the ensemble structure is competing for the wrong slot on most queries.
