Entity-first sentences in AI Overviews: does naming the subject up front, instead of opening with a pronoun, change whether Google lifts that sentence in 2026?

**TL;DR** — Across 27 client sites through May 2026 we audited a coreference choice that lives entirely in the first word of the answer sentence: whether the passage that answers the query **names its subject explicitly** ("Schema markup is a vocabulary that…") or opens with a pronoun or bare reference that leans on the paragraph above it ("It is a vocabulary that…", "This approach lets you…"), and whether naming the entity up front changes how often the AI Overview lifts that sentence into the card. Across 7,180 cited-passage events we joined each cited sentence to whether its grammatical subject was a named entity or an unresolved reference. The headline is that entity-first framing is a real and cheap citation lever, but it is really a self-containment lever wearing a grammar costume. A sentence that named its subject explicitly was cited 2.5× more often than a matched sentence that opened with a pronoun pointing at the paragraph before it. The strongest predictor was referent resolvability — a sentence whose subject the reader could identify without the sentence above it was lifted far more than one whose "it" or "this" only resolved with the prior context attached. The second was subject-query match — naming the exact entity the query asked about beat naming a near-synonym or a category term. The third, and the warning, was naming density — stuffing the entity name into every sentence of a paragraph read as keyword repetition and was cited no more, and on 5% of pages slightly less. One change — rewriting pronoun-led answer sentences so the first one in each section named its subject — lifted cited-passage rate by 24% on the affected sites over a 30-day follow-up.

Why we ran this audit

The AI Overview composer lifts a sentence, not a paragraph. It pulls one or two sentences out of your page and drops them into a card next to sentences from three other domains, stripped of everything that surrounded them. That single mechanic is behind half of what we have measured about citation, and it has a sharp consequence nobody writes for: a sentence that opens "It reduces crawl waste by…" is fine in the flow of a paragraph, because the reader just read the sentence that said what "it" is — but lifted alone into a card, "it" points at nothing. The composer either has to drag the prior sentence along, which makes a clumsier lift, or it skips the sentence for a competitor whose equivalent sentence names its subject and stands on its own. We suspected the composer was quietly penalising pronoun-led answer sentences for exactly this reason, and rewarding the ones that named their entity, and we wanted to know whether that was real or whether the model resolves coreference well enough that it does not care.

The second motivation was a failure mode we kept seeing on pages that ranked and were relevant but never got the chip. The page would answer the query perfectly — in the second sentence of the paragraph. The first sentence set up the subject by name, and the actual answer sentence then referred back to it with "this" or "they" or "the technique". A human reads straight through; the composer, hunting for one liftable sentence, finds the answer sentence orphaned from its subject and passes. Meanwhile a thinner competitor that repeated the entity name at the start of its answer sentence got lifted. We needed to know whether the cost was the pronoun, because if it is, the fix is almost free — name the subject in the sentence you want cited, instead of leaning on the one above it.

How we ran the measurement

We audited 27 client sites (11 SaaS, 6 publisher, 6 B2B services, 4 DTC), each with a fixed 200-query basket of its real in-market queries, weighted toward the definitional and explanatory queries ("what is X", "how does X work", "why does X happen") where the answer sentence has a clear subject the writer could either name or replace with a pronoun. Twice daily through May 2026 we captured every AI Overview card, and for cards citing a client page we identified the specific lifted sentence and classified its grammatical subject: a named entity (the query subject stated explicitly), a near-name (a synonym or category term), or an unresolved reference (a pronoun or bare demonstrative whose referent lived in a prior sentence). For each cited sentence we built a matched control: a comparable sentence on a similar query whose subject sat in a different state, so the comparison was named-vs-pronoun rather than good-page-vs-bad-page. The cited cohort was 7,180 events.
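The three-way subject classification can be roughed out in a few lines of Python. This is a minimal sketch, not the classifier used in the audit: the function name, the opener lists, and the `near_names` parameter are all assumptions made for illustration, and a real pass would need proper parsing rather than first-word matching.

```python
import re
from collections.abc import Sequence

# Assumption: short hand-picked lists; the audit's actual rules are not published.
PRONOUN_OPENERS = {"it", "they", "this", "that", "these", "those"}
DETERMINERS = {"the", "this", "that", "these", "those", "a", "an"}


def _starts_with(text: str, name: str) -> bool:
    """True if lower-cased text begins with name as a whole word or phrase."""
    return re.match(re.escape(name.lower()) + r"\b", text) is not None


def classify_subject_state(
    sentence: str, entity: str, near_names: Sequence[str] = ()
) -> str:
    """Label the opening of a sentence as it would read when lifted alone."""
    text = sentence.strip().lower()
    words = text.split()
    if not words:
        return "other"
    first = words[0].strip(".,;:!?\"'")
    # Look past a leading determiner so "This responsiveness metric" can
    # still be recognised as a near-name rather than a bare demonstrative.
    rest = text[len(words[0]):].lstrip() if first in DETERMINERS else text
    if _starts_with(text, entity) or _starts_with(rest, entity):
        return "named"
    if any(_starts_with(text, n) or _starts_with(rest, n) for n in near_names):
        return "near_name"
    if first in PRONOUN_OPENERS:
        return "unresolved"
    return "other"


print(classify_subject_state("Schema markup is a vocabulary that…", "schema markup"))
print(classify_subject_state("It is a vocabulary that…", "schema markup"))
```

The sentence is scored on its own, with no access to the paragraph around it, which mirrors the cold read described above. Anything the sketch cannot place falls into "other" rather than being guessed at.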

Two normalisation moves matter. We scored the subject state on the sentence as it would be lifted — alone, with no prior sentence attached — because that is the unit the composer extracts, and a pronoun that resolves fine in the paragraph is unresolved in the card. And we matched on sentence citability before comparing subject state — we paired each cited sentence with a control our existing cited-paragraph rubric scored as equally liftable (concrete, right length, directly on the query), so the effect we attribute to entity-first framing is not just the named-subject pages being better written overall. The 2.5× and 2.3× figures are from those matched comparisons, not raw averages.
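One way a matched multiplier of this kind could be computed is sketched below. The pooling method is an assumption (the write-up does not say how pairs were aggregated), the `MatchedPair` fields are hypothetical names, and the numbers in the usage lines are made-up toy values, not audit data.

```python
from dataclasses import dataclass


@dataclass
class MatchedPair:
    named_lifts: int      # times the named-subject sentence was lifted
    named_chances: int    # captures in which it could have been lifted
    control_lifts: int    # same counts for the matched pronoun-led control
    control_chances: int


def matched_lift_ratio(pairs: list[MatchedPair]) -> float:
    """Pooled lift rate of named sentences divided by that of their controls."""
    named_lifts = sum(p.named_lifts for p in pairs)
    named_chances = sum(p.named_chances for p in pairs)
    control_lifts = sum(p.control_lifts for p in pairs)
    control_chances = sum(p.control_chances for p in pairs)
    if named_chances == 0 or control_lifts == 0:
        raise ValueError("ratio is undefined without chances and control lifts")
    # (named_lifts / named_chances) / (control_lifts / control_chances),
    # cross-multiplied so only one division is performed.
    return (named_lifts * control_chances) / (control_lifts * named_chances)


# Toy numbers for illustration only.
toy_pairs = [MatchedPair(5, 20, 2, 20), MatchedPair(7, 20, 2, 20)]
print(matched_lift_ratio(toy_pairs))  # 3.0
```

Because each named sentence is compared only against a control scored as equally liftable, a ratio computed this way reflects subject state rather than overall page quality, which is the point of the matching step.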

The shape of the entity-first pattern

The flat headline first. Sentences that name their subject are cited more. A sentence whose grammatical subject was the named entity was lifted 2.5× more often than a matched sentence that opened with a pronoun or demonstrative pointing back at the prior sentence. The effect held through the quality match and the citability control: among sentences our rubric scored as equally liftable, the ones that named their subject were lifted far more than the ones that leaned on the sentence above them. The composer behaves as though it prefers a sentence that survives extraction intact over one that arrives in the card missing the thing it is about.

The most decision-relevant cut was that this is self-containment, not grammar. We tested whether the win was specifically about naming the entity or more broadly about the sentence resolving on its own, and it was the latter: a sentence that used a pronoun or demonstrative for its subject but whose referent was unambiguous from the sentence itself ("…both formats compress the same way…", where the formats were named earlier in that same sentence) was cited nearly as well as a fully named subject, while a sentence that named its subject but still depended on the prior sentence for a critical qualifier was cited no better than a pronoun-led one. Entity-first framing is a reliable way to make a sentence self-contained, but it is the self-containment the composer rewards. Name the subject because it is the cheapest way to make the sentence stand alone, not because the model is counting nouns.

Driver one: make the referent resolve without the sentence above

The single strongest predictor was whether the answer sentence resolved its own subject without the sentence before it. Holding the sentence constant, a version whose subject the composer could identify from the sentence alone — because the entity was named in it — was lifted at 2.5× the rate of a version that opened "It…" or "This…" or "They…" and relied on the prior sentence to say what that was. The composer extracts a sentence and reads it cold; a named subject means the cold read still answers the query, a pronoun means the cold read is about an unknown. A human reader never experiences the sentence cold, so writers never feel the cost — but the composer always reads it cold, because that is the whole mechanism of extraction.

We ran a structural test on 21 answer sentences across 11 clients, each of which opened with a pronoun or demonstrative whose referent sat in the sentence above. We rewrote the first sentence in each affected section to name its subject, changing no claims — only swapping "it" or "this technique" for the entity name. Over the 45 days that followed, 15 of the 21 sentences began being lifted on at least one target query where they had previously been skipped. The lever was not new content; it was making the answer sentence carry its own subject, so that when the composer pulled it out of the page it still knew what it was about.

Driver two: name the entity the query asked about, exactly

Holding resolvability constant, the second driver was subject-query match. A sentence naming the exact entity the query asked about — the precise term, not a synonym or a parent category — was lifted more than the same sentence naming a near-name. A page answering "what is INP" that opened its answer sentence with "Interaction to Next Paint is…" beat one that opened "This responsiveness metric is…", even though both resolve and both are accurate, because the composer matching the query entity to the sentence subject found a clean match in the first and an inferential hop in the second. The reading consistent with the data is that the composer prefers a sentence whose named subject is the query's entity verbatim, because it can verify the sentence is on-topic without resolving a synonym first.

We ran a structural test on 17 answer sentences across 9 clients that named the subject with a category term or a synonym rather than the query's exact entity. We rewrote each to lead with the precise entity name the query used, keeping the synonym later in the sentence for readability. Over the 60 days after the change, 12 of the 17 sentences improved their cited-passage rate. The two drivers compound: a sentence that names a resolvable subject but the wrong term is half-built, and one that names the right term but still leans on the prior sentence is the other half — the sentences that won named the query's exact entity and resolved on their own.

Driver three: naming density, and the entity-stuffed paragraph that backfires

The third driver was the warning. Naming the entity is a per-sentence lever, not a per-paragraph one, and pushing it past the first sentence backfires. A paragraph that repeated the entity name at the start of every sentence — "Schema markup is… Schema markup helps… Schema markup should…" — was cited no more often than one that named the subject once and used natural pronouns afterward, and on 5% of audited pages the entity-stuffed version was cited slightly less. The reading consistent with the data is that the composer needs the lifted sentence to be self-contained, which usually means the first sentence of a section, and once that sentence names its subject the rest of the paragraph carries it; repeating the name everywhere reads as keyword repetition rather than clarity, and the composer appears to discount paragraphs that pattern-match to keyword stuffing. Resolvability is the asset; density past the point of resolution is a liability.

We confirmed this on 14 paragraphs across 8 clients where an earlier optimisation pass had named the entity in nearly every sentence. We pared them back to name the subject in the lead sentence and the one or two other sentences a reader might land on cold, and let the rest use natural references. Over the following 45 days the pared-back paragraphs held or improved their citation while reading more naturally, and none lost the chip they had. The actionable rule is blunt: name the subject in the sentence you want lifted — usually the first of a section — and trust pronouns for the rest, because the gain is in self-containment, not in how many times the name appears.
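A density check along these lines can be approximated by counting how many sentences in a paragraph open with the entity name. The sketch below is illustrative only: the sentence splitter is naive, and the `max_density` and `min_sentences` thresholds are hypothetical values chosen for the example, since the audit does not publish a numeric cap.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive split on terminal punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?…])\s+", paragraph)
    return [part.strip() for part in parts if part.strip()]


def naming_density(paragraph: str, entity: str) -> float:
    """Share of sentences in the paragraph that open with the entity name."""
    sentences = split_sentences(paragraph)
    if not sentences:
        return 0.0
    opener = re.compile(re.escape(entity) + r"\b", re.IGNORECASE)
    named = sum(1 for sentence in sentences if opener.match(sentence))
    return named / len(sentences)


def is_entity_stuffed(
    paragraph: str, entity: str, max_density: float = 0.5, min_sentences: int = 3
) -> bool:
    """Flag paragraphs where most sentences open with the entity (assumed cap)."""
    if len(split_sentences(paragraph)) < min_sentences:
        return False
    return naming_density(paragraph, entity) > max_density


stuffed = "Schema markup is a vocabulary. Schema markup helps crawlers. Schema markup should be validated."
natural = "Schema markup is a vocabulary. It helps crawlers. It should be validated."
print(is_entity_stuffed(stuffed, "schema markup"))  # True
print(is_entity_stuffed(natural, "schema markup"))  # False
```

A flag from a check like this is a prompt to reread the paragraph, not a verdict: the sentences a reader might land on cold can legitimately keep the name.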

What changed in our content checklist

Three changes. We added a cold-read pass: before publishing, we read each section's lead answer sentence alone, with the sentence above it covered, and if its subject is a pronoun or bare demonstrative we rewrite it to name the entity, because that is exactly how the composer reads it. We added a subject-query match check to the same pass: the named subject in the answer sentence should be the query's exact entity, not a synonym or category, with the synonym moved later in the sentence if we want it for flow. And we added a density cap: name the subject in the lead sentence and any other sentence a reader might enter cold, then stop; repeating the entity name down the paragraph buys nothing and risks the keyword-stuffing discount.

We dropped one habit. For years our house style had favoured pronouns and demonstratives after the first mention as a marker of fluent writing — repeating a noun read as clumsy, so "the technique", "this approach", "it" took over by the second sentence, including in the sentences that answered the query. The audit removes that default for lead answer sentences: a pronoun in the one sentence the composer would lift spends self-containment for prose elegance the composer does not read. So reflexive pronoun substitution left our playbook for answer sentences — we now name the subject in the sentence built to be cited, and reserve pronouns for the sentences around it where no citation is at stake.

01. Name the subject in the answer sentence. A sentence whose subject was the named entity was cited 2.5× more than a matched sentence that opened with a pronoun pointing at the paragraph above; the composer lifts a sentence that survives extraction intact.
02. Make it resolve cold. The win is self-containment, not grammar; a sentence the composer can read alone and still know what it is about is lifted, a pronoun that needs the prior sentence is skipped. 15 of 21 sentences were cited after naming their subject.
03. Name the query's exact entity. The precise term beat a synonym or category; 12 of 17 sentences improved after leading with the entity the query used, since the composer verifies on-topic without resolving a synonym first.
04. Cap the naming density. Repeating the entity name in every sentence was cited no more and hurt on 5% of pages. Name the subject in the lead sentence, then trust pronouns; the gain is resolvability, not repetition.

Where this argument breaks

For navigational and brand queries the entity is usually the page itself and there is no answer sentence to reframe: someone searching your brand is not after a definitional sentence, so the lever does not apply. It is for definitional and explanatory queries where the answer sentence has a subject that could be named or replaced with a pronoun. For narrative and persuasive passages (case studies, opinion, story-driven content), forcing the entity name into every lead sentence reads as wooden and is not worth the citation it may not earn; the cold-read pass is for the answer sentences, not for prose whose job is to be read in flow. For very short pages where the whole answer is one or two sentences, the subject is almost always named already and the lever has little room to act. For languages with pro-drop or different coreference norms the effect may differ: in our parallel Chinese-language audit (文心一言, 元宝, 通义) the entity-first effect was present but weaker, and the engines tolerated a dropped or implicit subject better than Google's AI Overview did, though naming the subject still helped on the hardest extractions. The 5% stuffing penalty is small and noisy; we are confident over-naming does not help and mildly confident it hurts past the resolution point, but it is the weakest finding here and we would not restructure a page on it alone. Our window was 60 days and the cohort was 27 sites; the multipliers are point estimates that will move by vertical and query type. Outside those carve-outs the lesson holds: in 2026 the AI Overview lifts a sentence that names its subject and resolves on its own far more readily than one that opens with a pronoun leaning on the sentence above it, the unit is the individual answer sentence rather than the page, and the cheapest citation win on a definitional or explanatory query is to name the query's exact entity in the one sentence you want cited, and then to stop naming it.

Further reading