This post was sponsored by Siteimprove. The opinions expressed in this article are the sponsor’s own.
- Why does my content get crawled but never cited in ChatGPT or Perplexity?
- How do I tell if my AI visibility problem is technical or content-quality related?
- What actually decides whether AI picks my page over a competitor’s?
The gap between being retrieved by an AI system and actually appearing in its answer is where the real AI search strategy lives.
This article breaks down that process:
- How AI search systems retrieve and select content.
- Why eligibility alone doesn’t win.
- How to diagnose whether your content is failing at the retrieval layer or the quality layer.
The fix is different for each, and most teams are solving the wrong problem.
How AI Search Crawls Your Site & What Just Changed
AI search systems still rely on crawlers. If your pages block crawl access, require JavaScript rendering that crawlers won’t execute, or bury content behind authentication walls, nothing downstream matters.
Semantic HTML, proper heading hierarchy, and descriptive markup remain the cost of entry. But the stakes are higher now: these aren’t just accessibility compliance items anymore. They’re the structural signals AI systems use to parse and chunk your content for retrieval.
Platforms like Siteimprove.ai that audit accessibility and content quality natively can surface these issues before they become retrieval problems. If you’re already running accessibility audits, you’re closer to AI search readiness than you might think.
What has changed is what happens after the system accesses your content.
Why You’re Now Competing Paragraph-by-Paragraph, Not Page-by-Page
AI systems don’t ingest a page as a single unit. They break it into passages: discrete chunks of text that get indexed independently.
This is where most traditional SEO thinking falls short. You’re no longer competing at the page level. You’re competing at the passage level.
A 3,000-word guide might contain 15 to 20 individually indexed passages. Some of those will be clear, self-contained, and directly responsive to a query. Others will be vague transitions or filler paragraphs that contribute nothing to retrieval.
Every passage is either a retrieval candidate or a wasted one. A page can rank well in traditional search while performing poorly in AI search, because its best passages are buried inside paragraphs the system can’t cleanly extract.
How to audit passages manually:
- Copy one important page into a plain document. Break it into individual paragraphs or short sections, then read each passage on its own without the surrounding page context.
- Ask one question per passage. For each paragraph, write the query it actually answers. If you cannot name a clear query, that passage probably is not strong retrieval material.
- Rewrite weak passages to stand alone. Lead with the answer, add specific context, and remove vague transitions that only make sense when someone reads the full page from top to bottom.
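If you want to script the first pass of this audit, here is a minimal Python sketch. The blank-line passage split, the word-count thresholds, and the opener list are all assumptions to tune for your own content, not rules any retrieval system publishes:

```python
import re

# Openers that usually signal a transition rather than a self-contained answer.
VAGUE_OPENERS = ("as mentioned", "with that in mind", "on the other hand",
                 "that said", "in other words", "building on the above")

def audit_passages(page_text: str, min_words: int = 40, max_words: int = 150) -> None:
    """Split plain text into blank-line-separated passages and flag weak ones."""
    passages = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
    for i, passage in enumerate(passages, start=1):
        words = passage.split()
        flags = []
        if len(words) < min_words:
            flags.append("too short to stand alone")
        elif len(words) > max_words:
            flags.append("long; may chunk awkwardly")
        if passage.lower().startswith(VAGUE_OPENERS):
            flags.append("opens with a transition, not an answer")
        print(f"Passage {i} ({len(words)} words): {', '.join(flags) or 'looks self-contained'}")
```

Anything the script flags still needs the one-question-per-passage test by hand; it only narrows where to look first.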
How AI Picks Which Passages Make It Into an Answer
When a user asks an AI system a question, the system doesn’t read the web in real time. It queries a pre-built index, retrieves the most relevant passages from potentially millions of candidates, and scores them for relevance and quality.
But the system rarely stops at the literal query. It expands the question into a network of related sub-questions (follow-ups, edge cases, adjacent concerns) and retrieves passages for each. This is query fan-out, and it fundamentally changes what “ranking” means.
Your content isn’t just competing against pages that target your exact keyword. It’s competing against everything the system retrieves across that entire network of related queries.
A page that answers one narrow question well might get retrieved for that specific sub-query. But a page that anticipates the follow-ups, the “what about” variations, and the context a user would need next gets retrieved across multiple nodes in the fan-out. That’s a fundamentally different kind of competitive advantage.
Citation happens after all of this. The system attributes its synthesized answer to the sources that contributed the most useful material. Chasing citations without understanding retrieval is working backwards.
How to map a simulated query fan-out manually:
- Start with one target question. Write down the main query your audience would ask, then list the follow-up questions they would naturally ask next.
- Group those questions by intent. Separate beginner questions, implementation questions, comparison questions, edge cases, and decision-making questions.
- Match each question to existing content. If a question does not map to a clear passage on your site, that is a retrieval gap. If it maps to a vague or buried passage, that is a passage-quality gap.
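The same mapping works as a small script. In this sketch the questions, URLs, and gap notes are hypothetical; the point is the structure, where every sub-question in the fan-out gets either a clear URL, a weakness note, or a gap flag:

```python
# Hypothetical fan-out map for one target question. None marks a retrieval gap;
# a fragment URL with a note marks a passage-quality gap worth rewriting.
fan_out = {
    "What is hreflang and when do I need it?":            "/guides/hreflang-basics",
    "How do I configure hreflang on Shopify?":            "/guides/hreflang-shopify",
    "hreflang vs. separate country domains?":             None,  # retrieval gap
    "Common hreflang mistakes with 3+ language variants": "/blog/i18n-overview#faq",  # buried
}

for question, url in fan_out.items():
    status = "GAP: no matching passage" if url is None else f"maps to {url}"
    print(f"- {question}\n  {status}")
```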
Why Being Indexed Doesn’t Mean You’ll Get Cited
Here’s where most AI visibility strategies stall.
Teams invest heavily in technical optimization (fixing crawl issues, improving page speed, adding structured data) and assume the rest will follow. They treat retrieval readiness as the destination instead of the starting line.
Being indexed by an AI system means your content can be retrieved. It doesn’t mean it will be.
Consider a practical example. Two sites publish guides on international SEO for e-commerce. Site A has strong domain authority, clean technical SEO, and a 4,000-word guide that covers the topic broadly but generically. Site B is a smaller consultancy with a 1,500-word page focused specifically on hreflang implementation for Shopify stores with three or more language variants.
When an AI system receives a query about multilingual e-commerce SEO, it fans out into sub-questions. For the specific sub-query about hreflang configuration on Shopify, Site B’s focused passage gets retrieved and cited. Site A’s guide technically covers hreflang, but its relevant passage is buried in paragraph 37 of a general overview, sandwiched between topics that dilute its signal.
Site A is retrieval-ready. Site B is answer-worthy. That distinction is the core tension of AI search optimization, and it requires a completely different audit than most teams are running.
How to test this manually:
- Run the same query across multiple AI search experiences. Use a small set of high-value questions and record which sources are cited or referenced.
- Compare the cited source to your page. Do not compare the full articles. Compare the exact section or passage that appears to answer the query.
- Look for the selection difference. Ask whether the cited passage is more specific, more direct, more current, or more practical than yours. That usually reveals why it won.
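You can put crude numbers on that last step. The proxies below (length, how soon a query term appears, how many concrete figures a passage contains) are assumptions about what “more direct and more specific” looks like, not a published scoring formula:

```python
def directness_profile(passage: str, query_terms: list[str]) -> dict:
    """Crude proxies for why one retrieved passage beats another."""
    words = passage.lower().split()
    first_hit = next((i for i, w in enumerate(words)
                      if any(term in w for term in query_terms)), None)
    return {
        "word_count": len(words),
        "words_before_query_term": first_hit,                    # lower = more direct
        "concrete_figures": sum(w[0].isdigit() for w in words),  # numbers, versions, counts
    }

my_passage = "..."     # paste your passage here
cited_passage = "..."  # paste the competitor's cited passage here
print(directness_profile(my_passage, ["hreflang", "shopify"]))
print(directness_profile(cited_passage, ["hreflang", "shopify"]))
```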
The Two Signals That Decide AI Search Passage Selection
The hreflang example illustrates a broader pattern. Once your content clears the technical gates, competition shifts entirely to quality. And “quality” in AI retrieval means something more specific than most content strategies account for.
Why Information Gain Is a Decisive Selection Signal
An important factor in passage selection is whether your content contributes something the system can’t assemble from other sources.
This is information gain: original data, proprietary research, first-person case studies, or novel frameworks that don’t exist elsewhere in the index. When every other passage in the candidate pool says roughly the same thing, the passage that introduces a new data point or a genuinely different perspective has a structural advantage.
Generic coverage that restates widely available information is the easiest content for an AI system to replace with any other source. Original expertise is the hardest. If your content strategy doesn’t have a plan for producing material that is uniquely yours, you’re filling the index with passages any competitor could displace.
How to identify information gain manually:
- Review the top competing pages for the same topic. Look for repeated claims, definitions, examples, and recommendations that appear across nearly every source.
- Mark anything your page says that competitors do not. This could include proprietary data, internal benchmarks, customer examples, expert commentary, original frameworks, or lessons from implementation.
- Strengthen the unique material. Move original insights higher on the page, give them clearer headings, and support them with concrete examples instead of burying them in generic explanation.
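A rough way to run that comparison at scale is phrase-overlap fingerprinting. The sketch below treats five-word sequences as a fingerprint; the 5% threshold is arbitrary, and verbatim overlap misses paraphrased sameness, so treat low scores as candidates to verify by hand rather than proof of information gain:

```python
import re

def shingles(text: str, n: int = 5) -> set:
    """Fingerprint a text as its set of n-word sequences."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

my_passages = ["...", "..."]       # your page, split into passages
competitor_pool = shingles("...")  # concatenated text of the top competing pages

for i, passage in enumerate(my_passages, start=1):
    fp = shingles(passage)
    overlap = len(fp & competitor_pool) / len(fp) if fp else 0.0
    verdict = "candidate unique material" if overlap < 0.05 else "widely covered"
    print(f"Passage {i}: {overlap:.0%} phrase overlap -> {verdict}")
```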
How Topic Depth Gets More of Your Pages Into the Candidate Pool
Information gain increases the likelihood that your best passages get selected. Depth and coverage determine how many passages you have in the candidate pool to begin with.
AI systems exploring a subject pull from multiple passages across multiple pages. If your site covers a topic comprehensively, with dedicated pages for subtopics, related concepts, and adjacent questions, you create more opportunities to be retrieved across the full query fan-out.
This works at two levels. Across your site, topic clusters with focused pages for each subtopic outperform a single pillar page surrounded by thin supporting content. Within a single page, going three layers deep on a subject (the basics, the edge cases, and the practitioner-level tradeoffs) gives the system more high-quality passages to select from.
A domain with strong general authority but shallow coverage of a specific subject will lose passage-level retrieval to a smaller site that covers that subject exhaustively. AI systems evaluate authority at the topic level, not just the domain level.
How to assess topic depth manually:
- Create a simple topic map. Put your main topic in the center, then list the subtopics, adjacent questions, use cases, objections, comparisons, and technical details a buyer or practitioner would need.
- Assign each subtopic to a URL. If several important subtopics are crammed into one broad guide, they may need dedicated pages or stronger sections.
- Look for thin or missing coverage. Prioritize gaps where competitors have specific, useful content and your site only has a passing mention.
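The topic map also translates directly into a checklist. Everything in this sketch (the subtopics, URLs, word counts, and the 300-word “thin” threshold) is hypothetical; the useful output is the MISSING and THIN lists:

```python
# Hypothetical topic map: subtopic -> (URL or None, approx. words of dedicated coverage).
topic_map = {
    "hreflang basics":              ("/guides/hreflang-basics", 1800),
    "Shopify implementation":       ("/guides/hreflang-shopify", 1500),
    "hreflang with canonical tags": ("/guides/hreflang-basics#canonicals", 120),
    "auditing hreflang at scale":   (None, 0),
}

for subtopic, (url, words) in topic_map.items():
    if url is None:
        print(f"MISSING: {subtopic}")
    elif words < 300:
        print(f"THIN ({words} words): {subtopic} -> {url}")
```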
How to Diagnose Why Your Content Isn’t Getting Cited In AI Answers
When AI visibility underperforms, the instinct is to produce more content. That’s often the wrong move.
The first diagnostic question is simpler: is this a retrieval problem or a quality problem? Each has different symptoms, different causes, and different fixes.
Signs Your Content Never Reaches the AI’s Candidate Pool
If your content isn’t appearing in AI responses at all, even for queries where you have relevant, published material, the issue is upstream. The content isn’t reaching the candidate pool.
Audit for these signals:
- Crawl access restrictions or rendering failures preventing indexing.
- Missing or broken semantic structure: heading hierarchy, section markers, descriptive markup.
- Passages that are too long, too short, or too loosely structured to be extracted cleanly.
- Content buried inside tabs, accordions, or interactive elements that don’t render for crawlers.
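Two of these signals are easy to script. The sketch below (Python, assuming the requests and beautifulsoup4 packages are installed) checks for skipped heading levels and gives a crude read on how much text exists in the raw HTML before JavaScript runs; it will not catch authentication walls or crawler-specific blocks:

```python
import requests
from bs4 import BeautifulSoup

def audit_structure(url: str) -> None:
    """Flag structural signals that commonly block clean passage extraction."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Skipped heading levels (e.g., h2 -> h4) make chunk boundaries ambiguous.
    levels = [int(h.name[1]) for h in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])]
    for prev, cur in zip(levels, levels[1:]):
        if cur - prev > 1:
            print(f"Skipped heading level: h{prev} -> h{cur}")

    # Thin raw-HTML text relative to the rendered page suggests JS-dependent content.
    visible_words = len(soup.get_text(" ", strip=True).split())
    print(f"{visible_words} words present in raw HTML (compare with the rendered page)")

audit_structure("https://example.com/guides/hreflang-shopify")  # hypothetical URL
```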
In practice, this looks like a page that performs reasonably in traditional search but generates zero AI citations. The content might be strong. The system just can’t access or parse it at the passage level.
Retrieval failures are technical. They’re also the fastest to fix, because the content itself may already be competitive. It just needs to reach the candidate pool.
Signs You’re in the AI Search Citation Pool but Losing to Competitors
If your content is being retrieved but not selected, or selected less often than competitors for the same queries, the issue is downstream. The system can see your content. It’s choosing something else.
Audit for these signals:
- Passages that are vague, indirect, or take too long to reach the point.
- Coverage gaps where competitors address sub-questions your content ignores.
- Lack of original data, examples, or practitioner-level specificity.
- Generic treatment of a topic that other sources cover with equal or greater depth.
The telltale sign is finding competitor citations for queries your content should own. When you compare the retrieved passages side by side, the competitor’s passage answers the question more directly, with more specificity, in fewer words.
Quality failures require content investment. They can’t be solved with technical fixes alone.
Fix This First, Then Move to Quality
Start with retrieval. Technical fixes are lower effort and unlock everything downstream. A page that isn’t being crawled or chunked properly can’t benefit from content improvements at any level.
Once retrieval is confirmed, shift to passage-level quality. Identify the specific queries where competitors are winning selection, compare the actual passages head-to-head, and close the gap at the individual passage level rather than rewriting entire pages.
The highest-ROI work sits at the intersection: passages that are already being retrieved but aren’t winning selection. They’re close. They just need to be more direct, more specific, or more useful than the alternatives.
How to prioritize fixes manually:
- Create a simple two-column audit. Label each issue as either “retrieval” or “quality.” Retrieval issues include crawl blocks, broken structure, hidden content, and poor extractability. Quality issues include vague answers, missing examples, shallow coverage, and weak differentiation.
- Fix retrieval blockers first. There is no point improving a passage that systems cannot access, parse, or associate with the right topic.
- Then improve near-miss passages. Focus on pages that already rank, receive impressions, or cover the right topic but lose citations to more specific competitor content.
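The two-column audit is simple enough to keep in a script. The entries here are hypothetical; the sort just enforces the rule above, retrieval blockers before quality work:

```python
# Hypothetical audit entries: (page, issue, "retrieval" or "quality").
issues = [
    ("/guides/i18n-overview", "h2 -> h4 heading skip",           "retrieval"),
    ("/guides/i18n-overview", "hreflang answer buried mid-page", "quality"),
    ("/pricing",              "content inside JS-only tabs",     "retrieval"),
    ("/blog/seo-basics",      "no original data or examples",    "quality"),
]

# Retrieval blockers first: quality work is wasted on pages systems can't parse.
for page, issue, kind in sorted(issues, key=lambda entry: entry[2] != "retrieval"):
    print(f"[{kind:9}] {page}: {issue}")
```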
What to Track Instead of Citation Screenshots
If the old metrics (mention counts, citation screenshots, brand-name tracking) don’t tell the full story, what does?
Track retrieval presence separately from citation selection. Retrieval presence asks whether your content appears anywhere in the system’s candidate set for a given query cluster. Citation selection asks whether it was chosen for the final synthesized answer.
A page with high retrieval presence but low citation selection has a quality problem. A page with low retrieval presence for queries it should match has a technical problem. That distinction tells you exactly where to invest.
The challenge is that most teams piece this together across disconnected tools: one for accessibility auditing, another for content analytics, a third for search performance. By the time you’ve correlated the data, you’ve lost the thread between cause and effect.
This is where Siteimprove’s approach matters. Because accessibility auditing, content quality scoring, and search analytics live in one platform, you can trace a retrieval failure back to its structural cause without jumping between tools or reconciling data sets. A broken heading hierarchy flagged in an accessibility audit connects directly to the search performance data showing that page’s declining AI visibility. A content quality score on a specific page maps to its passage-level competitiveness for the queries you’re targeting.
That closed loop between accessibility, content, and search performance is what turns the retrieval-vs-quality framework from a diagnostic concept into an operational workflow.
How to track AI visibility manually:
- Build a query-tracking spreadsheet. Include the query, topic cluster, your best-matching URL, whether your brand appeared, whether you were cited, which competitors appeared, and what type of issue you suspect.
- Track patterns, not one-off screenshots. AI answers can vary, so look for repeated behavior across multiple prompts, systems, and dates.
- Separate visibility from selection. A page that appears in related answers but rarely gets cited likely has a quality problem. A page that never appears for relevant prompts likely has a retrieval or coverage problem.
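A plain CSV is enough to start. This sketch appends one row per check; the column names and example values are suggestions, not a standard schema:

```python
import csv
from datetime import date

FIELDS = ["date", "query", "topic_cluster", "best_url", "system",
          "brand_appeared", "cited", "competitors_cited", "suspected_issue"]

def log_check(path: str, **row) -> None:
    """Append one AI-visibility observation to a running CSV log."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:  # brand-new file: write the header once
            writer.writeheader()
        writer.writerow({"date": date.today().isoformat(), **row})

log_check("ai_visibility_log.csv",
          query="hreflang setup for Shopify",
          topic_cluster="international SEO",
          best_url="/guides/hreflang-shopify",
          system="Perplexity",
          brand_appeared=True,
          cited=False,
          competitors_cited="competitor.com",
          suspected_issue="quality: answer buried mid-page")
```

Rows accumulate across dates and systems, which is what lets you separate one-off variance from the repeated patterns the second step asks for.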
What It Takes to Get AI to Pick You
The question brands should be asking isn’t “Can AI find us?” It’s “Does AI find us useful?”
That shift reframes content strategy entirely — from visibility tracking to retrieval mechanics, from page-level optimization to passage-level precision, and from generic authority-building to topic-specific depth.
Three principles hold across every AI search system operating today.
First, treat technical accessibility as non-negotiable infrastructure. It doesn’t differentiate you, but its absence disqualifies you.
Second, build content for the query network, not the individual keyword. AI systems resolve clusters of related questions simultaneously. Your content architecture should map to that same structure.
Third, prioritize information gain. Original research, proprietary data, and first-person expertise are the hardest assets for an AI system to source elsewhere — and a strong signal that your content deserves selection.
The brands that win in AI search won’t be the ones that figured out how to get mentioned. They’ll be the ones whose content was too useful to leave out.
Image Credits
Featured Image: Image by Siteimprove. Used with permission.