Chapter 6 Layer 3: Influence

Citation Data

Which sources is AI drawing on when it talks about your brand? Where are your competitors getting cited that you are not? Citation data tells you why the model says what it says, and where you need to be to change it.

TL;DR

The short version

Key points
  • Citations in AI responses are often modelled, not retrieved. LLMs approximate what should be cited based on training patterns, so some citations are accurate and some are hallucinated.
  • Run two types of query. Commercial queries show which sources dominate live retrieval. Knowledge-based queries reveal what the model has learned about your brand specifically.
  • Build a citation profile. Track which domains, page types, and platforms appear consistently across responses.
  • Hallucinated citations are a signal. If a model cites a page on your site that does not exist, it is telling you what content it expects you to have.
Before you start

Two chapters worth reading first

Citation behaviour differs significantly between training data and live retrieval. Understanding which channel you are measuring changes how you interpret what you find and what you do about it.


Ch. 7 – AI Visibility Channels

Citations work differently depending on whether the AI is drawing on training data or live search. Chapter 7 explains how each channel works and why the distinction matters for citation analysis.


Ch. 8 – Prompt Tracking

Getting consistent citation data requires well-designed prompts. Chapter 8 covers how to structure queries that reliably surface citations rather than producing inconsistent results.

How it works

How citations actually work

Modelled, not retrieved

When an AI cites a source, it is not always pulling a live URL from the web. In many cases it is generating what it predicts a citation should look like based on patterns in its training data. This is why AI citations are sometimes wrong, broken, or pointing to pages that do not exist.

Understanding this changes how you approach citation analysis. You are not just auditing what the AI links to. You are auditing what it has learned to associate with authoritative sources in your space.

[Figure: How AI citations work. A flowchart showing four stages: query input; a live search decision branching into two paths (no search: training data, no URLs retrieved; live search: live results, real-time URLs); training data recall; and citation tagging.]

Why this matters for your strategy

You can influence citations at two main points: the live search surface (traditional SEO) and the training data layer (authoritative content on the right platforms). Most teams focus only on the first. The second is often more important for brand-specific queries.

Where to act

Where you can actually influence citations

| Pipeline stage | Leverage | What you can do |
| --- | --- | --- |
| Query interpretation | Limited | Model likely intent and design content around the questions buyers actually ask. |
| Live search decision | None | A system-level call made by the model. Cannot be triggered or influenced externally. |
| Live search surface | Yes | Traditional SEO applies: ranking well, earning backlinks, structuring metadata clearly. |
| Training data recall | Yes | Publish high-quality, well-linked content on authoritative platforms that the model has learned to trust. |
| Citation tagging | Yes | Track patterns, fix broken links, use clean URL structures to reduce hallucinated citations. |

How to do it

Building your citation profile

Two types of query, two types of insight

To build a complete picture of your citation profile you need to run two different kinds of query. Each surfaces different information about how and where you are being cited.

Query type 01: Commercial queries

General, category-level questions that do not mention your brand. They reflect how a buyer researches before knowing which brands to consider, and they show which sources dominate live retrieval in your space.

“What are the best resources on AI optimisation?”
Query type 02: Knowledge-based queries

Entity-specific questions that mention your brand directly. They tap into trained memory rather than live retrieval. Ask the model to turn off live search for a cleaner read of what it has actually learned.

“What does [Your Brand] offer in the AI optimisation space?”
1. Run queries and collect citations

Run both query types across one or more models. Explicitly ask for citations in your prompt. Record every URL returned, including broken ones and hallucinated ones. Both matter.
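As a minimal sketch of this step, the snippet below appends an explicit citation request to a query and pulls every URL out of the text a model returns. It does not call any model: the response is assumed to be plain text you have already collected. The names `CITATION_REQUEST`, `build_prompt`, and `extract_citations` are hypothetical, and the wording of the citation request is an assumption rather than a tested formula.

```python
import re
from urllib.parse import urlparse

# Hypothetical wording; adjust it to whatever reliably surfaces citations for you.
CITATION_REQUEST = "Cite your sources as full URLs."

# Match http(s) URLs, stopping at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def build_prompt(question: str) -> str:
    """Append an explicit request for citations to a query."""
    return f"{question.strip()} {CITATION_REQUEST}"


def extract_citations(response_text: str) -> list[str]:
    """Return every distinct URL in a response, in order of first appearance."""
    seen: set[str] = set()
    urls: list[str] = []
    for match in URL_PATTERN.findall(response_text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen and urlparse(url).netloc:
            seen.add(url)
            urls.append(url)
    return urls
```

The extractor keeps every URL it finds, whether or not the link works, because broken and hallucinated links are part of the record.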

2. Log, aggregate, and compare against competitors

Record each citation: URL, domain, page type, which query it appeared in, and whether the link actually resolves. Then run the same queries for your main competitors. Platforms appearing consistently for them but not for you are the primary targets for investment.
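One way to hold this log is a small record per citation plus a gap function over two logs. The sketch below assumes you keep one list of citations from queries about your brand and one from the same queries about a competitor. The `Citation` class, the `citation_gaps` function, and the `min_count` threshold of 2 are all illustrative choices, not a prescribed schema.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Citation:
    url: str
    page_type: str  # e.g. "guide" or "glossary"; the labels are up to you
    query: str      # the query the citation appeared in
    resolves: bool  # whether the link loaded when you checked it

    @property
    def domain(self) -> str:
        """Host name of the URL, lowercased, without a leading 'www.'."""
        host = urlparse(self.url).netloc.lower()
        return host[4:] if host.startswith("www.") else host


def domain_counts(citations: list[Citation]) -> Counter:
    """Count how many citations each domain received."""
    return Counter(c.domain for c in citations)


def citation_gaps(ours: list[Citation], theirs: list[Citation],
                  min_count: int = 2) -> list[tuple[str, int]]:
    """Domains cited at least min_count times for a competitor but never for us."""
    our_domains = set(domain_counts(ours))
    gaps = [(d, n) for d, n in domain_counts(theirs).items()
            if n >= min_count and d not in our_domains]
    # Most frequently cited gaps first; ties broken alphabetically.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))
```

Raising `min_count` is a crude way to approximate "appearing consistently": a domain cited once may be noise, while one cited across many queries is a stronger target.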

3. Treat hallucinated citations as content briefs

If the model cites a URL on your domain that does not exist, it is telling you what content it expects your brand to have. That is a content brief. Create the page.
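The idea can be sketched as a filter over the citation log: keep URLs on your own domain that did not resolve, and read a working topic from the URL slug. The function name `brief_topics` and the `(url, resolves)` input shape are hypothetical. The sketch also assumes every non-resolving URL on your domain is a candidate; it cannot tell a hallucinated page from one that once existed and was removed, so check each result by hand.

```python
from urllib.parse import urlparse


def brief_topics(citations: list[tuple[str, bool]], own_domain: str) -> list[str]:
    """Turn non-resolving URLs on own_domain into rough topic strings.

    citations is a list of (url, resolves) pairs from the citation log.
    """
    topics: list[str] = []
    for url, resolves in citations:
        parsed = urlparse(url)
        host = parsed.netloc.lower().removeprefix("www.")
        if resolves or host != own_domain:
            continue  # only dead links on our own site are candidates
        # Take the last path segment and drop a file extension if present.
        slug = parsed.path.rstrip("/").rsplit("/", 1)[-1]
        slug = slug.rsplit(".", 1)[0]
        topic = slug.replace("-", " ").replace("_", " ").strip()
        if topic and topic not in topics:
            topics.append(topic)
    return topics
```

For example, a cited but non-existent `https://example.com/guides/ai-citation-glossary/` would yield the topic "ai citation glossary", which is a starting point for a brief rather than a finished one.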

4. Monitor on a regular cadence

Citation behaviour shifts as models update. Re-run your queries at least quarterly, and track the direction of change over time rather than treating any single snapshot as definitive.
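Tracking direction of change can be as simple as comparing each domain's share of citations between two runs. The sketch below assumes each snapshot is a flat list of cited domains from one run of your query set; `citation_share` and `share_changes` are hypothetical names.

```python
from collections import Counter


def citation_share(domains: list[str]) -> dict[str, float]:
    """Map each domain to its share of all citations in one snapshot."""
    counts = Counter(domains)
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


def share_changes(previous: list[str], current: list[str]) -> list[tuple[str, float]]:
    """Change in citation share per domain between two snapshots, largest moves first."""
    before, after = citation_share(previous), citation_share(current)
    changes = {d: after.get(d, 0.0) - before.get(d, 0.0)
               for d in before.keys() | after.keys()}
    return sorted(changes.items(), key=lambda item: (-abs(item[1]), item[0]))
```

Shares are used instead of raw counts so that snapshots with different numbers of queries remain comparable. A domain that disappears entirely shows up as a negative change rather than dropping out of the report.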

Improving citations

Making your citations more accurate and more frequent

01. Clean URL structures

LLMs hallucinate links based on domain patterns. Predictable, descriptive URL structures narrow the gap between what the model approximates and what actually exists on your site.
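As an illustration of what "predictable and descriptive" might mean in practice, the sketch below derives a URL from a section name and a page title using one fixed rule. The `slugify` and `page_url` helpers and the `/section/title` pattern are assumptions for the example, not a recommended standard.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase, ASCII-only, hyphen-separated slug from a page title."""
    ascii_text = (unicodedata.normalize("NFKD", title)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def page_url(base: str, section: str, title: str) -> str:
    """Build a URL of the form base/section-slug/title-slug."""
    return f"{base.rstrip('/')}/{slugify(section)}/{slugify(title)}"
```

When every page follows the same rule, a URL that a model guesses from the page title has a better chance of matching a real page.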

02. Citation-friendly content formats

Guides, glossaries, and structured explainers tend to be cited more often than marketing pages. Include clear author attribution, structured headings, and internal links that help the model understand context.

03. Authoritative platform presence

Publish on platforms with strong representation in training data: industry publications, structured knowledge bases, well-linked community sites. The model cites what it has been trained to treat as authoritative.

04. Fix broken links promptly

A broken page that keeps being cited is a missed attribution every time, because the reference leads nowhere. Redirect it or restore it.

How Waikay tracks Citation Data

Waikay collects citations across every tracked prompt, identifies which domains and page types appear most consistently, and surfaces the gap between where you are cited and where your competitors are. It also flags hallucinated citations so you can turn them into real content rather than letting them remain as dead references.