Chapter 8 · Layer 4: Methodology

Prompt Tracking

How to design and run prompt sets that produce reliable, unbiased data. Your measurements are only as good as the prompts behind them, and most teams get this wrong in the same predictable ways.

TL;DR

The short version

Key points
  • Stop counting prompts. Tracking too many prompts creates noise: each phrasing nudges the answer, so you end up measuring your own wording choices rather than the model’s actual understanding of your brand.
  • Focus on consistent outputs. The signal is which topics, entities, and attributes the AI reliably connects to your brand across many runs.
  • Use a small, deliberate core prompt set. Start with a handful of prompts that genuinely represent your offerings, and let the outputs tell you what to look at next.
  • Map the gaps. The most useful data shows where competitors consistently cluster around a topic or attribute but your brand is absent.
Common mistakes

Four ways prompt tracking goes wrong

01

Volume overload

Tracking dozens of prompts creates noise. Each phrasing produces slightly different answers, which makes clear patterns hard to see. More prompts do not mean better data; they usually mean harder-to-read data.

02

Bias in the prompt

The way you phrase a question shapes the answer. If you write prompts that reference your brand or describe your category using your own marketing language, you are measuring your own wording choices rather than the model’s actual understanding.

03

Non-reproducibility

AI outputs are probabilistic. Even with the same prompt, you will not always get the same answer. A prompt run once is not data; it is a single observation, and treating it as fact is the mistake.

04

Measuring inputs instead of outputs

The goal is not to track which prompts mention your brand. The goal is to understand what the model consistently connects to your brand. Counting mentions prompt by prompt shifts your attention to the wrong thing.

The method

How to do it properly

1

Choose your core prompts deliberately

Select a small set of prompts that genuinely represent your brand’s core offerings. This step is hard, and it should not be automated or delegated. Start with the questions your buyers actually ask before they have a shortlist, not after. Aim for five to ten prompts at first.

Example prompts

  • “What are the top workflow automation tools?”
  • “Compare AI workflow platforms for small businesses.”
  • “Which tools do companies use for [your core use case]?”

2

Control the channel before you run

Before running any prompt, decide which channel you are testing and lock it in. To measure what the model learned from training data, explicitly disable browsing. To measure grounded search, enable it. If you mix the two within the same prompt set without logging which runs used which setting, you blend two different signals into one frequency count, and the data becomes uninterpretable.

Log the channel alongside every run: model name, browsing on or off, and date. This becomes essential when you compare results month to month.
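
A minimal sketch of such a run log in Python, assuming you store runs yourself. The `RunLog` record and the `split_by_channel` helper are hypothetical names, not part of any tool, and the dates are illustrative. Grouping by model and browsing setting keeps training-data runs and grounded-search runs in separate buckets before anything is counted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunLog:
    """One prompt run. Hypothetical schema: adapt the fields to your own log."""
    model: str       # e.g. "ChatGPT" or "Gemini"
    browsing: bool   # False = training-data channel, True = grounded search
    run_date: date
    prompt: str      # exact wording, recorded verbatim
    response: str


def split_by_channel(runs):
    """Group runs by (model, browsing) so the two channels are never blended."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run.model, run.browsing)].append(run)
    return dict(groups)


# Illustrative data only: responses are elided.
prompt = "What are the top workflow automation tools?"
runs = [
    RunLog("ChatGPT", False, date(2024, 5, 1), prompt, "..."),
    RunLog("ChatGPT", True, date(2024, 5, 1), prompt, "..."),
    RunLog("Gemini", False, date(2024, 5, 1), prompt, "..."),
    RunLog("ChatGPT", False, date(2024, 5, 2), prompt, "..."),
]
groups = split_by_channel(runs)
for (model, browsing), bucket in sorted(groups.items()):
    print(model, "browsing on" if browsing else "browsing off", len(bucket))
```

Each bucket can then be counted on its own, so a month-to-month comparison always pairs like with like.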

Why this matters

A prompt run against training data and the same prompt run with live retrieval can produce completely different brand mentions, topics, and attributes. Treating them as equivalent is one of the most common sources of confusing or contradictory data in AI visibility measurement.

3

Run each prompt multiple times per model

Run each core prompt at least five times per model, and ideally ten. Log what appears consistently, not what appears once. Frequency is the signal; single-run presence is noise.

Analyse each model’s results separately before aggregating across models. Different models have different tuning, system prompts, and training data coverage, so ChatGPT may produce a very different competitive set from Gemini for the same prompt. Treat each model as its own data source, then look for patterns that hold across all of them.
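
The frequency rule can be sketched as follows, assuming each run has already been reduced to the set of brands (or other signals) it mentions. `frequency_by_model`, `consistent_everywhere` and the 0.6 threshold are illustrative choices, not a standard, and the sample runs and BrandY are made up. Counts are kept per model first, and only then intersected across models.

```python
from collections import Counter


def frequency_by_model(runs_by_model):
    """Map each model to {signal: (times_seen, total_runs)}.

    runs_by_model maps a model name to a list of runs, where each run is the
    collection of signals extracted from one response."""
    result = {}
    for model, runs in runs_by_model.items():
        # set(run) so a signal mentioned twice in one response counts once
        counts = Counter(signal for run in runs for signal in set(run))
        result[model] = {signal: (n, len(runs)) for signal, n in counts.items()}
    return result


def consistent_everywhere(freqs, min_share=0.6):
    """Signals seen in at least min_share of runs in every model."""
    per_model = [
        {signal for signal, (n, total) in table.items() if n / total >= min_share}
        for table in freqs.values()
    ]
    return set.intersection(*per_model) if per_model else set()


# Made-up example: five runs of the same prompt per model.
runs_by_model = {
    "ChatGPT": [{"Waikay", "BrandX"}, {"BrandX"}, {"Waikay", "BrandX"}, {"Waikay"}, {"BrandX"}],
    "Gemini": [{"BrandX"}, {"BrandX", "BrandY"}, {"Waikay", "BrandX"}, {"BrandX"}, {"BrandY"}],
}
freqs = frequency_by_model(runs_by_model)
print(freqs["ChatGPT"]["Waikay"])      # (3, 5)
print(consistent_everywhere(freqs))    # {'BrandX'}
```

Here Waikay clears the threshold in ChatGPT (3 of 5 runs) but not in Gemini (1 of 5), so only BrandX holds across both models, which is exactly the kind of difference that pooling the runs first would hide.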

4

Extract entities, topics, and attributes from each response

From each response, record three types of signal: the entities mentioned, the topics they are associated with, and the attributes assigned to each brand. Use a consistent format so the data is comparable across runs and over time.

Example extraction

Prompt: “What does Waikay offer, and who uses it?”

Response: “Waikay is an AI optimisation platform used primarily by small and mid-sized teams. It is frequently described as trusted and affordable, and is often compared to BrandX, which tends to be positioned as faster but less accessible.”

  • Entity: Waikay
  • Entity: BrandX
  • Topic: AI optimisation
  • Attribute: Trusted
  • Competitor attribute: BrandX, fast

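
A deliberately naive way to turn responses into a consistent record is to match each response against a fixed vocabulary you maintain. The `Signal` record, the `extract_signals` helper and the vocabulary below are hypothetical. Note its limits: whole-word matching misses “faster” when the vocabulary says “fast”, and it cannot tell which brand an attribute belongs to, so competitor attributes such as BrandX’s speed still need a human or model-assisted pass.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    kind: str   # "entity", "topic" or "attribute"
    value: str


def extract_signals(response, entities, topics, attributes):
    """Naive extraction: report each vocabulary term that appears in the
    response as a whole word or phrase, ignoring case."""
    found = set()
    for kind, terms in (("entity", entities), ("topic", topics), ("attribute", attributes)):
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, response, re.IGNORECASE):
                found.add(Signal(kind, term))
    return found


response = (
    "Waikay is an AI optimisation platform used primarily by small and mid-sized "
    "teams. It is frequently described as trusted and affordable, and is often "
    "compared to BrandX, which tends to be positioned as faster but less accessible."
)
signals = extract_signals(
    response,
    entities=["Waikay", "BrandX"],
    topics=["AI optimisation"],
    attributes=["trusted", "affordable", "fast"],  # "fast" will NOT match "faster"
)
for s in sorted(signals, key=lambda s: (s.kind, s.value)):
    print(s.kind, s.value)
```

Because every run produces the same kind of `Signal` records, runs can be counted and compared over time regardless of how the extraction itself is done.
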
5

Map breadth and identify gaps

Build a table of entities and attributes across all competitors, and highlight where your brand is missing. When a competitor consistently appears alongside an attribute your brand does not own, that is a specific, traceable gap you can target directly.
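
The gap table can be sketched as a set difference, assuming you already have, for each brand, the attributes and topics it is consistently linked with (for example, those that cleared your frequency threshold). `find_gaps` and the sample associations, including BrandY, are hypothetical.

```python
def find_gaps(associations, your_brand):
    """Return {attribute: [competitors]} for every attribute or topic that at
    least one competitor is consistently linked with but your brand is not."""
    yours = associations.get(your_brand, set())
    gaps = {}
    for brand, attrs in associations.items():
        if brand == your_brand:
            continue
        for attr in attrs - yours:
            gaps.setdefault(attr, []).append(brand)
    # Sort attributes and competitors so the output is stable from run to run
    return {attr: sorted(brands) for attr, brands in sorted(gaps.items())}


# Hypothetical consistent associations per brand.
associations = {
    "Waikay": {"trusted", "AI optimisation"},
    "BrandX": {"fast", "AI optimisation"},
    "BrandY": {"fast", "enterprise"},
}
for attr, competitors in find_gaps(associations, "Waikay").items():
    print(f"{attr}: {', '.join(competitors)}")
```

An attribute held by several competitors at once, like “fast” here, is usually the first gap worth looking at.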

6

Track change over time

Run the same core prompt set monthly. Look for directional change: associations strengthening, new topics emerging, gaps closing. The value is in the trend, not in any single snapshot.
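
A month-over-month comparison might look like this, assuming each month’s results are stored as times seen out of total runs per signal. `compare_months` is a hypothetical helper, and the one-run-in-five threshold for calling a change “strengthening” or “weakening” is an arbitrary assumption to tune to your own run counts. Fractions keep the comparison exact.

```python
from fractions import Fraction


def compare_months(previous, current, min_change=Fraction(1, 5)):
    """previous and current map each signal to (times_seen, total_runs).
    Returns {signal: direction} for every signal seen in either month."""
    changes = {}
    for signal in sorted(set(previous) | set(current)):
        if signal not in previous:
            changes[signal] = "new"
        elif signal not in current:
            changes[signal] = "dropped"
        else:
            before = Fraction(*previous[signal])  # share of runs last month
            after = Fraction(*current[signal])    # share of runs this month
            if after - before >= min_change:
                changes[signal] = "strengthening"
            elif before - after >= min_change:
                changes[signal] = "weakening"
            else:
                changes[signal] = "stable"
    return changes


# Made-up figures for two consecutive monthly runs of the same prompt set.
previous_month = {"Waikay": (3, 5), "trusted": (3, 5), "BrandX": (4, 5)}
current_month = {"Waikay": (5, 5), "trusted": (3, 5), "BrandX": (4, 5), "fast": (2, 5)}
print(compare_months(previous_month, current_month))
```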

Reference

The entity gap analysis worksheet

Use this structure to record each prompt run consistently. The format matters because you will be comparing runs across months, not just reading a single set of results.

Entity Gap Analysis Worksheet

  • Model and channel: Which model was used (ChatGPT, Gemini, Claude, etc.) and whether browsing was on or off. Essential for separating training data results from grounded search results and for comparing runs across months.
  • Prompt tested: The exact wording used. Record it precisely so the log is reproducible across runs and over time.
  • Offering represented: Which product or service does this prompt reflect?
  • Analytics value: Is this a high-value query based on conversions or pipeline? Helps prioritise which gaps to close first.
  • Related intents: What adjacent queries does this prompt suggest? Used to plan the next round of core prompt selection.

And for each signal extracted from the responses:

| Type | Signal | Times seen | Notes |
|---|---|---|---|
| Entity | Your brand | 3 of 5 runs | Consistently recognised. Core offering confirmed. |
| Topic | AI optimisation | 5 of 5 runs | Appears in every response. Strong topical cohesion. |
| Attribute | Trusted | 3 of 5 runs | Stable attribute. Reinforced across multiple phrasings. |
| Competitor | BrandX | 4 of 5 runs | Dominates the “fast” attribute. Your brand absent from speed comparisons. Gap to address. |

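
If you keep the worksheet as a spreadsheet, one row per extracted signal, with the run context repeated on every row, makes months easy to filter and compare. A sketch using Python’s standard `csv` module; `worksheet_csv` and the column layout are assumptions, not a required format, and the rows are illustrative.

```python
import csv
import io


def worksheet_csv(model, browsing, prompt, signals):
    """signals is a list of (type, signal, times_seen, total_runs, notes).
    Writes one CSV row per signal and repeats the run context on every row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Model", "Browsing", "Prompt", "Type", "Signal", "Times seen", "Notes"])
    for kind, signal, seen, total, notes in signals:
        writer.writerow([model, "on" if browsing else "off", prompt, kind, signal,
                         f"{seen} of {total} runs", notes])
    return buffer.getvalue()


csv_text = worksheet_csv(
    model="ChatGPT",
    browsing=False,
    prompt="What are the top workflow automation tools?",
    signals=[
        ("Topic", "AI optimisation", 5, 5, "Appears in every response."),
        ("Competitor", "BrandX", 4, 5, "Dominates the fast attribute."),
    ],
)
print(csv_text)
```
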
How Waikay handles prompt tracking

Waikay manages your core prompt set, runs each prompt continuously across models, and extracts entities, topics, and attributes automatically. Rather than managing spreadsheets, you see the patterns directly: which associations are strengthening, which gaps are closing, and where competitors are pulling ahead.