🔤
Chapter 9 Layer 4: Methodology

NLP and Entity Analysis

The linguistic layer behind every metric in this guide. AI does not index keywords. It builds a semantic map of concepts, brands, and relationships from co-occurrence patterns in text. This chapter explains how that map works, how to read your position in it, and how to track whether that position is shifting.

TL;DR

The short version

Key points
  • AI builds a semantic map, not a keyword index. Your brand is a node connected to topics, competitors, and attributes through co-occurrence patterns in training data.
  • Three signals define your position: entities (what the model recognises), topics (what clusters you are pulled into), and attributes (how the model describes you).
  • Strong clustering means consistency: the same topics and attributes appear reliably across models and prompt types, and over time.
  • Run two types of test, separately. Training view (browsing off) shows what the model has learned. Grounded view (browsing on) shows what it finds now. Keep them separate before comparing.
  • Track drift over time. Associations that are strengthening, fading, or emerging are the early signals of what your AI brand position will look like in six months.
🗺️
The concept

How the semantic map works

Co-occurrence is everything

Large language models do not store facts like a database. They build dense numerical representations of meaning, called embeddings, by learning from billions of examples of how words and concepts appear together in text. Concepts that co-occur frequently end up with representations that the model treats as related. This is a simplification of what is technically a much more complex process involving attention mechanisms and contextual representations, but it holds as a working model for content strategy purposes.

Think of your brand as having a position in the model’s learned understanding of language. The topics, brands, and descriptors that regularly appear alongside your brand name in training text shape that position. The ones that rarely or never co-occur have little influence on it. This is why consistent, topic-specific publishing builds associations and why a single page about a topic is rarely enough. Note that the “semantic map” framing used throughout this chapter is an analogy, not a literal description of how the model stores information.

The practical implication is that you cannot directly edit what the model has learned. You can only influence it by changing what appears in the text it learns from. More co-occurrence with a topic, across more sources, over a longer period, shifts how the model relates your brand to that topic.
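As a rough illustration of the co-occurrence idea, the sketch below counts how often a brand name shares a sentence with each candidate topic in a tiny, made-up corpus. It is a toy counting exercise, not how a language model actually builds its representations. The corpus, the topic list, and the `cooccurrence_counts` helper are all hypothetical.

```python
import re
from collections import Counter


def cooccurrence_counts(texts, brand, topics):
    """Count sentences in which `brand` and each topic both appear.

    Toy proxy only: real models learn from far richer context than
    sentence-level co-occurrence.
    """
    counts = Counter({topic: 0 for topic in topics})
    for text in texts:
        # Naive sentence split on '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            lowered = sentence.lower()
            if brand.lower() not in lowered:
                continue
            for topic in topics:
                if topic.lower() in lowered:
                    counts[topic] += 1
    return counts


# Hypothetical corpus, invented for illustration.
corpus = [
    "Waikay is an AI optimisation platform. It suits small business tools buyers.",
    "Teams use Waikay for AI optimisation and workflow automation.",
    "BrandX focuses on workflow automation.",
]
topics = ["AI optimisation", "workflow automation", "small business tools"]
counts = cooccurrence_counts(corpus, "Waikay", topics)
```

In this toy corpus the brand shares a sentence with “AI optimisation” twice, with “workflow automation” once, and with “small business tools” never, which mirrors the point that a topic mentioned near the brand but not with it contributes little.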

Why this matters for the rest of the guide

Every metric in this guide is a downstream reading of how the model has learned to relate your brand to the world. Share of Voice measures how often your brand surfaces in that learned understanding. Topical Presence measures which concept clusters it is connected to. Citation Data measures which sources shaped those connections. Entity Analysis is the method that lets you read those connections directly, even if imprecisely.

📡
The three signals

Entities, topics, and attributes

Every piece of data you extract from an AI response falls into one of three signal types. Understanding these is the foundation of every analysis in this chapter.

🏷️

Entities

Brand names, products, people, organisations. The anchors of meaning in the model’s map. When the model recognises your brand as a distinct entity, it can attach other signals to it consistently.

Entity: “Waikay” recognised as a brand node, linked to AI optimisation
🔗

Topics

Clusters of co-occurring concepts. If your brand is repeatedly mentioned in the context of a topic cluster, it gets pulled into that cluster. Consistent co-occurrence across many sources is what builds the association.

Topic cluster: “AI optimisation” + “workflow automation” + “small business tools”
🎨

Attributes

Descriptive modifiers weighted by frequency and consistency. If your brand is repeatedly described as “affordable” or “enterprise-grade,” those traits become part of its semantic identity and are reproduced in responses.

Attribute: “trusted” appears in 4 of 5 runs — stable identity marker
How these signals relate to each other

An entity is the anchor. Topics are what the model connects the entity to. Attributes are how the model characterises it within those connections. A brand with strong entity recognition, broad topic connections, and consistent attributes is deeply embedded in the model’s understanding of its market. A brand missing any of these is vulnerable: it is more likely to be mischaracterised, pushed to the periphery by competitors, or left out of category responses altogether.

🧲
Clustering

Strong clustering vs weak clustering

Clustering refers to how tightly your brand is grouped with the topics and attributes that define your market. Strong clustering means the model is confident about what you are and where you belong. Weak clustering means it is uncertain. Uncertain brands are not the ones that get recommended.

| Dimension | Strong clustering | Weak clustering |
| --- | --- | --- |
| Entity recognition | Brand consistently recognised as the same entity across contexts and models | Brand name is ambiguous, inconsistently tagged, or confused with other entities |
| Topical cohesion | Mentions repeatedly appear alongside the same core topics | Mentions scattered across unrelated domains with no clear pattern |
| Attribute consistency | The same attributes appear reliably across models and prompt types | Attributes vary or contradict each other across runs |
| Competitive framing | Brand is clearly differentiated from competitors within the cluster | Competitors dominate the key attributes, brand appears peripheral |
| Visibility outcome | Brand surfaces as a default mention when the topic arises | Brand rarely mentioned or omitted entirely from category responses |

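One possible way to put a number on attribute consistency is the average pairwise overlap (Jaccard similarity) between the attribute sets extracted from repeated runs. This metric is an assumption made for illustration, not something this guide prescribes, and the run data below is invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def attribute_consistency(runs):
    """Mean pairwise Jaccard similarity across runs (1.0 = identical every run)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical attribute sets extracted from three runs each.
strong_runs = [{"trusted", "affordable"}] * 3
weak_runs = [{"trusted"}, {"fast"}, {"enterprise-grade", "expensive"}]

strong_score = attribute_consistency(strong_runs)
weak_score = attribute_consistency(weak_runs)
```

A score near 1.0 corresponds to the strong-clustering column, where the same attributes recur, and a score near 0.0 to the weak-clustering column, where attributes vary or contradict each other.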
⚙️
How to do it

Running the analysis

1

Training view: what the model has learned

Run prompts with browsing explicitly disabled. Use direct entity prompts: “What is [Brand]?”, “What does [Brand] offer?”, “Who uses [Brand]?”. This is your baseline: what the model knows from its training window, stable and slow to change.

Run each prompt five to ten times per model. Record how many runs each signal appears in, and treat only signals that recur across runs as findings. A mention in a single run is usually noise.
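The repeated-run procedure can be sketched as a small loop. Here `ask_model` and `extract_signals` are hypothetical callables you would supply yourself (for example, a wrapper around whichever model API you use with browsing disabled, and your own tagging step). The stand-in versions below exist only so the sketch runs, and the 0.8 threshold is an assumption chosen to match "4 of 5 runs".

```python
from collections import Counter

ENTITY_PROMPTS = [
    "What is {brand}?",
    "What does {brand} offer?",
    "Who uses {brand}?",
]


def count_signals(ask_model, extract_signals, brand, runs=5):
    """Run each prompt `runs` times; count the runs in which each signal appears."""
    tallies = {}
    for template in ENTITY_PROMPTS:
        prompt = template.format(brand=brand)
        counter = Counter()
        for _ in range(runs):
            response = ask_model(prompt)
            # A signal counts at most once per run, however often it is repeated.
            counter.update(set(extract_signals(response)))
        tallies[prompt] = counter
    return tallies


def consistent_signals(counter, runs, min_share=0.8):
    """Keep signals seen in at least `min_share` of runs (assumed threshold)."""
    return {signal for signal, n in counter.items() if n / runs >= min_share}


# Stand-ins so the sketch runs without any API; replace with real calls.
def fake_ask_model(prompt):
    return "Waikay is a trusted AI optimisation platform."


def fake_extract(text):
    vocabulary = ["trusted", "affordable", "AI optimisation"]
    return {term for term in vocabulary if term.lower() in text.lower()}


tallies = count_signals(fake_ask_model, fake_extract, "Waikay", runs=5)
stable = consistent_signals(tallies["What is Waikay?"], runs=5)
```

Keeping the raw per-prompt counts, rather than only the filtered set, makes it possible to apply a different threshold later without rerunning the prompts.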

Which models to use

Run across at least ChatGPT and one other model (Claude or Gemini). Keep results separate per model before comparing. Different models have different training data coverage and will produce different entity maps for the same brand. The overlap between them is your most reliable signal.
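Finding the overlap between models is a plain set intersection once each model's consistent signals are stored separately. The sketch below assumes a hypothetical `signals_by_model` mapping, and the model results in it are invented.

```python
def cross_model_overlap(signals_by_model):
    """Return (signals shared by every model, signals unique to each model)."""
    sets = {model: set(signals) for model, signals in signals_by_model.items()}
    shared = set.intersection(*sets.values()) if sets else set()
    unique = {model: signals - shared for model, signals in sets.items()}
    return shared, unique


# Hypothetical per-model results, kept separate before comparing.
signals_by_model = {
    "ChatGPT": {"trusted", "AI optimisation", "affordable"},
    "Claude": {"trusted", "AI optimisation", "enterprise-grade"},
}
shared, unique = cross_model_overlap(signals_by_model)
```

The `shared` set is the candidate for your most reliable signal. The `unique` sets show where one model's training coverage differs from another's.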

2

Grounded view: what the model finds now

Run the same prompts with browsing enabled, or use a grounded-by-default platform like Perplexity, and compare the results against your training view. Flag topics or attributes that appear in the grounded view but not the training view as candidates for emerging associations, but treat them carefully. The difference could mean new content is entering the retrieval pool. It could also mean the query phrasing happened to trigger retrieval rather than recall, or that training coverage of your brand is patchy in that area. A signal that appears consistently across multiple grounded runs is more reliable than one that appears once.

These grounded-only signals are worth tracking as potential leading indicators of where trained associations may eventually move, particularly if they strengthen over successive months.
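Flagging grounded-only signals amounts to comparing two sets of run counts. In this sketch the count dictionaries are invented, and the `min_grounded_runs` cut-off is an assumption standing in for "appears consistently across multiple grounded runs".

```python
def emerging_candidates(training_counts, grounded_counts, min_grounded_runs=2):
    """Signals seen in several grounded runs but in no training-view run."""
    return sorted(
        signal
        for signal, n in grounded_counts.items()
        if n >= min_grounded_runs and training_counts.get(signal, 0) == 0
    )


# Hypothetical run counts out of 5 runs per view.
training_counts = {"trusted": 5, "affordable": 1}
grounded_counts = {"trusted": 5, "affordable": 4, "AI ethics": 3, "open source": 1}

candidates = emerging_candidates(training_counts, grounded_counts)
```

Here only "AI ethics" is flagged: "affordable" already appears in the training view, and "open source" appears in a single grounded run.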

3

Extract and log entities, topics, and attributes

From each response, pull out all three signal types and tag them consistently. The tagging format matters: you will be comparing across runs and months, so the labels need to be stable.

Example extraction
Prompt: “What does Waikay offer, and who uses it?”
Response: “Waikay is an AI optimisation platform used primarily by small and mid-sized teams. It is frequently described as trusted and affordable, and is often compared to BrandX, which tends to be positioned as faster but less accessible.”
Tags: Entity: Waikay | Entity: BrandX | Topic: AI optimisation | Attr: Trusted | Attr: Affordable | Competitor attr: BrandX: Fast
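One way to keep labels stable across runs and months is to store each tag as a small normalised record. The `Signal` class, the `kind` vocabulary, and the normalisation rule below are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    kind: str          # "entity", "topic", "attribute" or "competitor_attribute"
    label: str         # normalised label, e.g. "trusted"
    subject: str = ""  # brand the signal is attached to, if any


def normalise(label):
    """Collapse case and whitespace so 'Trusted ' and 'trusted' share one label."""
    return " ".join(label.lower().split())


def tag(kind, label, subject=""):
    return Signal(kind, normalise(label), normalise(subject))


# The example extraction, expressed as records.
extraction = [
    tag("entity", "Waikay"),
    tag("entity", "BrandX"),
    tag("topic", "AI optimisation", subject="Waikay"),
    tag("attribute", "Trusted", subject="Waikay"),
    tag("attribute", "Affordable", subject="Waikay"),
    tag("competitor_attribute", "Fast", subject="BrandX"),
]
```

Because the records are frozen and normalised, they can be used directly as set members or dictionary keys when counting how many runs each signal appears in.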
4

Run comparison prompts

Prompts like “Compare [Brand] vs [Competitor]” force the model to articulate how it differentiates each brand. This surfaces competitive framing directly and often reveals attributes you did not know the model was using to separate you from rivals.

Note on comparison prompts and grounded search

As covered in Chapter 7, comparison prompts increasingly trigger grounded search in ChatGPT as of 2024 and 2025. Run these with browsing both on and off and log which setting produced which response. The training data version is your long-term baseline. The grounded version shows what buyers are seeing in near real time.
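To keep track of which setting produced which response, it can help to lay out the runs before executing them. The sketch below builds such a plan. The field names and the `comparison_run_plan` helper are hypothetical, and actually sending the prompts is left to whatever tooling you use.

```python
from itertools import product


def comparison_run_plan(brand, competitors, runs=5):
    """One planned run per competitor x browsing setting x repetition."""
    plan = []
    for competitor, browsing in product(competitors, (False, True)):
        prompt = f"Compare {brand} vs {competitor}"
        for run_index in range(1, runs + 1):
            plan.append({
                "prompt": prompt,
                "browsing": browsing,
                "view": "grounded" if browsing else "training",
                "run": run_index,
                "response": None,  # fill in once the run has been executed
            })
    return plan


plan = comparison_run_plan("Waikay", ["BrandX"], runs=5)
training_runs = [row for row in plan if row["view"] == "training"]
```

Storing the `view` alongside each response means the training-view baseline and the grounded results never need to be untangled after the fact.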

📈
Tracking over time

Semantic drift tracking

The direction of change matters as much as the current state

A single entity analysis gives you a snapshot. Running the same analysis every month gives you a trend. The trend is what tells you whether your content and citation efforts are working, and whether associations you depend on are being eroded by competitor activity or model updates.

Use four strength levels to classify each signal across runs:

  • Strong: 4 or more of 5 runs. Core identity marker.
  • Weak: 1 or 2 of 5 runs. Not yet established.
  • Emerging: appears in the grounded view but not the training view.
  • Absent: does not appear in any run for this period.

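The four levels can be applied mechanically once you have run counts for both views. The sketch below is one possible reading of the definitions above. Scaling the thresholds to run counts other than five is an assumption, and because the levels as stated do not cover exactly 3 of 5 runs, the function reports that case as "Unclassified" rather than guessing.

```python
def classify_strength(training_hits, grounded_hits, runs=5):
    """Map run counts to Strong / Weak / Emerging / Absent.

    training_hits: runs (browsing off) in which the signal appeared
    grounded_hits: runs (browsing on) in which the signal appeared
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if training_hits == 0 and grounded_hits == 0:
        return "Absent"
    if training_hits == 0:
        return "Emerging"  # grounded view only
    share = training_hits / runs
    if share >= 0.8:       # 4 or more of 5
        return "Strong"
    if share <= 0.4:       # 1 or 2 of 5
        return "Weak"
    return "Unclassified"  # e.g. 3 of 5: no level is defined for this case


levels = {
    "trusted": classify_strength(4, 5),
    "affordable": classify_strength(2, 4),
    "AI ethics": classify_strength(0, 3),
    "open source": classify_strength(0, 0),
}
```

You may prefer to fold the unclassified middle band into one of the named levels. Whichever rule you choose, apply it identically every period so the trend stays comparable.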
| Signal | Type | Nov 2025 | Jan 2026 | Mar 2026 | Direction |
| --- | --- | --- | --- | --- | --- |
| Trusted | Attribute | Strong | Strong | Strong | Stable |
| Affordable | Attribute | Weak | Emerging | Strong | Strengthening |
| Enterprise-grade | Attribute | Strong | Weak | Weak | Fading |
| AI ethics | Topic | Absent | Absent | Emerging | New (investigate) |

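The Direction column can be derived from the level history with a simple rule. The sketch below assumes an ordinal ranking of the levels (Absent, then Weak, then Emerging, then Strong) and compares only the first and last periods. Both are simplifying assumptions made for illustration, chosen so that the output reproduces the example table.

```python
# Assumed ordering; the chapter does not define an explicit ranking.
LEVEL_RANK = {"Absent": 0, "Weak": 1, "Emerging": 2, "Strong": 3}


def drift_direction(levels):
    """Classify a signal's trend from its first and last recorded level."""
    first, last = LEVEL_RANK[levels[0]], LEVEL_RANK[levels[-1]]
    if first == 0 and last > 0:
        return "New"
    if last > first:
        return "Strengthening"
    if last < first:
        return "Fading"
    return "Stable"


# Level history taken from the drift table (Nov 2025, Jan 2026, Mar 2026).
history = {
    "Trusted": ["Strong", "Strong", "Strong"],
    "Affordable": ["Weak", "Emerging", "Strong"],
    "Enterprise-grade": ["Strong", "Weak", "Weak"],
    "AI ethics": ["Absent", "Absent", "Emerging"],
}
directions = {signal: drift_direction(levels) for signal, levels in history.items()}
```

Comparing only the endpoints ignores dips and recoveries in between, so a signal that went Strong, Weak, Strong would read as stable. A longer history may justify a rule that looks at every period.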
What to do when you see fading or unexpected emerging signals

A fading attribute is worth investigating before it disappears entirely. Check whether your content still supports it clearly, whether competitor content is displacing it, or whether a model update has reweighted the associations. An unexpected emerging topic, like “AI ethics” above, is either an opportunity to own or a risk to understand. Do not ignore either.

➡️
Next steps

What to do with the analysis

01

Weak or absent entity recognition

The model does not consistently recognise your brand as a distinct entity. This is the most fundamental problem and affects everything else. Fix: publish consistently branded content across authoritative domains using exact, canonical brand name phrasing. See Chapter 4 (Factual Accuracy Rate) for how to audit and correct misrepresentations.

02

Missing topic associations

The model does not connect you to topics you should own. Fix: build topical clusters around those topics — multiple pieces of content, consistent entity references, internal linking, third-party coverage. One page is rarely enough. See Chapter 3 (AI Topical Presence) for how to score and prioritise gaps.

03

Inconsistent or wrong attributes

The model describes you differently across runs or assigns attributes you do not want. Fix: establish canonical messaging on your own properties and repeat it clearly. Identify which sources may be introducing conflicting descriptions and address them. See Chapter 4 for the audit process.

04

Competitor dominating an attribute you need

A competitor is consistently associated with an attribute that should also belong to you. Fix: publish content that explicitly makes the connection between your brand and that attribute, backed by evidence. Co-occurrence volume over time is the mechanism. See Chapter 1 (AI Competitive Map) to understand the full competitive picture first.

How Waikay tracks entity and topic signals

Waikay extracts entities, topics, and attributes from every tracked prompt response and tracks their strength over time. The drift table above is built automatically from your prompt data, so you can see which associations are strengthening and which are fading without building or maintaining spreadsheets manually.