🗂️

Chapter 5 Layer 3: Influence

Entity Map

Every chapter so far has been about reading the situation. This chapter is about changing it. EntityMap is the first lever in this guide that operates directly on the systems producing responses about your brand. Not by publishing more content. By giving retrieval systems a declared map of who you are, what you claim, and how your concepts connect. It acts before retrieval starts.


⚡

TL;DR

The short version

Key points

Your content is being used without your brand being credited. When systems fetch your pages, strip the HTML, and break text into fragments, publisher identity does not reliably survive. This is a structural problem, not a policy one.
EntityMap is an open standard that fixes this at the source. A machine-readable file at your domain root that declares your entities, your evidence, and the relationships between your concepts in a form any retrieval system can read directly.
The hallucination connection is direct. When a system works from declared relationships rather than inferred ones, it has less room to fill gaps with invented connections. The graph constrains what the system can conclude.
It works on live retrieval, not on training data. For search engines and tools fetching your content right now, EntityMap improves attribution and reasoning quality immediately. It cannot fix associations already baked into model weights.
There is real evidence it works. Installed on waikay.io on April 25, 2026, a single EntityMap file reversed a three-month score decline within 48 hours and was cited 2 to 3 times more often than the site’s About page by Gemini and Sonar.

📍

The problem

Why more content is not always the answer

By the time you reach this chapter you have a Share of Voice score, a Topical Presence map, and a Factual Accuracy picture. You know how often your brand appears, what it is associated with, and whether what gets said is accurate. The natural question is: now what?

The standard answer is content. Publish more, cover more topics, get more citations. That is not wrong. But it misses a layer underneath. The reason a retrieval system misrepresents your brand is often not that the right content does not exist. It is that the content exists but is not legible to the system retrieving it.

Here is what actually happens when a retrieval system fetches a page from your site. It strips the HTML. It splits the remaining text into fragments of a few hundred characters each. It passes those fragments into its model with no preserved information about who published them, what concept they are meant to support, or how that concept relates to others on your site. Five years of thought leadership becomes a pile of decontextualised passages. The model reassembles meaning from those passages as best it can.
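The fetch, strip, and split sequence described above can be sketched in a few lines. This is a minimal illustration, not how any particular retrieval system is implemented: the page, the publisher name "Example Co", the `fragment` helper, and the 60-character window are all assumptions chosen to keep the example short.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text; ignores tags, attributes, and <head> metadata."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside head, script, or style

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("head", "script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def fragment(page_html, size=300):
    """Strip markup, then cut the remaining text into fixed-size windows."""
    parser = TextExtractor()
    parser.feed(page_html)
    parser.close()
    text = " ".join(parser.parts)
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical page: the publisher is named only in <head> metadata.
page = """<html><head><title>Example Co</title>
<meta name="author" content="Example Co"></head>
<body><p>Share of voice measures how often a brand appears in answers.</p>
<p>It is tracked per topic and per retrieval surface.</p></body></html>"""

fragments = fragment(page, size=60)
for piece in fragments:
    print(repr(piece))

# True only if the publisher name survived into at least one fragment.
print(any("Example Co" in piece for piece in fragments))
```

In this sketch the body text survives but the publisher name does not, because it lived only in markup that was discarded before splitting. That is the structural loss the rest of this chapter is concerned with.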

EntityMap addresses that structural problem. It sits at a predictable URL on your domain as a machine-readable index: your concepts, your definitions, your evidence, and your reasoning graph. The system still reads your pages. It just starts from a much better position.

The sitemap parallel

In 2005, search engines were crawling billions of pages but struggling to understand which ones mattered and how often they changed. The sitemap.xml convention solved the discovery problem. Within two years every major CMS generated one automatically. EntityMap is the same category of intervention, applied to meaning rather than discovery: a simple declarative file that solves a structural problem the existing format cannot address.

🔴

Three failures

What breaks when retrieval has no structure to work from

Fragment-based retrieval creates three specific problems. Each one requires a different part of EntityMap to address it. Understanding them separately matters because they produce different kinds of errors in the output and call for different fixes in the file.

Disambiguation fails, and your expertise fragments

Your site might use “AI SOV” in navigation, “AI Share of Voice” in body copy, and “artificial intelligence share of voice” in a technical glossary. To a system doing fragment-level retrieval, these are three distinct text signals with no declared relationship. There is no mechanism telling it these surface forms all refer to the same concept, the one you have spent years building authority around.

The consequence is dilution. Instead of a concentrated signal on one concept, you have partial signals scattered across fragments. Competitors who write about the same topic with more consistent language will often surface more clearly, not because their thinking is better but because their text is less varied.

EntityMap addresses this with canonical entity IDs and the alternateName field. All variants resolve to one entity. Every evidence fragment that supports the concept references that entity’s ID. The system starts from the canonical form, not from whichever fragment it happened to retrieve first.
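As a rough sketch of what that resolution looks like from a consumer's side, the example below maps each surface form to one canonical ID. The `alternateName` field and the `e_001` style of ID come from the text; the surrounding record layout and the helper names `build_alias_index` and `resolve` are assumptions for illustration, not part of the specification.

```python
# Hypothetical entity record. Only "alternateName" and the e_001 ID style
# are taken from the chapter; the rest of the layout is an assumption.
entities = [
    {
        "id": "e_001",
        "name": "AI Share of Voice",
        "alternateName": ["AI SOV", "artificial intelligence share of voice"],
    },
]


def build_alias_index(entity_list):
    """Map every case-folded surface form to its canonical entity ID."""
    index = {}
    for entity in entity_list:
        for form in [entity["name"], *entity.get("alternateName", [])]:
            index[form.casefold()] = entity["id"]
    return index


def resolve(surface_form, index):
    """Return the canonical ID for a surface form, or None if undeclared."""
    return index.get(surface_form.strip().casefold())


alias_index = build_alias_index(entities)
for form in ("AI SOV", "AI Share of Voice", "artificial intelligence share of voice"):
    print(form, "->", resolve(form, alias_index))
```

All three variants from the navigation, body copy, and glossary resolve to the same ID, so evidence attached to that ID is no longer split across three unrelated text signals.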

Attribution fails, and you become a ghost citation

A retrieval system fetches content from your site, synthesises an answer, and surfaces a URL as a footnote. Your company name does not appear in the answer. A reader who acts on that information has no idea the expertise came from you. This is directly connected to what you measured in Chapter 4.

This is not the system ignoring attribution. Publisher identity is simply not embedded in page-level content in a way that reliably survives the fragmentation process. When a passage is lifted from a page and stripped of HTML, your brand name travels with it only if it appears in that specific fragment with enough frequency to survive dilution. For most body copy, it does not.

EntityMap addresses this with an explicit publisher field on every evidence chunk. That field is designed to carry your brand name through downstream processing. It must match the publisher name in the root of the file exactly, and that consistency is what makes it reliable. You are not hoping the system infers your identity from your URL. You are declaring it on every piece of evidence you publish.

Ghost citations are so common because this is a structural gap, not a policy one. EntityMap is a structural fix.

Reasoning fails, and hallucinations fill the gap

When a system receives a question that requires connecting multiple concepts, it has to reconstruct the logical chain from unstructured text. Sometimes it gets this right. Often it partially misses it, or invents a plausible-sounding connection that is not what you actually claim.

Consider a question that requires linking a data problem to a regulatory risk through a specific product feature. If the relationships between those concepts live only in prose, scattered across pages, the system has to infer the chain. It signals that inference with hedging language: “likely”, “probably”, “may assist in”. When the inference is wrong the hedging disappears and the wrong claim looks confident. That is where hallucinations come from: not random invention, but confident gap-filling.

EntityMap’s relations layer addresses this directly. You declare the predicates explicitly: this concept CONFLICTS_WITH that one, this product IMPROVES that outcome. The chain is read from your own declarations, not inferred from prose. The system is not filling a gap. It is traversing a graph you built. Because the graph constrains what the system can conclude, the space for hallucination narrows.
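The difference between inferring a chain and traversing one can be illustrated with a small graph search. The predicate names IMPROVES and CONFLICTS_WITH are taken from the text; the subject/predicate/object layout, the entity IDs, and the `find_chain` helper are assumptions made for this sketch.

```python
from collections import deque

# Hypothetical declared relations. The triple layout is an assumption.
relations = [
    {"subject": "e_003", "predicate": "IMPROVES", "object": "e_002"},
    {"subject": "e_002", "predicate": "CONFLICTS_WITH", "object": "e_001"},
]


def find_chain(declared, start, goal):
    """Breadth-first search over declared relations only.

    Returns the list of relations linking start to goal, or None when
    no chain has been declared. Nothing is inferred.
    """
    outgoing = {}
    for rel in declared:
        outgoing.setdefault(rel["subject"], []).append(rel)

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel in outgoing.get(node, []):
            if rel["object"] not in seen:
                seen.add(rel["object"])
                queue.append((rel["object"], path + [rel]))
    return None


chain = find_chain(relations, "e_003", "e_001")
for rel in chain:
    print(rel["subject"], rel["predicate"], rel["object"])
```

When no declared path exists the function returns `None` rather than guessing, which is the property the paragraph above describes: the graph bounds what can be concluded.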

This is why the relations layer is not optional. It is the mechanism by which your reasoning, not just your content, enters the retrieval layer.

What EntityMap cannot fix

If a model has already been trained on data that associated your brand with wrong claims, EntityMap does not change that. Model weights are fixed after training. The only way to correct a training-time association is retraining, which is outside any publisher’s control. EntityMap fixes the retrieval half of the problem: what happens when systems fetch your content right now. It does not fix the training half: what the model learned months ago. Both matter. Only one is addressable today.

📊

Evidence

What happened when we installed one on waikay.io

On April 25, 2026, we installed an EntityMap on waikay.io. It was the only change we made during the measurement window. We tracked AI Visibility Scores across five topics for the five weeks that followed. The score measures how closely what a retrieval system says about a brand matches what that brand actually publishes: 0 means no overlap, 100 means perfect alignment.

The results were not uniform. That is important. Uniform results would suggest measurement error. What happened instead was the expected pattern: scores already near their ceiling barely moved. Scores on topics where retrieval systems had been pulling weaker or fragmented sources improved significantly and quickly.

+26 pts

AI hallucinations topic, in 48 hours

The score had been falling by roughly 10 points a month for three consecutive months. Retrieval systems were increasingly drawing on weaker sources when answering questions about Waikay and hallucinations. Forty-eight hours after the EntityMap went live, the entire decline reversed. The score moved from 58 to 84.

New content takes weeks to index. Backlinks accumulate over months. A 48-hour reversal points directly at live retrieval as the mechanism, which is exactly what EntityMap is designed to affect.

+10 pts

AI search optimisation topic, in 10 days

The three previous readings were flat in the low 80s, drifting slightly downward. The trend predicted the next reading would land near 80. It landed at 92. No pre-existing momentum explains a 10-point jump. This is the pattern you would expect from a structural fix: a topic where fragmented signals were being synthesised into a weaker picture, corrected by giving retrieval systems a declared starting point.

+1 pt

Brand overview topic, no meaningful change

This score had been oscillating between 91 and 96 for three months before install, and between 94 and 97 after. No meaningful change. This is the expected result for a topic already near its ceiling, and it is the check that gives the other results credibility. If everything had improved uniformly, that would indicate measurement error.

The citation finding

Beyond scores, we tracked which URLs retrieval systems cited when answering brand queries. The EntityMap file, a single 30KB structured file, was cited more often than the site’s About page on both Gemini (2.2 times more) and Sonar (3.0 times more).

The About page is the URL every conventional SEO playbook says should dominate brand queries. A structured knowledge file that had existed for a few weeks outperformed it on both surfaces. The reason is direct: retrieval systems looking for structured, attributed, entity-level information find exactly that in the EntityMap. They find general prose on an About page.

Where it did not work yet

ChatGPT, Copilot, and Claude never cited the EntityMap during the measurement window. Not because they rejected the format, but because Bing had not indexed the file. Those models depend heavily on Bing-derived signals. Before deploying, we had not submitted the file via Bing Webmaster Tools, added it to our sitemap, or linked to it from the homepage. That is a 20-minute job we skipped. It produced an accidental control group. The discovery signals covered in the implementation section below are not optional.

💡

Key insight

A hallucinated URL is a content brief

When a retrieval system cites a URL on your site that does not exist, it is not malfunctioning. It is telling you something specific. It expected that content to be there based on the pattern of what surrounds it. The hallucinated URL is a directional signal about what the system was looking for and could not find.

That gap is worth filling. The hallucinated claim is often a reasonable rough draft of what should exist: the argument the system wanted to make, in the form it wanted to make it, pointing at a page that should exist but does not.

If you write that content, reference it in your EntityMap as a chunk on the relevant entity, and give the chunk the correct source URL, you have done three things at once. You have filled the content gap. You have given retrieval systems an authoritative passage to cite. And you have removed one of the specific conditions that produced the hallucination. The system can now make the claim it was trying to make, correctly, attributed to you.

This is one of the most direct connections between the measurement side of this guide and the influence side. Your Factual Accuracy audit from Chapter 4 is not just a report on what is wrong. Every hallucinated URL in it is a specific, actionable instruction for what to build next.

In practice

Run your Factual Accuracy audit and note every hallucinated URL. Group them by the concept they were trying to support. Each group is a content brief and an EntityMap chunk waiting to be written. Work through the groups in order of how often the hallucination appears. The most frequent ones are the most urgent retrieval gaps.
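The grouping and ordering steps above can be sketched as follows. The audit rows here are invented placeholders on example.com, and the row shape (URL plus concept) is an assumption; substitute whatever your own Factual Accuracy audit exports.

```python
from collections import Counter, defaultdict

# Hypothetical audit rows: (hallucinated URL, concept it was cited for).
audit_rows = [
    ("https://example.com/guides/ai-sov-benchmarks", "AI Share of Voice"),
    ("https://example.com/guides/ai-sov-benchmarks", "AI Share of Voice"),
    ("https://example.com/docs/topical-presence-api", "AI Topical Presence"),
    ("https://example.com/blog/sov-by-industry", "AI Share of Voice"),
]


def build_briefs(rows):
    """Group hallucinated URLs by concept, most frequent concept first."""
    counts = defaultdict(Counter)
    for url, concept in rows:
        counts[concept][url] += 1
    briefs = [
        {
            "concept": concept,
            "total": sum(urls.values()),
            "urls": urls.most_common(),  # (url, count), most cited first
        }
        for concept, urls in counts.items()
    ]
    return sorted(briefs, key=lambda brief: brief["total"], reverse=True)


briefs = build_briefs(audit_rows)
for brief in briefs:
    print(brief["concept"], brief["total"])
    for url, count in brief["urls"]:
        print("  ", count, url)
```

Each item in `briefs` corresponds to one content brief and one candidate EntityMap chunk, already ordered by urgency.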

🔬

Strategy

How your measurement data tells you what to put in your EntityMap

The measurement chapters are not separate from EntityMap. Each metric is a specific instruction about what belongs in the file.

Ch1

Competitive Map tells you which entities to prioritise

If retrieval systems place you in a competitive set that does not match your actual market, your EntityMap should declare the concepts that define your actual category and connect them explicitly to your brand. You are providing the evidence for a corrected picture.

Ch3

Topical Presence tells you which relations to declare

Every topic gap in your Topical Presence score is an implicit relationship the system has not made. If you have strong presence for “AI Share of Voice” but weak presence for “AI Topical Presence” despite covering the topic, the connection between them is probably buried in prose rather than declared. That is a relation to add: AI Share of Voice CONSISTS_OF AI Topical Presence.

Ch4

Factual Accuracy tells you where to add corrective chunks

Every factual error your audit found is a gap between what retrieval systems can find in your content and what you actually claim. For each error, find the relevant entity in your EntityMap and add a chunk that states the accurate claim clearly and directly. You are giving retrieval systems something better to find than the passage they were previously misreading.
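A corrective chunk is a small addition to an existing entity. The sketch below assumes a dictionary layout with `publisher`, `entities`, and per-entity `evidence` keys, and chunk fields named `text`, `source`, and `publisher`. Those names, the `add_corrective_chunk` helper, and the sample claim are illustrative assumptions; check the specification for the real field names.

```python
def add_corrective_chunk(entity_map, entity_id, text, source_url, max_chunks=5):
    """Append an evidence chunk stating the accurate claim to one entity.

    The publisher name is copied from the root so it cannot drift, and the
    upper bound reflects the 1 to 5 passages per entity used in this guide.
    """
    publisher_name = entity_map["publisher"]["name"]
    for entity in entity_map["entities"]:
        if entity["id"] == entity_id:
            evidence = entity.setdefault("evidence", [])
            if len(evidence) >= max_chunks:
                raise ValueError(f"{entity_id} already has {max_chunks} chunks")
            evidence.append(
                {"text": text, "source": source_url, "publisher": publisher_name}
            )
            return entity
    raise KeyError(entity_id)


# Hypothetical map with one entity and no evidence yet.
draft_map = {
    "publisher": {"name": "Example Co"},
    "entities": [{"id": "e_001", "name": "AI Share of Voice", "evidence": []}],
}

updated = add_corrective_chunk(
    draft_map,
    "e_001",
    "AI Share of Voice is measured per topic, not per keyword.",  # sample claim
    "https://example.com/guides/ai-share-of-voice",
)
print(updated["evidence"])
```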

🛠️

Implementation

How to build and publish your EntityMap

The full specification is at entitymap.org. What follows is the strategic logic behind the structure: what each part is for and why it matters. For the technical spec, JSON schema, and predicate reference, go to the source. The fastest way to generate the files automatically is the Waikay reference implementation, which is currently on a waitlist at waikay.io/entitymap.

What the file contains

Your entitymap.json has four parts. A publisher block declaring who you are: your name, URL, and Wikidata identifier if you have one. A list of entity objects, each representing a concept your site covers authoritatively. An evidence section on each entity with 1 to 5 of your best passages, each carrying your publisher name. And a relations section declaring how your entities connect to each other and to external concepts.
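To make the four parts concrete, here is a minimal sketch that assembles them in Python and serialises the result. Treat every key name as a placeholder: the chapter names the parts (publisher block, entities, evidence, relations) and fields such as `alternateName`, but the exact keys, the nesting, the placement of relations at the top level, and the "Example Co" values are assumptions. The JSON schema at entitymap.org is the authority.

```python
import json

PUBLISHER_NAME = "Example Co"  # hypothetical; defined once, reused everywhere

entity_map = {
    # Part 1: who you are.
    "publisher": {
        "name": PUBLISHER_NAME,
        "url": "https://example.com",
        "wikidata": None,  # Wikidata identifier, if you have one
    },
    # Part 2: concepts your site covers authoritatively.
    "entities": [
        {
            "id": "e_001",
            "name": "AI Share of Voice",
            "alternateName": ["AI SOV"],
            "description": (
                "AI Share of Voice is a metric that measures the proportion "
                "of AI-generated answers in which a brand appears."
            ),
            # Part 3: 1 to 5 of your best passages, each carrying the publisher.
            "evidence": [
                {
                    "text": "Placeholder passage quoted from your own page.",
                    "source": "https://example.com/guides/ai-share-of-voice",
                    "publisher": PUBLISHER_NAME,
                }
            ],
        },
        {
            "id": "e_002",
            "name": "AI Topical Presence",
            "description": "Placeholder description specific to your site.",
            "evidence": [
                {
                    "text": "Placeholder passage quoted from your own page.",
                    "source": "https://example.com/guides/ai-topical-presence",
                    "publisher": PUBLISHER_NAME,
                }
            ],
        },
    ],
    # Part 4: how the entities connect.
    "relations": [
        {"subject": "e_001", "predicate": "CONSISTS_OF", "object": "e_002"}
    ],
}

entity_map_json = json.dumps(entity_map, indent=2)
print(entity_map_json)
```

Holding the publisher name in one constant is a deliberate choice here: it makes the exact-match requirement on evidence chunks hard to violate by accident.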

The publisher name on every evidence chunk must exactly match the publisher name in the root object. Not approximately. Exactly. Copy and paste it, never retype it. One inconsistency and downstream systems cannot reliably credit you. This single field is the attribution mechanism the whole system depends on.
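Because the match has to be exact, it is worth checking mechanically before each publish. The sketch below assumes the file has been loaded into a dictionary with a root `publisher.name` and a `publisher` field on each evidence chunk; the key names and the `publisher_mismatches` helper are illustrative assumptions.

```python
def publisher_mismatches(entity_map):
    """List every evidence chunk whose publisher differs from the root name.

    Comparison is exact string equality: no trimming, no case folding.
    Returns (entity_id, chunk_index, found_value) tuples.
    """
    root_name = entity_map["publisher"]["name"]
    problems = []
    for entity in entity_map.get("entities", []):
        for index, chunk in enumerate(entity.get("evidence", [])):
            if chunk.get("publisher") != root_name:
                problems.append((entity["id"], index, chunk.get("publisher")))
    return problems


# Hypothetical file contents with one retyped publisher name.
sample_map = {
    "publisher": {"name": "Example Co"},
    "entities": [
        {
            "id": "e_001",
            "evidence": [
                {"text": "First passage.", "publisher": "Example Co"},
                {"text": "Second passage.", "publisher": "Example Co."},
            ],
        }
    ],
}

mismatches = publisher_mismatches(sample_map)
print(mismatches)
```

The second chunk is flagged because of a single trailing full stop, which is exactly the kind of drift that retyping introduces.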

What makes a good entity object

Each entity needs a stable ID (e_001, e_002 and so on, never reused), a name that reflects how your site uses the concept rather than how a dictionary would define it, and a description of 1 to 3 sentences specific to your context. “AI Share of Voice is a metric that measures the proportion of AI-generated answers in which a brand appears” is useful. “Share of voice is a marketing metric” is not.

For evidence chunks, choose your most specific passages, not introductory sentences. The passages that most precisely express what you claim about each concept. Fifteen well-evidenced entities are worth significantly more than eighty with thin evidence. Depth signals authority. Breadth without evidence does not.
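The entity rules in this section lend themselves to a quick lint pass. This is a rough sketch under stated assumptions: the key names are placeholders, the ID pattern is inferred from the e_001 and e_002 examples, and the sentence counter is a crude heuristic that will miscount abbreviations.

```python
import re

ID_PATTERN = re.compile(r"^e_\d+$")  # matches e_001, e_002, and so on


def count_sentences(text):
    """Rough count: split on whitespace that follows '.', '!' or '?'."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def check_entity(entity, seen_ids):
    """Return a list of issues; an empty list means the entity passes."""
    issues = []
    entity_id = entity.get("id", "")
    if not ID_PATTERN.match(entity_id):
        issues.append(f"{entity_id!r}: ID does not look like e_001")
    if entity_id in seen_ids:
        issues.append(f"{entity_id!r}: ID is reused")
    seen_ids.add(entity_id)
    if not 1 <= count_sentences(entity.get("description", "")) <= 3:
        issues.append(f"{entity_id!r}: description should be 1 to 3 sentences")
    if not 1 <= len(entity.get("evidence", [])) <= 5:
        issues.append(f"{entity_id!r}: expected 1 to 5 evidence chunks")
    return issues


good_entity = {
    "id": "e_001",
    "description": (
        "AI Share of Voice is a metric that measures the proportion of "
        "AI-generated answers in which a brand appears."
    ),
    "evidence": [{"text": "Placeholder passage."}],
}
bad_entity = {"id": "entity-1", "description": "", "evidence": []}

seen = set()
print(check_entity(good_entity, seen))
print(check_entity(bad_entity, seen))
```

A lint like this cannot judge whether a passage is specific enough. It only catches the mechanical slips, which leaves editorial attention for choosing the evidence.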

Two ways to generate the files

✍️

Reference prompt

The EntityMap GitHub repository maintains a reference prompt. Paste sections of your content into your preferred LLM with the prompt and generate conforming entity objects. A site with 10 to 30 entities can be drafted in a few hours. No account needed. The alternative is the Waikay reference implementation mentioned above, which generates the files automatically.

📄

Two files, one source of truth

You publish both entitymap.json and entitymap.html. The HTML is a crawlable rendering of the JSON with embedded structured data. Never maintain them separately. The JSON is the source of truth. Always generate the HTML from it.
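One way to keep the two files from drifting is to treat the HTML as a build artifact. The sketch below renders a readable page from the JSON and embeds the same JSON in a script element. The page layout, the `render_html` helper, the key names, and the choice of a `script type="application/json"` element are assumptions for illustration; the specification may prescribe a different embedding.

```python
import html
import json


def render_html(entity_map):
    """Render a crawlable page from the JSON and embed the JSON itself."""
    publisher = html.escape(entity_map["publisher"]["name"])
    sections = []
    for entity in entity_map["entities"]:
        entity_id = html.escape(entity["id"])
        name = html.escape(entity["name"])
        description = html.escape(entity["description"])
        sections.append(
            f'<section id="{entity_id}"><h2>{name}</h2><p>{description}</p></section>'
        )
    # Escape "</" so text inside the JSON cannot close the script element.
    payload = json.dumps(entity_map, indent=2).replace("</", "<\\/")
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>EntityMap: {publisher}</title></head>\n<body>\n"
        f"<h1>{publisher}</h1>\n"
        + "\n".join(sections)
        + f'\n<script type="application/json" id="entitymap">\n{payload}\n</script>\n'
        + "</body></html>\n"
    )


# Hypothetical source of truth; in practice this is loaded from entitymap.json.
source_map = {
    "publisher": {"name": "Example Co"},
    "entities": [
        {
            "id": "e_001",
            "name": "AI Share of Voice",
            "description": "A metric for how often a brand appears in answers.",
        }
    ],
}

page_html = render_html(source_map)
print(page_html)
```

Running this as part of every publish step means the HTML can never contain a claim the JSON does not.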

Getting retrieval systems to find it

Sitewide footer link to entitymap.html

The most reliable signal available today. Every crawler that follows HTML links will find it: GPTBot, PerplexityBot, ClaudeBot, GoogleOther. A sitewide footer means every page on your site carries a route to your knowledge layer regardless of which page the crawler enters from. Link to the HTML file, not the JSON. The HTML renders as readable content with embedded structured data.

Link tag in the HTML head of every page

Add <link rel="entitymap" type="application/json" href="https://yourdomain.com/entitymap.json" /> to the head of every page. This is the signal for systems that consume structured metadata from page heads rather than following visible links.

robots.txt hint and sitemap entry

Add EntityMap: https://yourdomain.com/entitymap.json to your robots.txt, and list entitymap.html in your sitemap with priority 0.9 and changefreq weekly. The weekly changefreq signals freshness, which increasingly factors into what retrieval systems choose to fetch.
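The discovery signals in this section are all derived from one domain name, so they can be generated together to avoid typos. In this sketch the link tag and the robots.txt line are copied from the text, while the footer anchor text, the `discovery_snippets` helper, and the single-line sitemap layout are assumptions.

```python
def discovery_snippets(domain):
    """Build the four discovery signals for one domain."""
    base = f"https://{domain}"
    return {
        # Sitewide footer link: point at the HTML file, not the JSON.
        "footer_link": f'<a href="{base}/entitymap.html">EntityMap</a>',
        # Link tag for the <head> of every page.
        "head_link": (
            f'<link rel="entitymap" type="application/json" '
            f'href="{base}/entitymap.json" />'
        ),
        # Hint line for robots.txt.
        "robots_txt": f"EntityMap: {base}/entitymap.json",
        # Entry to place inside the <urlset> of sitemap.xml.
        "sitemap_entry": (
            "<url>"
            f"<loc>{base}/entitymap.html</loc>"
            "<changefreq>weekly</changefreq>"
            "<priority>0.9</priority>"
            "</url>"
        ),
    }


snippets = discovery_snippets("yourdomain.com")
for label, snippet in snippets.items():
    print(f"{label}:\n  {snippet}")
```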

🔗

Connected chapters

Where to go from here

EntityMap sits in Layer 3 alongside Citation Data because both chapters are about influence, not measurement. Chapter 6 tells you which external sources are shaping what retrieval systems say about you. This chapter tells you how to make your own content a stronger and more attributable source than those external signals. They are two sides of the same problem.

Chapter 9 (NLP and Entity Analysis) is the theoretical layer beneath this one. It explains how systems build conceptual associations from text. EntityMap is the practical mechanism for making those associations explicit rather than leaving them to inference. Chapter 9 explains the territory. EntityMap is the map you publish for it.

If you want to act immediately: run your Factual Accuracy audit, collect every hallucinated URL, group them by concept, and build your first EntityMap entity set from that list. It is the most direct path from measurement to influence in this guide. You are using the system’s own errors as a briefing document for exactly what to build.

📖