🗄️
Chapter 10 Layer 4: Methodology

Data Gathering Methods

APIs, scraping, and manual testing. How to collect AI responses at scale without introducing the bias that makes your measurement data unreliable.

TL;DR

The short version

Key points
  • Two primary methods: scraping and APIs. Each has genuine strengths the other lacks. Neither is simply better.
  • Scraping captures the real-user experience. It reaches platforms with no API access. Trade-offs: instability, legal ambiguity, and personalisation distortion.
  • APIs offer control and reproducibility. Consistent parameters across runs and over time. Trade-offs: cost and limited platform coverage.
  • A hybrid approach is right for most measurement programmes. APIs for the controlled baseline. Scraping for real-world coverage.
  • Manual testing is not optional. Automated pipelines tell you what the data says. Manual testing tells you what it means.
⚖️
The methods

Scraping vs APIs

Method 01 🔍

Scraping

Mimics a human user interacting with a public-facing AI interface. Reaches platforms that offer no API access. Core advantage is real-world fidelity: it reflects what buyers actually see.

  • Captures the real user-facing output including platform UI
  • Reaches non-API surfaces like Google AI Overviews
  • Significantly cheaper per query
  • Responses shaped by account history if logged in
  • Unstable: login walls, CAPTCHAs, interface changes
  • Legal status varies by jurisdiction and platform terms
Method 02 🔌

APIs

Direct, structured access to model outputs. Same prompt, same model version, same parameters, every time. The foundation of any longitudinal measurement programme.

  • Reproducible outputs across runs and over time
  • No personalisation distortion
  • Persona injection: simulate specific buyer types
  • Model version explicit and consistent
  • Significantly more expensive per query
  • Limited to platforms that offer API access
Dimension Scraping APIs
Cost per query Low, up to 10x cheaper High, metered per call
Output stability Variable, interface dependent High, consistent across runs
Real-user fidelity Reflects actual buyer-facing output Sterile, no interface context
Personalisation control Difficult, account history skews results Full control via persona injection
Platform coverage Broad, includes non-API surfaces Limited to platforms with API access
Legal risk Jurisdiction-dependent grey area Low, within platform terms
🎯
When to use which

Matching method to task

API

Longitudinal tracking

Measuring Share of Voice or Topical Presence over time requires consistent parameters. APIs ensure the same model version and prompt structure across every run, making month-on-month comparisons meaningful. A single interface change in a scraping setup can create a discontinuity that looks like a real trend.

API

Persona testing

Understanding how AI responds to different buyer types requires injecting persona context cleanly into the system prompt. An enterprise IT buyer and a solo founder asking the same question can receive meaningfully different responses. System-level persona injection cannot be replicated reliably through scraping.

Scrape

Platform coverage audits

Google AI Overviews has no public API access — scraping is the only way to capture what real users see there. Bing Copilot has some API access via Azure OpenAI and the Bing Search API, but the full consumer-facing Copilot experience with its grounded interface is not fully replicable via API alone. For any programme that needs to measure what buyers actually see on these surfaces, scraping remains essential.

Scrape

High-volume coverage runs

When breadth matters more than precision — auditing across a large prompt set to identify gross gaps — scraping’s cost efficiency makes it the practical choice. Running five hundred prompts through an API at scale quickly becomes expensive. Scraping makes that volume feasible.

Both

Full measurement programmes

Any programme tracking Share of Voice, Topical Presence, citations, and entity clustering across multiple models needs both. APIs for the controlled baseline that powers longitudinal analysis. Scraping for real-world coverage of the surfaces your buyers actually use.

⚠️
Watch out for

Sources of bias that corrupt your data

Measurement bias does not always look like bad data. Sometimes it looks like a trend that is actually an artefact of how you collected the data. These are the most common sources.

01

Personalisation contamination

If you run scraping prompts from an account with a long browsing or chat history, the responses will be shaped by that history. A logged-in account that has been researching your brand will produce different results than a fresh session. Always scrape from clean, fresh accounts with no prior interaction history relevant to your queries.

02

Model version drift

Scraping hits whatever model version the platform is currently serving. If the platform updates its underlying model mid-measurement cycle, your data will contain responses from two different models. What looks like a trend in your Share of Voice data may actually be a model version change. APIs let you pin to a specific model version to avoid this.

03

Prompt wording effects

Small changes in prompt wording produce measurably different outputs. If you are comparing data collected with slightly different prompt phrasings, you are not comparing the same thing. Keep your core prompt set locked. If you need to update a prompt, treat the new version as a new series and do not blend the data with the old version.

04

Channel mixing

As covered in Chapter 7, running prompts with browsing on and browsing off produces data from two different channels. If you mix these in the same dataset without labelling them, your frequency counts will be meaningless. Always log the channel alongside every run. See Chapter 7 for the full explanation.

05

Insufficient run volume

AI outputs are probabilistic. A brand that appears in one run out of two has a presence rate of 50%, but that number is statistically meaningless. You need at least five to ten runs per prompt to get a stable frequency estimate. The more competitive your category, the more runs you need to surface meaningful patterns.

🧑‍💻
Manual testing

Why manual testing still matters

Automated pipelines tell you what the data says. Manual testing tells you what it means.

Automated pipelines optimise for scale and reproducibility. What they do not provide is interpretive context. A response that scores well on entity extraction may still represent your brand in a subtly wrong way that no automated classifier will catch.

Manual testing runs alongside automated pipelines, not instead of them. The goal is not to replace systematic measurement but to add the human interpretive layer that makes data strategically useful rather than just numerically accurate.

🔎

Spot-check anomalies

When your automated data produces a result that surprises you, read the actual responses. The raw text often reveals something the extraction layer missed: a framing, a qualifier, a competitor comparison that changes the interpretation entirely.

🧭

Explore emerging signals

New topics, unexpected attributes, or unfamiliar brand names appearing in your data are worth investigating manually before building automated tracking around them. Not every signal deserves a metric.

🗣️

Test from a buyer’s perspective

Run prompts the way a real buyer would, not the way your tracking system does. The difference in phrasing, context, and follow-up questions often reveals positioning problems that structured prompts miss.

📅

Run a monthly sense check

Spend thirty minutes a month reading raw responses rather than looking at aggregated scores. The score tells you the direction. The raw text tells you the story.

How Waikay does it

Waikay’s data gathering architecture

Waikay uses a hybrid architecture that combines API access and scraping to balance precision with real-world coverage. API calls power the longitudinal tracking that underlies Share of Voice, Topical Presence, and Factual Accuracy data. Scraping covers the grounded search surfaces — Google AI Overviews, Bing Copilot — that have no API access.

Platform coverage

Waikay tracks AI responses across ChatGPT, Gemini, Perplexity, and Claude using a combination of API and scraping methods that reflects how your buyers actually use these tools, not just how they are technically accessible.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.