Methodology

How Intendity measures AI search visibility.

The full methodology behind every metric in the dashboard. Prompt execution, mention detection, visibility scoring, share of voice, citation coverage. And the limits of measurement, stated explicitly.

Measurement philosophy

One answer is an anecdote. Many answers are signal.

Generative models are non-deterministic by design. The same prompt asked twice in the same minute can return different brands, different framing, different citations. Any measurement framework that treats a single answer as ground truth will mislead.

Intendity treats AI visibility as a distribution over many executions. Every metric in the dashboard is a roll-up across the (prompt × model × region) matrix on a given day. A daily visibility score of 64 means that, of all the runs executed across the tracked prompts, models and regions in the past 24 hours, the brand was named in 64% of them. Variance is absorbed into the average; trend over weeks reveals the real signal.

This is why a meaningful AEO program requires daily automation. Manual checks under-sample; small prompt sets under-cover the buyer journey; single-model checks miss the way answers shift across providers.

What we capture per run

Six structured signals per (prompt × model) execution.

Each run produces one row in the runs table (raw model response, status, model version, region) and one row in the mentions table (the parsed analysis below). Both persist forever on Pro plans.

  • Mention status

    Whether the tracked brand was named in the answer. Boolean. Drives mention-rate calculations.

  • Position

    Where in the answer the brand appears. The first brand named anchors the consideration set; later mentions are weighted differently for downstream metrics.

  • Sentiment + score

    Positive, neutral or negative classification with a 0–100 score. Captures whether a high mention rate is good news or a brand-safety problem.

  • Cited sources

    Every URL the model cited inline. Wikipedia, Reddit threads, trade press, listicles, your own pages. Drives citation-coverage analysis.

  • Competitor mentions

    Every other named brand in the same answer, with their position and sentiment. Drives share of voice.

  • Context excerpt

    The exact 1–2 sentences surrounding the brand mention, verbatim. Used for hallucination detection and qualitative review.

Run metadata (model version, region, timestamp, browsing-mode flag) is captured separately so historical comparisons remain like-for-like across model updates.
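
As a sketch of the data shape described above, the two rows per execution might look like this. The field names are illustrative only, not Intendity's actual schema:

from dataclasses import dataclass, field
from datetime import datetime

# Illustrative schema; Intendity's real column names and types may differ.
@dataclass
class Run:
    run_id: str
    prompt_id: str
    model: str             # model family being tracked
    model_version: str     # exact version string returned by the provider
    region: str
    browsing_mode: bool
    executed_at: datetime
    status: str            # e.g. "ok" or "error"
    raw_response: str      # the unmodified model answer

@dataclass
class Mention:
    run_id: str                   # links back to the run row
    mentioned: bool               # mention status
    position: int | None          # 1 = first brand named; None if absent
    sentiment: str | None         # "positive" | "neutral" | "negative"
    sentiment_score: int | None   # 0-100
    cited_sources: list[str] = field(default_factory=list)          # URLs cited inline
    competitor_mentions: list[dict] = field(default_factory=list)   # name, position, sentiment
    context_excerpt: str | None = None   # verbatim 1-2 sentences around the mention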

Mention detection

LLM-based parser with confidence scoring.

A naive string-match approach to mention detection breaks on three classes of input: ambiguous brand names that overlap with ordinary English words (a brand called "Apex" matching unrelated text), aliases ("Acme Corp" vs "Acme"), and indirect references ("the leading enterprise CRM in Europe" pointing to a specific brand without naming it).

Intendity's parser is LLM-based. For every run, the raw model answer plus the brand's registered name, aliases, domain and category context are passed to the parser, which produces structured output: was the brand mentioned, where, alongside which competitors, in what sentiment, citing which sources, with what confidence.

Confidence scores are 0–100. A score above 80 indicates an unambiguous, named mention. Scores between 50 and 80 typically reflect alias or indirect-reference cases. Scores below 50 are flagged for review and excluded from default metric calculations. Power users can adjust the cutoff or surface the low-confidence pile.

Agreement with hand-coded baselines runs above 90% across the prompt sets we've measured against. Edge cases — particularly indirect references and short ambiguous brand names — are an active improvement area.
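
A minimal sketch of how the confidence cutoff gates which parsed mentions feed the default metrics, using the 50/80 thresholds described above. The function and field names are illustrative, not Intendity's API:

DEFAULT_CONFIDENCE_CUTOFF = 50  # parser scores below this are excluded by default

def classify_confidence(score: int) -> str:
    """Bucket a 0-100 parser confidence score as described above."""
    if score > 80:
        return "unambiguous"        # brand named outright
    if score >= 50:
        return "alias_or_indirect"  # matched via alias or indirect reference
    return "needs_review"           # flagged for review, excluded from default metrics

def mentions_for_metrics(parsed_mentions: list[dict], cutoff: int = DEFAULT_CONFIDENCE_CUTOFF) -> list[dict]:
    """Keep only mentions confident enough to count toward dashboard metrics."""
    return [m for m in parsed_mentions if m["confidence"] >= cutoff]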

Scoring formulas

Three metrics, all of them defined.

The formula behind every metric on the dashboard is published. No black boxes.

Visibility score

0–100 daily aggregate. Mention rate across all (prompt × model × region) executions in the 24-hour window.

visibility_score(day) = 100 × mentioned_runs(day) / total_runs(day)

Share of voice

Brand mentions divided by total mentions across the named-competitor set, in the same prompt set and time window. Reveals whether visibility gains come from category growth or from displacing specific competitors.

share_of_voice = brand_mentions / (brand_mentions + sum(competitor_mentions))

Citation coverage

Of the URLs the model cites for the category's prompt set, the percentage where the brand has a meaningful positioned presence (named in the article, profiled, listed in a comparison table). Leading indicator: high citation coverage today predicts higher mention rate next quarter.

citation_coverage = positioned_source_urls / total_cited_source_urls
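
All three formulas reduce to simple ratios over the day's rows. A sketch in Python, with illustrative variable names, to make the arithmetic concrete:

def visibility_score(mentioned_runs: int, total_runs: int) -> float:
    """0-100 daily mention rate across all (prompt x model x region) runs."""
    return 100 * mentioned_runs / total_runs if total_runs else 0.0

def share_of_voice(brand_mentions: int, competitor_mentions: list[int]) -> float:
    """Brand mentions as a share of all mentions across the named-competitor set."""
    total = brand_mentions + sum(competitor_mentions)
    return brand_mentions / total if total else 0.0

def citation_coverage(positioned_urls: set[str], cited_urls: set[str]) -> float:
    """Share of cited URLs where the brand has a positioned presence."""
    return len(positioned_urls & cited_urls) / len(cited_urls) if cited_urls else 0.0

# Example: 40 runs in a day, brand named in 26 of them
# visibility_score(26, 40) -> 65.0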

Run cadence and freshness

Daily automation, on-demand re-runs, version capture.

Pro accounts run every enabled prompt against every enabled model every day, by default. Manual on-demand runs are unlimited. Free accounts run manually with a daily cap; results are still persisted but history is truncated to a 3-day window.

Each run captures the model version string returned by the provider. When OpenAI ships a new GPT, Anthropic ships a new Claude, or Google rotates the Gemini Pro pointer, the change is visible in the runs table and trend lines remain interpretable across the transition.

Region defaults to the brand's primary market. Multi-region tracking is supported on Pro and recommended for any brand operating in more than one country — AI answers vary by language and locale, often dramatically.
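
One way to keep comparisons like-for-like is to segment mention rate by the captured version string, so a provider-side rotation shows up as a new segment rather than an unexplained jump in a single trend line. An illustrative sketch, not Intendity's implementation:

from collections import defaultdict

def mention_rate_by_model_version(runs: list[dict]) -> dict[str, float]:
    """Split mention rate by the exact model version string captured per run."""
    counts = defaultdict(lambda: [0, 0])  # version -> [mentioned, total]
    for run in runs:
        version = run["model_version"]
        counts[version][1] += 1
        if run["mentioned"]:
            counts[version][0] += 1
    return {v: 100 * m / t for v, (m, t) in counts.items()}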

A real sample

What 360 real mentions look like.

Numbers from a recent week of runs across our earliest test accounts (2 accounts, 6 brands, mixed categories — Romanian local services and Korean skincare). Small sample, deliberately biased category mix. We're showing it because the alternative — opaque "trust us" methodology pages — is worse. These aren't industry benchmarks. They're evidence the system produces structured data of the shape described above.

  • Mentions in sample
    360

    Over a single recent week.

  • Mention rate
    33%

    Brand named in ≈1 of every 3 runs.

  • Position 1 share
    35%

    Of runs that named the brand, 35% had it in the top slot.

  • Source citations captured
    297

    Across the 36 runs where the model cited URLs inline.

  • Distinct cited domains
    100

    Long-tail in this sample; concentrated in production prompt sets.

A bigger, deliberately designed research project across a single defined category (~30 prompts × 4 models × 14 days) is in flight; results will be published under /blog when complete.

Limits and known biases

What this methodology does not do.

Stated explicitly because the alternative is letting buyers discover them later:

  • Sample size. A prompt set of 10 prompts × 4 models × 1 region produces 40 runs/day. That's enough to track a trend, not enough to detect small differences. Prompt sets of 30+ are recommended for strategic decisions.
  • Indirect references. Answers that describe a brand without naming it ("the leading enterprise platform in this space") are partially captured by the LLM parser at lower confidence. Pure indirect mentions remain an under-counted edge case.
  • Provider-side variance. Provider API responses occasionally differ from consumer-facing app responses (different defaults, different ranking signals). Intendity uses provider APIs; the absolute numbers may differ from a buyer hand-checking in the app, though trends correlate.
  • No causal attribution. A recommendation that ships in week 1 followed by a mention rate that rises in week 4 is correlation, not proof. Multiple variables move at once. We surface the data; attribution is the program owner's judgement call.
  • Hallucination correction is source-level. Intendity does not ask the model to forget bad information. We surface the underlying source the model is leaning on (a stale Wikipedia paragraph, an outdated review thread) and recommend the source-level fix. Source updates propagate to model answers within 1–6 weeks of recrawl.

Apply the methodology.

Run your first brand and see visibility, share of voice and citation coverage on real prompts in five minutes.