Glossary

Definitions of terms used in BeSeenByAI reports and documentation.

A

AI Overviews — Google’s feature that generates a summarized answer at the top of search results using content from indexed pages. Requires pages to be crawlable by Googlebot. (The separate Google-Extended token controls whether content may be used for Gemini model training, not inclusion in Search features.)

Authority — In BeSeenByAI reports, the authority score reflects how clearly a page signals what it is, who created it, and whether it can be trusted. Measured through structured data, JSON-LD, page type classification, and entity signals.

B

Bot — An automated program that fetches web pages. In the context of BeSeenByAI, “bot” refers to AI crawlers and search agents. See AI bots reference for the full list.

Bot Fight Mode — A Cloudflare feature (available on all plans) that classifies automated traffic and can challenge or block AI crawlers before they reach your server.

C

CLS (Cumulative Layout Shift) — A measure of how much visible page content moves unexpectedly during loading. One of Google’s Core Web Vitals. In BeSeenByAI, a high CLS indicates layout instability that can interfere with content extraction by crawlers.
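
Each individual layout shift is scored as the product of two fractions, and CLS aggregates those scores. A small illustrative calculation (the numbers are made up, not thresholds):

```python
# One layout shift's score, per the Core Web Vitals definition:
#   score = impact fraction x distance fraction
impact_fraction = 0.5    # fraction of the viewport affected by the shift
distance_fraction = 0.1  # largest move distance relative to the viewport
shift_score = impact_fraction * distance_fraction

# CLS for the page is the largest sum of such scores within any single
# window of shifts during the page's lifetime.
print(shift_score)  # 0.05
```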

Content visibility — In BeSeenByAI reports, the comparison between what a browser renders and what a bot can read. A gap between the two indicates content that exists in the browser but is inaccessible to AI crawlers.

Coverage snapshot — The percentage of rendered page content that is visible to bots. A high coverage percentage means a bot can read most of what a browser sees.

CrUX (Chrome UX Report) — Google’s dataset of real-world performance metrics aggregated from Chrome users. BeSeenByAI uses CrUX as the primary source for TTFB, CLS, and INP field data. Requires a minimum number of real Chrome visits to be available.

Crawlability — Whether AI bots can access a page. Involves two layers: the policy layer (robots.txt and noindex) and the runtime layer (what your infrastructure actually returns when a bot sends a request).

F

Field data — Performance metrics measured from real users visiting your page, sourced from CrUX. Reflects real-world conditions. Contrast with lab data.

I

INP (Interaction to Next Paint) — A measure of how quickly a page responds to user input. Replaced First Input Delay (FID) as a Core Web Vital in 2024. Most relevant for AI agents that interact with pages; less relevant for read-only crawlers.

Indexability — Whether a page is eligible to be indexed. A page can be crawlable but not indexable if it has a noindex directive in its meta tags or HTTP headers.

J

JSON-LD — A format for embedding structured data in a page’s HTML, typically inside <script type="application/ld+json"> tags. Tells AI systems what type of content the page contains, who created it, and what it is about. The preferred format for structured data.
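As a sketch of how such blocks can be read mechanically, the following uses only Python's standard library; the HTML snippet and schema values are illustrative, not output from any real page:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buf)))
            self._buf, self._in_jsonld = [], False

# Hypothetical page markup with one JSON-LD block:
html = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Hypothetical headline",
 "author": {"@type": "Person", "name": "A. Writer"}}
</script></head><body></body></html>"""

extractor = JSONLDExtractor()
extractor.feed(html)
print(extractor.blocks[0]["@type"])  # Article
```

This is roughly what an AI system does with the page: the type, headline, and author become machine-readable facts without any guessing from the visible text.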

L

Lab data — A performance measurement taken at audit time from a controlled environment. Always available, but reflects a single moment and network path. Used when CrUX field data is not available.

llms.txt — A proposed standard file (placed at /llms.txt) that gives AI systems a structured summary of a site’s content and links to key pages. Analogous to sitemap.xml but designed for LLM consumption.
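The proposed format is plain Markdown: a title, a short blockquote summary, then sections of annotated links. A hypothetical file might look like:

```markdown
# Example Site

> One-sentence summary of what the site is and who it is for.

## Docs

- [Getting started](https://example.com/docs/start): Setup and first steps
- [API reference](https://example.com/docs/api): Endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog): Release history
```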

M

Mismatch — A crawlability finding where robots.txt allows a bot but the live HTTP check shows the bot being rejected. Almost always indicates an accidental block at the infrastructure layer (CDN, WAF, hosting provider). See Crawlability tab guide.
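The decision behind this finding can be sketched as combining the two crawlability layers; the function name and status-code handling below are illustrative, not BeSeenByAI's implementation:

```python
def classify_crawlability(robots_allows: bool, http_status: int) -> str:
    """Combine the policy layer (robots.txt) with the runtime layer
    (the live HTTP response) into a single finding."""
    blocked_at_runtime = http_status in (401, 403, 429) or http_status >= 500
    if not robots_allows:
        return "blocked by policy"
    if not blocked_at_runtime:
        return "allowed"
    # robots.txt says yes, but the infrastructure says no:
    return "mismatch"

print(classify_crawlability(robots_allows=True, http_status=200))  # allowed
print(classify_crawlability(robots_allows=True, http_status=403))  # mismatch
```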

Monitoring — BeSeenByAI’s feature that re-audits tracked URLs on a schedule and sends alerts when something meaningfully changes.

N

noindex — A directive in a page’s <meta> tag (<meta name="robots" content="noindex">) or X-Robots-Tag HTTP header that tells crawlers not to index the page. Some AI systems respect this directive.

O

Origin-level data — Performance metrics averaged across all pages on a domain, as opposed to page-specific data. Reported as a fallback when CrUX lacks page-level data due to insufficient traffic.

P

Pagefind — The static search library used to power search in this Help Center. Indexes page content at build time; no server required.

Prompt Fit — A BeSeenByAI feature that tests whether a page’s content fits within different AI token budgets and would survive chunking by an LLM. Scores as Snippet, Summary, or In-depth.

R

Render pattern — How a page’s content is delivered: server-rendered (content in the initial HTML), client-rendered (content added by JavaScript after load), or mixed. Affects how much of the page bots without JavaScript execution can read.

Reverse Prompting — A BeSeenByAI feature that works backwards from a page’s content to identify what questions the page would answer well. Shows which AI queries are most likely to surface the page.

robots.txt — A plain-text file at the root of a domain (/robots.txt) that tells crawlers which pages they are allowed to access. Most AI bots respect it, but it is advisory — not a technical barrier.
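Checking the policy layer can be done with Python's standard library. The robots.txt content below is made up (though GPTBot and PerplexityBot are real bot names):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one AI bot but allows everything else:
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))         # False
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # True
```

Note that this only answers what the file permits; a bot that ignores robots.txt, or an infrastructure rule that blocks the request anyway, is invisible to this check.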

S

Structured data — Machine-readable markup embedded in a page’s HTML that tells AI systems what type of content the page contains. Most commonly implemented as JSON-LD. Types include Article, Organization, Product, FAQ, and others.

T

TTFB (Time to First Byte) — The time between when a browser or bot sends a request and when the first byte of the server’s response arrives. The most critical performance metric for AI retrieval. See the performance grade reference for thresholds.

W

WAF (Web Application Firewall) — A security layer that filters incoming requests before they reach your application. WAF rules can block AI crawlers at the infrastructure level, bypassing robots.txt entirely.