Guides, explanations, and troubleshooting for your AI visibility audits.
Type or paste any URL you want to check. You can audit homepages, specific pages, product pages, blog posts—any publicly accessible URL.
Typically 15–30 seconds. Performance data from CrUX, crawlability checks (robots.txt, HTTP Status Check, noindex), and content analysis checks (visibility gap, Content Diff impact, render pattern, structure) all run in parallel.
Results appear across tabs: Overview (summary), Performance (TTFB/CLS/INP), Crawlability (33 AI crawlers + HTTP Status Check + noindex), and Content Visibility (JS vs HTML, Content Diff, Render Pattern, Semantic Heading, ARIA Labels).
Click "Export PDF" to download a complete report you can share with developers, clients, or your team.
Google's CrUX dataset only includes URLs with sufficient real-user traffic. Low-traffic pages may not meet the aggregation threshold required to generate metrics. When URL-level data is unavailable, we check origin-level data (your whole domain) as a fallback. We clearly label when either is missing rather than guessing.
For performance insights on low-traffic pages, consider running Lighthouse for synthetic test data alongside CrUX monitoring.
URL-level: Metrics specific to the exact page you entered. Reflects performance for that specific URL based on real user visits.
Origin-level: Metrics aggregated across your entire domain. Useful for understanding overall site performance when page-specific data isn't available.
Both matter. A fast homepage with a slow product page means AI systems may reliably access some pages while timing out on others.
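The fallback order described above can be expressed as a small helper. This is an illustrative sketch, not our actual implementation; it assumes a `fetch` callable that takes a CrUX query body and returns a record dict, or None when no data exists:

```python
def crux_with_fallback(fetch, page_url, origin):
    """Try URL-level CrUX data first, then fall back to origin-level.

    `fetch` is any callable that accepts a CrUX query body such as
    {"url": ...} or {"origin": ...} and returns a record dict, or
    None when the dataset has no entry for that query.
    """
    record = fetch({"url": page_url})
    if record is not None:
        return ("url-level", record)
    record = fetch({"origin": origin})
    if record is not None:
        return ("origin-level", record)
    return ("unavailable", None)  # labeled as missing rather than guessed
```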
TTFB measures server response time—the raw time before any content renders. Your site may feel fast due to geographic proximity (you're near the server), browser caching, or client-side rendering that makes subsequent navigation feel instant.
CrUX shows real-world data from diverse users across different networks and locations. A 75th percentile TTFB of 2000ms means 25% of your visitors experience even slower responses—and AI crawlers querying from data center IPs may not benefit from your CDN caching.
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| TTFB | ≤ 800ms | 800–1800ms | > 1800ms |
| CLS | ≤ 0.1 | 0.1–0.25 | > 0.25 |
| INP | ≤ 200ms | 200–500ms | > 500ms |
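The table maps directly onto a simple classifier. A minimal sketch using the thresholds above (function and variable names are illustrative):

```python
# (good_up_to, poor_above) thresholds from the table
THRESHOLDS = {
    "TTFB": (800, 1800),   # milliseconds
    "CLS":  (0.1, 0.25),   # unitless layout-shift score
    "INP":  (200, 500),    # milliseconds
}

def rate(metric: str, value: float) -> str:
    """Classify a metric value as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"
```

For example, the 2000ms 75th-percentile TTFB discussed above rates as poor.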
Check for these common causes:
A User-agent: * group followed by Disallow: / at the top of your robots.txt blocks all crawlers. We parse your robots.txt according to RFC 9309. If results look unexpected, review your file at yourdomain.com/robots.txt directly.
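You can reproduce the blanket-block behavior with Python's standard-library robots.txt parser. Note that urllib.robotparser implements the classic robots.txt rules rather than full RFC 9309, so edge cases may differ, but the basic effect is the same:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",   # blanket rule: denies every compliant crawler
])

# Any user agent is denied, including AI crawlers:
print(rp.can_fetch("GPTBot", "https://example.com/any-page"))   # False
```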
No. Selective blocking is completely valid. What matters is that your decisions are intentional, not accidental. You may want to allow search crawlers (PerplexityBot, OAI-SearchBot) for real-time visibility while blocking training crawlers (GPTBot, CCBot) if you're concerned about training data use. Both are legitimate choices.
robots.txt is a request for crawlers to honor your rules—it's not a technical enforcement mechanism. Any compliant, well-behaved crawler will respect the rules, but robots.txt cannot technically prevent access the way authentication or IP blocking can.
Major AI providers (OpenAI, Anthropic, Google, etc.) document that they respect robots.txt. For real access control, use server-level authentication, not robots.txt.
If your goal is maximum AI visibility, prioritize allowing: GPTBot (ChatGPT training/search), OAI-SearchBot (ChatGPT Search), ClaudeBot (Claude), PerplexityBot (Perplexity), Google-Extended (Gemini), CCBot.
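A robots.txt fragment that explicitly allows those crawlers could look like the following. This is an illustrative sketch, not a recommended default; adapt the rules to your own policy:

```
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /
```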
robots.txt is a crawl access request file (usually origin-wide). noindex is a page-level directive sent via meta robots or X-Robots-Tag to request exclusion from indexing.
A page can be crawlable and still excluded if noindex is present.
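A rough way to check both delivery channels for noindex, assuming you already have the response headers and HTML body. This regex-based sketch is approximate; a production check would use a real HTML parser:

```python
import re

def has_noindex(html: str, headers: dict) -> bool:
    """Detect a noindex directive in either the X-Robots-Tag
    response header or a <meta name="robots"> tag (approximate)."""
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return True
    meta = re.search(
        r'<meta[^>]*name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
        html,
        re.IGNORECASE,
    )
    return bool(meta and "noindex" in meta.group(1).lower())
```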
robots.txt may allow access, but runtime requests can still fail due to WAF rules, anti-bot filtering, or infrastructure controls. HTTP Status Check validates live response codes for Browser + GPTBot + ClaudeBot + PerplexityBot on the audited URL.
Interpretation: 2xx = reachable, 3xx = redirected (still good), 4xx = warning, 429/5xx = critical, timeout/no response = unavailable. If Browser returns 429 and the server sends Retry-After, that value is shown in the Browser row.
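The interpretation rules above can be sketched as a classifier (names are illustrative; the actual check also surfaces Retry-After when the server sends it):

```python
def classify_status(status):
    """Map an HTTP status code (or None for timeout/no response)
    to the audit verdicts described above."""
    if status is None:
        return "unavailable"    # timeout / no response
    if status == 429 or status >= 500:
        return "critical"       # rate-limited or server error
    if status >= 400:
        return "warning"        # client error (e.g. 403 from a WAF)
    if status >= 300:
        return "redirected"     # still reachable
    return "reachable"          # 2xx
```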
We compare the word count in the JavaScript-rendered view (what a user sees) against the raw HTML response (what many crawlers see). The gap percentage is the proportion of content that requires JavaScript to appear.
Example: JS view has 1000 words, raw HTML has 700 words = 30% gap. This means 30% of your content only appears after JavaScript executes.
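The computation is straightforward; a minimal sketch (the function name is illustrative):

```python
def visibility_gap(js_words: int, html_words: int) -> float:
    """Percentage of rendered content that only appears after JavaScript runs."""
    if js_words == 0:
        return 0.0
    return max(0.0, 100.0 * (js_words - html_words) / js_words)

print(visibility_gap(1000, 700))  # 30.0, matching the example above
```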
Use the Content Diff panel alongside this gap number to see which specific sections are likely invisible to crawlers and worth fixing first.
Not necessarily. Context matters:
If the gap looks manageable but your impact summary is high, trust the Content Diff list and fix those highlighted sections first.
Both indicate a mismatch between raw HTML and the rendered page. "Added by JS" means content may be invisible to crawlers that do not execute JavaScript fully. "Removed by JS" means content present in raw HTML no longer appears in the final rendered view.
Either mismatch can reduce extraction reliability, so both are treated as risk signals in the Content Diff section.
The Render Pattern card helps you decide which fix is most likely appropriate, but it remains heuristic rather than a hard framework detector. Use Content Diff to prioritize exactly which missing sections to move into raw HTML first.
The Render Pattern card is a heuristic summary based on raw HTML coverage, framework markers, and selected response headers. It helps you quickly tell whether the page looks mostly CSR, SSR, Static, Hydrated, or Hybrid.
It can also show secondary hints for SSG, ISR, ESR, or islands/partial hydration when enough evidence is exposed, but those are directional hints rather than confirmed states. DPR cannot be reliably identified from a single external fetch.
The Semantic Heading card checks heading hierarchy from the raw HTML response—the order of H1–H6 tags, whether levels are skipped, and overall structural consistency. Crawlers and agent-style workflows use heading structure as a cue for content extraction, not just visual layout.
This is a structural extraction signal for AI systems, not a full WCAG accessibility audit. Native HTML elements and a logical heading hierarchy are the foundation; ARIA complements them for custom interactions.
The ARIA Labels card counts aria-label, aria-labelledby, aria-describedby attributes and explicit role values present in the raw HTML response. Pages built primarily with native HTML elements may have low counts—this is normal and expected.
The check is most useful for identifying custom interactive controls (tabs, modals, carousels) that lack explicit roles or labels, as these can reduce extraction reliability for systems that rely on structural cues. This is not a full WCAG accessibility audit.
Lab Experiments are exploratory diagnostics available inside the audit tool. They check signals that have been widely discussed in the AI visibility space—but none of them (so far) have been shown to have any direct impact on LLM visibility.
Use them to investigate potential structure and ingestion signals, not as a ranking guarantee. They are kept separate from the core checks (Performance, Crawlability, Content Visibility) to avoid mixing confidence levels.
Checks whether a /llms.txt file is present at the root of your domain. llms.txt is a proposed convention for giving LLMs structured guidance about a site's content. Its presence or absence is an ecosystem signal—not proof of ingestion or indexing behavior by any specific AI system.
Detects @type values from JSON-LD blocks found in the raw HTML. The chips shown represent all detected types from sampled items—both valid and invalid. Structured data can improve machine readability, but its presence is not direct proof of inclusion in AI answers.
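A rough sketch of extracting @type values from JSON-LD blocks. This is regex-based for brevity; a production check would use an HTML parser and also handle @graph nesting:

```python
import json
import re

def jsonld_types(html: str) -> list:
    """Collect @type values from application/ld+json script blocks."""
    types = []
    pattern = r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>'
    for match in re.finditer(pattern, html, re.DOTALL | re.IGNORECASE):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON in this sketch
        for item in (data if isinstance(data, list) else [data]):
            if not isinstance(item, dict):
                continue
            t = item.get("@type")
            if isinstance(t, str):
                types.append(t)
            elif isinstance(t, list):
                types.extend(t)
    return types
```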
Converts the page content to Markdown and compares token, word, and character counts between the HTML and Markdown representations. A negative difference means Markdown is more compact for this content. Token counts depend on representation and tokenizer assumptions—treat these figures as directional, not absolute.
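As an illustration of why the difference is often negative, compare the same content in both representations. These are toy strings; real conversion and tokenization are more involved:

```python
html = "<article><h1>Title</h1><p>Hello <strong>world</strong>!</p></article>"
md = "# Title\n\nHello **world**!"

# Character counts for each representation of the same content
html_chars = len(html)
md_chars = len(md)

char_diff = md_chars - html_chars
print(char_diff < 0)  # Markdown is more compact: the markup overhead is gone
```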
Common causes: Server is down or extremely slow (>30 seconds), invalid URL format, firewall blocking the audit request.
Try: Verify the URL loads in your browser, check if the site is reachable, try a different page from the same domain.
The extension cannot analyze browser internal pages (chrome://), extension settings pages, local files (file://), or pages with strict Content Security Policies.
Solution: Use the web app at beseenby.ai for these cases.
We fetch robots.txt fresh on each audit, but a CDN in front of your site may still serve a cached copy.
Try: Wait 5–10 minutes, verify your changes are visible at yourdomain.com/robots.txt directly, then run the audit again.