The State of AI Visibility 2026
BeSeenBy Research

In 2026, 70% of websites have content AI can't fully see

AI visibility is now a multi-billion-dollar market, and businesses are spending real money to show up in AI answers. So we scanned 706 real websites to see what AI actually sees when it visits. Most have technical problems that quietly stop AI from reading them.

706 sites scanned · June 2026

56% of marketing leaders reported high investment in Generative Engine Optimization in 2025, and 94% planned to spend more in 2026. With that kind of money moving, you'd expect the basics of AI visibility to be handled. Our data says otherwise.

AI visibility looks like a black box, so people skip the fundamentals and jump straight to prompt tracking - watching whether AI mentions their brand, without checking whether AI can even read their site. But the fundamentals aren't a black box. They're measurable, and they're where the real problems hide.

Most of the sites we examined are client sites of SEO professionals. That's the uncomfortable part: these sites are built for Google, and many rank well. AI still can't read them.

AT A GLANCE

Five things AI sees that you don't

01 · CONTENT VISIBILITY

How much disappears when JavaScript is the messenger?

70% of sites show a visitor more than they show AI. For 1 in 10, more than half the page is gone.

AI crawlers read the raw HTML your server sends - they don't run JavaScript. So if your content loads through JavaScript, a person sees the finished page while an AI bot sees the unfinished one. The answer to someone's question might be sitting right there on your page. AI just never gets to it.

  • 70% · Sites with a JS gap. Content that's visible to people but invisible to AI bots.
  • 1 in 3 · Lose 10%+ of content. A tenth or more of the page missing for AI.
  • 1 in 10 · Lose 50%+ of content. More than half the page vanishes for an AI crawler.

How much of the page is invisible to AI

Share of sites by size of the JavaScript content gap · 1,123 sites

Vibe-coded sites are the clearest case

Sites built on AI coding platforms run almost entirely on client-side JavaScript, so to an AI bot the page can be close to empty. People can browse these sites normally, but AI sees little or nothing of them.

Sites losing more than half their content to AI

% of sites with a 50%+ content gap · Lovable 114 sites · Base44 106 · general web 1,123

What this means

  • If your content loads via JavaScript, assume an AI bot sees a thinner page than your visitors do - sometimes a near-empty one.
  • The fix is server-side or static rendering for the content that matters: get the words into the raw HTML.
  • The average gap is 16%, but heavily JavaScript-dependent sites hide almost all of their content from AI.
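
One rough way to put a number on the gap described above is to count the visible words in the raw HTML and in the JavaScript-rendered page, then compare. The sketch below is an illustration only: the word-count metric and the names `VisibleText`, `word_count`, and `content_gap` are assumptions, not the method behind the figures in this report. It also assumes you already have both versions of the page as strings, for example the raw server response and a saved copy of the rendered DOM from a headless browser.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def word_count(html):
    """Number of whitespace-separated words in the visible text of html."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def content_gap(raw_html, rendered_html):
    """Share of rendered words missing from the raw HTML, from 0.0 to 1.0."""
    rendered = word_count(rendered_html)
    if rendered == 0:
        return 0.0
    return max(0.0, 1 - word_count(raw_html) / rendered)


# A client-side-rendered page: the server sends an empty shell.
raw = '<html><body><div id="root"></div><script>render()</script></body></html>'
rendered = '<html><body><div id="root"><h1>Pricing</h1><p>Plans start at ten dollars</p></div></body></html>'
print(content_gap(raw, rendered))  # 1.0: every rendered word is missing from the raw HTML
```

A word count is a blunt measure, since it ignores which words are missing, but it is enough to separate a page that loses a sidebar from one that loses everything.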

02 · CRAWL ACCESS

Can AI crawlers even reach your content?

11.6% of sites block at least one AI crawler. 96% of those blocks never appear in robots.txt.

Before a bot can read your content, it has to get in. Most people assume blocking lives in robots.txt, where you can see it and change it. It usually doesn't. It happens at the server, in a CDN or a firewall rule nobody remembers setting. The bot gets a 403 Forbidden, a timeout, or a 429 rate limit. You open the site in a browser and everything works fine. The AI bot hits a closed door and leaves.

  • 11.6% · Block ≥1 AI bot. 185 of 1,597 reports.
  • 1 in 10 · Block ClaudeBot. The single most-blocked AI crawler.
  • 96%+ · Blocked at the server. Invisible in robots.txt - owners never see them.

Most-blocked AI crawlers

% of all sites blocking each bot

Where the block happens

Server / HTTP layer vs robots.txt · 365 blocking sites

What a blocked bot actually receives

HTTP response on server-level blocks · 464 live bot checks

How blocked bots are turned away

  • 403 (53%): explicit. The site knows it's blocking AI bots. Easy to find, easy to undo.
  • Timeout (23%): the silent drop. No code, no error - the firewall quietly discards the request. Owners have no idea.
  • 429 (19%): mislabeled. Returned on the very first request with no Retry-After header - a rejection dressed up as rate limiting.
  • PerplexityBot gets far fewer 429s than ClaudeBot or GPTBot - WAF rules are usually written for the OpenAI and Anthropic bots, so Perplexity slips past and hits 403s or timeouts instead.
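
The categories above can be written down as a small classifier, alongside a robots.txt check, so the two layers can be compared for the same bot. This is a minimal sketch under stated assumptions: `robots_allows` and `classify_response` are hypothetical helper names, the labels are ours, and the inputs (status code, headers, timeout flag) are assumed to come from whatever HTTP client you use to request a page with the bot's user agent.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt, user_agent, url):
    """True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def classify_response(status, headers, timed_out=False):
    """Label one bot request using the categories described above.

    status is the HTTP status code (None if no response arrived) and
    headers is a dict of response headers.
    """
    if timed_out or status is None:
        return "timeout"
    if status == 403:
        return "explicit block"
    if status == 429:
        # A genuine rate limit normally tells the client when to retry.
        has_retry = any(name.lower() == "retry-after" for name in headers)
        return "rate limit" if has_retry else "mislabeled block"
    if 200 <= status < 300:
        return "allowed"
    return "other"


robots = "User-agent: *\nAllow: /\n"
print(robots_allows(robots, "ClaudeBot", "https://example.com/pricing"))  # True
print(classify_response(429, {}))  # mislabeled block
```

The case this section is about is the combination: `robots_allows` returns True while `classify_response` reports a block. That is a block you will not find by reading robots.txt.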

03 · SPEED

How long does your server make AI wait?

29.5% of sites take more than a second to respond.

Time to first byte is how long your server takes to start answering a request. The slower you are, the more likely an AI crawler deprioritizes the page or skips it - it's reading at scale and it isn't going to sit around waiting for you. A person will wait a second or two. A crawler going through thousands of pages has no reason to.

  • 870ms · Average TTFB. C grade - AI may deprioritize the page.
  • 29.5% · Respond in over 1 second. D or F grade - effectively skippable.
  • 9.1% · Hit A+ (<200ms). Fast enough for guaranteed AI inclusion.

Server speed grade distribution

Field TTFB (p75), graded A+ to F · 264 sites with sufficient traffic data

What this means

  • Under a second is fine. Over a second and you're in the slow tier where crawlers start dropping pages.
  • The fastest sites answer in under 200ms - usually with edge caching or a CDN in front of the server.
  • Your site can feel fast to you and still be slow to a crawler: this is real-user field data, not a one-off lab test.
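
To make the thresholds above concrete, here is a minimal sketch that takes a set of TTFB samples, computes the 75th percentile, and maps it to a tier. Only two cut-offs come from this report: under 200ms for A+ and over one second for the slow tier. The boundaries between the intermediate grades are not given here, so the sketch collapses them into a single middle tier rather than guess. The function names and the percentile method are assumptions; the Chrome UX Report computes its p75 from its own real-user data.

```python
from statistics import quantiles


def p75(samples_ms):
    """75th percentile of TTFB samples in milliseconds (needs two or more)."""
    return quantiles(samples_ms, n=4, method="inclusive")[2]


def ttfb_tier(ttfb_ms):
    """Map a TTFB in milliseconds to the tiers named in this section."""
    if ttfb_ms < 0:
        raise ValueError("TTFB cannot be negative")
    if ttfb_ms < 200:
        return "A+"
    if ttfb_ms <= 1000:
        return "under one second"
    return "slow tier (D or F)"


samples = [100, 200, 300, 400, 500]  # made-up values for illustration
print(ttfb_tier(p75(samples)))  # under one second
```

Using a percentile rather than a single measurement is the point of the last bullet: one fast response from your own machine says little about what the slower 25% of requests look like.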

04 · STRUCTURE

Can AI tell who you are?

Nearly 1 in 5 sites have no structured data at all.

Structured data is the machine-readable layer that tells AI what a page is and who's behind it: Organization schema, author signals, links to your verified profiles. Without it, AI is guessing. It can read your words but it can't confirm you're a real, identifiable source - and that confirmation is part of what makes a page worth citing.

  • 1 in 5 · Zero JSON-LD. 18.7% - no structured data at all.
  • 2 in 3 · Missing sameAs links. 68.9% - AI can't verify identity across the web.
  • 1 in 2 · Missing Org schema. 53% - AI doesn't know who runs the site.

Most common failed authority checks

% of sites failing each check (broad-coverage checks, 100+ sites each)

What this means

  • Everyone gets the basics right (title, H1, indexability). The failures are entirely in the AI-specific identity signals.
  • sameAs links - pointing to your LinkedIn, Wikipedia, or Google Business profile - are the cheapest credibility signal, and two-thirds of sites skip them.
  • Product and SaaS pages almost universally lack the schema AI needs to classify and surface them.
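
The three identity checks discussed above (any JSON-LD, Organization schema, sameAs links) can be approximated with a short script that reads the JSON-LD blocks out of a page's HTML. This is a simplified sketch, not the audit used for this report: `JsonLdExtractor` and `identity_checks` are hypothetical names, it only looks at top-level JSON-LD objects (it does not walk `@graph` arrays), and it silently ignores malformed JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD is ignored in this sketch
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def identity_checks(html):
    """Report which of the three identity signals are present in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    nodes = [node for node in extractor.items if isinstance(node, dict)]

    def is_org(node):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        return "Organization" in types

    orgs = [node for node in nodes if is_org(node)]
    return {
        "has_json_ld": bool(nodes),
        "has_organization": bool(orgs),
        "has_same_as": any(node.get("sameAs") for node in orgs),
    }


# Example page with a made-up organization and profile URL.
page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", '
    '"name": "Example Co", "sameAs": ["https://www.linkedin.com/company/example"]}'
    "</script></head><body><h1>Example Co</h1></body></html>"
)
print(identity_checks(page))
```

A page that passes all three checks still has to say something worth citing, but a page that fails them gives AI nothing to confirm who is speaking.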

05 · CONTENT

When someone asks AI a question, can your page answer it?

37.9% of the pages we tested were too promotional for AI to cite. 9.3% made claims they didn't back up.

This is the test even a reachable, readable, well-structured page can fail. AI engines won't cite marketing language as a factual source. When a page reads like a sales pitch, or makes claims with nothing behind them, AI skips it and quotes someone who sounds like a reference instead.

  • 37.9% · Too promotional. The #1 content warning - marketing copy AI can't cite.
  • 9.3% · Unsubstantiated claims. Stats and assertions with no verifiable evidence.
  • 45% · No "Strong" answer. Page partially supports the question - or can't answer it.

What AI flags most often

Warning themes across 182 page analyses (a page can match several)

Can the page answer the question?

Verdict when a real search prompt is tested against the page · 170 runs

What this means

  • AI cites facts, prices, specs, and concrete examples - not adjectives. Promotional pages read as noise.
  • More words don't help: "diluted" pages average 2,292 words but bury the answer, making it harder for AI, not easier.
  • Back your claims with verifiable detail. The page that sounds like a reference gets quoted; the one that sounds like an ad gets skipped.

THE TAKEAWAY

It works for you. That's exactly why it stays broken.

Every problem in this report follows the same pattern. Your site works when you open it. You rank well in Google, so you assume it works for AI too.

It doesn't - and worse, you have no way of knowing. Your content loads, but AI reads the page before the JavaScript runs. Your server feels fast to you, but it's slow enough that a crawler gives up. Your firewall blocks a bot, and you never hear about it. Your page looks great to a customer and reads like an ad to AI. None of this shows up in a browser, which is exactly why it sticks around.

And the tool most people reach for doesn't catch it. Prompt tracking tells you if AI mentions your brand. It doesn't tell you if AI can even read your site. So you can watch your mentions all day and miss the fact that the real problem is buried three layers down.

The only way to know is to look at your site the way the crawler does.

See what AI sees

Run the same scan on your own site. Paste in a URL and see what AI actually sees - the blocks, the rendering gaps, the slow responses, the missing signals. It's free, and it takes a minute.

The problems you find were probably there the whole time.

Methodology

  • Sample. 706 unique domains across roughly 1,600 page audits run on BeSeenBy.ai during its private beta (March–June 2026). Most are client sites of SEO professionals we contacted - sites built and optimized for Google search.
  • Content visibility. 1,123 audits comparing the raw HTML an AI bot receives against the fully JavaScript-rendered page. Vibe-coded analysis: 114 Lovable and 106 Base44 sites vs the 1,123-site baseline.
  • Crawl access. Live bot checks against ClaudeBot, GPTBot, and PerplexityBot plus robots.txt analysis; 11.6% figure from 185 of 1,597 reports. HTTP-response breakdown from 464 live server-level block checks.
  • Speed. Time to First Byte from Google's Chrome UX Report (p75 real-user field data). 264 domains had enough traffic to appear in CrUX, so this skews toward more established sites.
  • Structure. Per-page authority checks against page HTML and JSON-LD (Organization schema, sameAs links, structured-data presence, crawlable navigation, and more).
  • Content. Prompt Discovery: 182 analyses, one per URL (percentages are per-site). Prompt Fit: 170 graded runs across 132 URLs, where a real search prompt is tested against a page.
  • External figures. The Generative Engine Optimization (GEO) investment statistics cited in the introduction are from eMarketer (2025).