What Users See Isn't Always What Bots Can Read

If your content only appears after JavaScript execution, many AI crawlers and extractors won't see it. We compare what's visible to humans against what's present in raw HTML, show a Content Diff impact summary, and highlight sections potentially invisible to AI bots.

The JavaScript Content Gap

Modern websites often load content dynamically using JavaScript frameworks like React, Vue, or Angular. This creates a better user experience—but it also creates a visibility gap for crawlers that don't execute JavaScript.

✓ What users see

Complete page with all text, images, and interactive elements loaded after JavaScript executes.

✗ What many bots see

Minimal HTML skeleton with loading placeholders and empty containers—no actual content.

When important content is missing from the initial HTML response, crawlers that don't execute JavaScript—or execute it differently—can't capture your content.
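The comparison described above can be approximated by stripping tags from both versions and counting visible words. A minimal sketch in Python using only the standard library; the function names are illustrative, and the tool's actual extraction logic is more involved than this:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/noscript contents."""
    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def word_count(html: str) -> int:
    """Visible word count of an HTML document string."""
    p = _TextExtractor()
    p.feed(html)
    return len(" ".join(p.parts).split())

def content_gap_pct(raw_html: str, rendered_html: str) -> float:
    """Percentage of rendered words missing from the raw HTML response."""
    raw, rendered = word_count(raw_html), word_count(rendered_html)
    if rendered == 0:
        return 0.0
    return max(0.0, (rendered - raw) / rendered * 100)
```

Feeding a typical CSR skeleton (an empty mounting div) against its rendered output yields a gap of 100%, which is exactly the shell-page failure mode described above.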

Content Visibility Analysis

What We Check

JavaScript-Rendered View

What appears in the DOM after all scripts execute. This is what a real user sees in their browser. Captured using a headless browser with full JS rendering.

Raw HTML Response

The initial server response before any client-side rendering. This is what most crawlers receive—including GPTBot, ClaudeBot, and PerplexityBot.

We calculate the word count of each view, the percentage gap between them, and bot-specific summaries for GPTBot, ClaudeBot, and PerplexityBot. We also provide a Content Diff impact summary and a prioritized list ("See Content Invisible to AI Bots") so you can quickly review high-risk sections.

Secondary hints may suggest SSG, ISR, ESR, or islands/partial hydration when the page exposes enough framework or cache signals, but those remain heuristic. In Content Diff, both "Added by JS" and "Removed by JS" are treated as risk indicators because either can create crawler-visible mismatches. Scope: URL-level check.
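Both directions of mismatch can be approximated with simple set arithmetic over extracted text. A rough word-level sketch; the real diff presumably operates on larger content blocks, so this is only illustrative:

```python
def content_diff(raw_text: str, rendered_text: str) -> dict[str, set[str]]:
    """Words added or removed by client-side JavaScript, as rough sets."""
    raw_words = set(raw_text.split())
    rendered_words = set(rendered_text.split())
    return {
        # Present only after JS execution: invisible to non-rendering crawlers.
        "added_by_js": rendered_words - raw_words,
        # Present only in raw HTML: crawlers see text that users never do.
        "removed_by_js": raw_words - rendered_words,
    }
```

Either set being non-empty means a crawler-visible mismatch, which is why both directions are treated as risk indicators.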

Content Structure (Raw HTML)

Semantic Heading

Checks heading hierarchy from the raw HTML response—sequence, skipped levels, and overall structure. Crawlers and agent-style workflows rely on heading order as a structural cue for content extraction, not just visual layout.

This is a structural extraction signal, not a full accessibility audit.
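A sequence-and-skipped-levels check like the one described can be sketched in a few lines. This is an assumption-laden simplification (regex-based extraction, illustrative rule set), not the tool's implementation:

```python
import re

def heading_levels(raw_html: str) -> list[int]:
    """Extract h1-h6 levels in document order from raw HTML."""
    return [int(m) for m in re.findall(r"<h([1-6])[\s>]", raw_html, re.I)]

def heading_issues(levels: list[int]) -> list[str]:
    """Flag common structural problems in a heading sequence."""
    issues = []
    if not levels:
        return ["no headings found in raw HTML"]
    if levels[0] != 1:
        issues.append(f"first heading is h{levels[0]}, not h1")
    if levels.count(1) > 1:
        issues.append("multiple h1 elements")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # e.g. h1 followed directly by h3
            issues.append(f"skipped level: h{prev} -> h{cur}")
    return issues
```

Going straight from h1 to h3 is the classic failure: visually it may look fine, but extractors lose the section nesting.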

ARIA Labels

Counts aria-label, aria-labelledby, aria-describedby attributes and explicit role values present in raw HTML. Low counts are expected on pages built primarily with native HTML elements—this check surfaces gaps in custom interactive controls.

This is a structural extraction signal, not a full WCAG accessibility audit.
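Counting these attributes in raw HTML is straightforward with the standard-library parser. A minimal sketch; the attribute list matches the ones named above, but the class and function names are illustrative:

```python
from html.parser import HTMLParser

ARIA_ATTRS = ("aria-label", "aria-labelledby", "aria-describedby", "role")

class AriaCounter(HTMLParser):
    """Tallies ARIA labeling attributes and explicit roles in raw HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {name: 0 for name in ARIA_ATTRS}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.counts and value:  # ignore empty values
                self.counts[name] += 1

def count_aria(raw_html: str) -> dict[str, int]:
    p = AriaCounter()
    p.feed(raw_html)
    return p.counts
```

On a page of native buttons, links, and headings, all counts near zero is normal; zeros on a page full of div-based custom controls is the gap this check surfaces.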

Missing HTML Content Reduces Retrievability

Training Data Access

AI models trained on web data typically use snapshots that don't execute JavaScript. Content requiring JS won't appear in training datasets.

Real-Time Retrieval

AI systems fetching content in real-time have time and resource budgets. Many skip JavaScript execution entirely to stay within those constraints.

Extraction Reliability

Content extraction tools and less sophisticated crawlers may not render JavaScript at all, making JS-dependent content permanently invisible to them.

Interpretation Guide

Good — <5% gap

Most content is present in raw HTML. Minimal visibility risk. No immediate action needed.

Warning — 5–30% gap

Moderate content visibility gap. Some crawlers may capture incomplete versions. Review which sections are missing from raw HTML.

Risk — >30% gap

Significant content visibility gap. Many crawlers likely missing key sections. Priority fix: implement SSR, pre-rendering, or move critical content to HTML.
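The three bands above map directly onto a gap percentage. A trivial sketch, assuming the boundary values 5 and 30 fall into the Warning band as the ranges imply:

```python
def classify_gap(gap_pct: float) -> str:
    """Map a content gap percentage onto the report's three bands."""
    if gap_pct < 5:
        return "Good"      # most content already in raw HTML
    if gap_pct <= 30:
        return "Warning"   # some crawlers may capture incomplete versions
    return "Risk"          # priority fix: SSR, pre-rendering, or static HTML
```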

Common Issues We Catch

Product Descriptions

E-commerce sites loading product details via AJAX after the initial page load. Crawlers see a shell page with no product content.

Blog Content

Articles rendered entirely client-side with React or Vue. The raw HTML contains only a mounting point—no article text whatsoever.

Navigation and Internal Links

When menus are built with JavaScript, non-rendering crawlers can't discover the linked pages, breaking internal linking for AI discovery.
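You can see this failure by listing the links a non-rendering crawler could actually follow from the raw HTML. A minimal sketch (the filter rules here are illustrative; real crawlers apply more nuanced URL normalization):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects followable hrefs from anchor tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip fragments and javascript: pseudo-links; they lead nowhere.
            if href and not href.startswith(("#", "javascript:")):
                self.hrefs.append(href)

def discoverable_links(raw_html: str) -> list[str]:
    p = LinkCollector()
    p.feed(raw_html)
    return p.hrefs
```

An empty result on a page that visibly shows a full navigation menu is the broken-internal-linking case described above.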

Metadata and Structured Data

Schema.org markup injected client-side after page load. Crawlers that don't execute JS miss the structured data entirely.
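Whether structured data survives in the raw response can be checked by looking for JSON-LD script blocks in the unrendered HTML. A small sketch, assuming JSON-LD (the most common embedding for Schema.org markup); microdata and RDFa would need separate handling:

```python
import json
import re

def jsonld_blocks(raw_html: str) -> list[dict]:
    """Return parsed JSON-LD objects present in the raw HTML response."""
    pattern = re.compile(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.S | re.I,
    )
    blocks = []
    for payload in pattern.findall(raw_html):
        try:
            blocks.append(json.loads(payload))
        except json.JSONDecodeError:
            pass  # malformed JSON-LD is effectively invisible too
    return blocks
```

If this returns an empty list for a page whose rendered DOM does contain JSON-LD, the markup is being injected client-side and non-rendering crawlers never see it.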

How to Improve Based on Render Pattern

Use the Render Pattern card as a shortcut. Likely CSR or Hybrid results usually mean too much content depends on client-side rendering, while Static, Hydrated, and SSR-like results usually indicate the core HTML is already reaching crawlers. SSG, ISR, and ESR labels should be treated as directional hints, not guarantees.

SSR — Server-Side Rendering

Effort: High  |  Impact: High

Render pages on the server so HTML contains all content before sending to the client. Frameworks: Next.js, Nuxt, SvelteKit.

SSG — Static Site Generation

Effort: Medium  |  Impact: High

Pre-render pages to HTML at build time. Best for blogs, documentation, and marketing pages where content doesn't change per-request.

Pre-rendering

Effort: Low–Medium  |  Impact: Medium

Use a pre-rendering service like Prerender.io or Rendertron to serve static snapshots to bots while users get the full JS experience.

Progressive Enhancement

Effort: Medium  |  Impact: Medium

Ensure critical content exists in the base HTML and use JavaScript only to enhance the experience—not to deliver the content in the first place.

Frequently Asked Questions

What does the Render Pattern card tell me?

It is a heuristic summary based on raw HTML coverage, framework markers, and caching headers. It helps you see whether the page looks predominantly CSR, SSR, Static, Hydrated, or Hybrid.

It can also surface hint-level signals for SSG, ISR, ESR, or islands/partial hydration when the response exposes enough evidence. Those are not hard confirmations, and DPR cannot be reliably identified from a single external fetch.

Do I need to remove JavaScript from my site?

No. JavaScript is essential to modern web experiences. The goal is to ensure that your critical content is also accessible in raw HTML, not to eliminate JavaScript entirely. Users can still get the full interactive experience.

Will server-side rendering slow my site down?

Not if done correctly. SSR can actually improve perceived performance by delivering visible content faster. With proper caching (CDN-level or edge caching), SSR pages can be as fast as static files.

Do AI crawlers execute JavaScript?

Behavior varies significantly. Some execute JavaScript fully, some execute it partially, and some not at all. Rather than targeting each crawler's specific behavior, the safest approach is to include critical content in the initial HTML response.

How often should I run this check?

Check after major frontend updates, site architecture changes, content migrations, and new page template deployments. For active sites with frequent releases, a monthly check is reasonable. Any change to your rendering approach warrants an immediate recheck.

Does this affect traditional Google SEO too?

Yes. Google handles JavaScript well but operates under a rendering budget—pages that are complex to render may be processed less frequently. Content present in raw HTML can be indexed on the first crawl without waiting for the render queue. The same fix that helps AI visibility also reduces SEO rendering dependency.

See What Bots Are Missing

Enter any URL to check content visibility gaps. Get a detailed comparison, impact summary, and a prioritized list of content potentially invisible to AI bots.

Run Content Visibility Check
Also available as a Chrome Extension

Quick audits while you browse—all core features included, always free.

Add to Chrome