AI bots reference

All 33 AI crawlers BeSeenByAI checks, grouped by category, with user agent strings and what blocking each type means.

BeSeenByAI checks 33 AI crawlers across five categories. The category determines what blocking a bot actually means for your AI visibility.

Why categories matter

Not every AI bot does the same job. A blanket User-agent: * block in robots.txt — or a WAF rule targeting “AI scrapers” — typically takes down every category at once, including the search agents that would otherwise cite you in real-time answers.

Understanding which category a bot belongs to lets you make deliberate decisions: block training crawlers if you don’t want your content used for model training, while keeping search and browsing agents open.

Search agents

These bots fetch pages in real time to answer user queries. If you want to be cited when someone asks ChatGPT, Perplexity, or another AI assistant a question, these are the crawlers that need to reach you. Blocking search agents is almost never intentional — it is the most common unintended consequence of a wildcard robots.txt block or a WAF rule targeting AI user agents.

User Agent       | Company    | Product
OAI-SearchBot    | OpenAI     | SearchGPT crawler
PerplexityBot    | Perplexity | Perplexity AI search crawler
Claude-SearchBot | Anthropic  | Claude search functionality
DuckAssistBot    | DuckDuckGo | DuckDuckGo AI search

Browsing agents

These bots fetch pages on behalf of a user in real time — they are the equivalent of a browser being driven by an AI assistant. When a user asks an AI to “look up” or “check” a page, one of these agents is typically sent.

User Agent           | Company    | Product
ChatGPT-User         | OpenAI     | ChatGPT user-initiated browsing
Perplexity-User      | Perplexity | Perplexity user browsing
Claude-User          | Anthropic  | Claude user-initiated browsing
GoogleAgent-Mariner  | Google     | Google AI agent browsing
MistralAI-User       | Mistral    | Mistral AI user browsing
facebookexternalhit  | Meta       | Meta external content fetcher
Meta-ExternalAgent   | Meta       | Meta AI external agent
meta-externalfetcher | Meta       | Meta content fetcher

Training crawlers

These bots collect web content to feed model training datasets. Blocking them is a legitimate choice if you do not want your content used for AI training — but it should be a deliberate choice, not an accidental side effect of a wildcard rule. Blocking training crawlers has no effect on whether you are cited in search results or real-time answers.

User Agent        | Company      | Product
GPTBot            | OpenAI       | ChatGPT training crawler
ClaudeBot         | Anthropic    | Claude training crawler
Google-Extended   | Google       | Gemini AI training
CloudVertexBot    | Google       | Google Cloud Vertex AI
Amazonbot         | Amazon       | Alexa AI training
Applebot-Extended | Apple        | Apple Intelligence training
FacebookBot       | Meta         | Facebook AI crawler
CCBot             | Common Crawl | Open web crawl data (used by AI models)

Research tools

These bots power AI-driven research features that are triggered by user actions rather than running continuously. Blocking them means users of these tools cannot pull your content into their research sessions.

User Agent           | Company | Product
Gemini-Deep-Research | Google  | Gemini deep research feature
Google-NotebookLM    | Google  | NotebookLM AI assistant

Google crawlers

These are Google’s standard indexing crawlers. While not AI-specific, they are included because they feed Google’s search index and AI features (AI Overviews, formerly SGE). Blocking Googlebot while allowing Google-Extended is a common misconfiguration — standard indexing has to work for AI features to include your content.
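
A sketch of the intended split in robots.txt (the site-wide paths here are illustrative — scope them to your own content as needed): standard indexing stays open, while only Gemini training is opted out:

```
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
```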

User Agent             | Company | Product
Googlebot              | Google  | Googlebot Desktop
Googlebot-Mobile       | Google  | Googlebot Smartphone
Googlebot-Image        | Google  | Google Image crawler
Googlebot-Video        | Google  | Google Video crawler
Googlebot-News         | Google  | Google News crawler
Storebot-Google        | Google  | Google StoreBot Desktop
Storebot-Google-Mobile | Google  | Google StoreBot Mobile
GoogleOther            | Google  | GoogleOther Desktop
GoogleOther-Mobile     | Google  | GoogleOther Mobile
GoogleOther-Image      | Google  | GoogleOther Image crawler
GoogleOther-Video      | Google  | GoogleOther Video crawler

How to allow or block bots in robots.txt

Allow all AI bots (default if no rules exist): No entry needed — bots not mentioned in robots.txt are allowed by default.
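
This default-allow behavior can be checked with Python's standard urllib.robotparser (the bot names and URL here are illustrative): a bot with no matching rule and no wildcard rule falls through to "allowed".

```python
from urllib.robotparser import RobotFileParser

# robots.txt that only mentions GPTBot -- no wildcard entry
rules = [
    "User-agent: GPTBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/page"
# PerplexityBot is not mentioned anywhere, so it gets the default: allowed
print(rp.can_fetch("PerplexityBot", url))  # True
print(rp.can_fetch("GPTBot", url))         # False
```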

Block only training crawlers, keep search agents:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
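
A quick sanity check of the rules above, again using Python's urllib.robotparser (the URL is illustrative): the training crawlers are blocked while the corresponding search agents still get through.

```python
from urllib.robotparser import RobotFileParser

# The "block training, keep search" robots.txt from above
rules = [
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: ClaudeBot",
    "Disallow: /",
    "",
    "User-agent: Google-Extended",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/page"
print(rp.can_fetch("GPTBot", url))           # False: training crawler blocked
print(rp.can_fetch("OAI-SearchBot", url))    # True: search agent still allowed
print(rp.can_fetch("Claude-SearchBot", url)) # True: search agent still allowed
```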

Block all AI bots (intentional opt-out):

User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Claude-SearchBot
Disallow: /
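
Note that the consecutive User-agent lines above form a single group sharing the one Disallow rule. A small urllib.robotparser sketch (illustrative URL) confirms that every bot in the group is blocked while unlisted crawlers remain allowed.

```python
from urllib.robotparser import RobotFileParser

# Several User-agent lines followed by one Disallow form a single record:
# the rule applies to every agent in the group
rules = [
    "User-agent: GPTBot",
    "User-agent: OAI-SearchBot",
    "User-agent: PerplexityBot",
    "User-agent: ClaudeBot",
    "User-agent: Claude-SearchBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/page"
print(rp.can_fetch("PerplexityBot", url))  # False: listed in the group
print(rp.can_fetch("Googlebot", url))      # True: not listed, default allow
```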

A wildcard block affects every bot, including search agents:

User-agent: *
Disallow: /

This blocks everything. Avoid unless you intend to block all crawlers completely.
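
To make the consequence concrete, here is the wildcard rule run through Python's urllib.robotparser (illustrative URL): a search agent that could cite you is blocked just as thoroughly as a training crawler.

```python
from urllib.robotparser import RobotFileParser

# The blanket block often shipped to stop "AI scrapers"
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/page"
# The training crawler is blocked, as intended...
print(rp.can_fetch("GPTBot", url))         # False
# ...but so is the search agent that would have cited you
print(rp.can_fetch("OAI-SearchBot", url))  # False
```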

For background on how the crawlability check works and what the mismatch finding means, see the Crawlability tab guide.