BeSeenByAI checks 33 AI crawlers across five categories. The category determines what blocking a bot actually means for your AI visibility.
Why categories matter
Not every AI bot does the same job. A blanket User-agent: * block in robots.txt — or a WAF rule targeting “AI scrapers” — typically takes down every category at once, including the search agents that would otherwise cite you in real-time answers.
Understanding which category a bot belongs to lets you make deliberate decisions: block training crawlers if you don’t want your content used for model training, while keeping search and browsing agents open.
Search agents
These bots fetch pages in real time to answer user queries. If you want to be cited when someone asks ChatGPT, Perplexity, or another AI assistant a question, these are the crawlers that need to reach you. Blocking search agents is almost never intentional — it is the most common unintended consequence of a wildcard robots.txt block or a WAF rule targeting AI user agents.
| User Agent | Company | Product |
|---|---|---|
| OAI-SearchBot | OpenAI | SearchGPT crawler |
| PerplexityBot | Perplexity | Perplexity AI search crawler |
| Claude-SearchBot | Anthropic | Claude search functionality |
| DuckAssistBot | DuckDuckGo | DuckDuckGo AI search |
Browsing agents
These bots fetch pages on behalf of a user in real time — they are the equivalent of a browser being driven by an AI assistant. When a user asks an AI to “look up” or “check” a page, one of these agents is typically sent.
| User Agent | Company | Product |
|---|---|---|
| ChatGPT-User | OpenAI | ChatGPT user-initiated browsing |
| Perplexity-User | Perplexity | Perplexity user browsing |
| Claude-User | Anthropic | Claude user-initiated browsing |
| GoogleAgent-Mariner | Google | AI agent browsing |
| MistralAI-User | Mistral | Mistral AI user browsing |
| facebookexternalhit | Meta | Meta external content fetcher |
| Meta-ExternalAgent | Meta | Meta AI external agent |
| meta-externalfetcher | Meta | Meta content fetcher |
Training crawlers
These bots collect web content to feed model training datasets. Blocking them is a legitimate choice if you do not want your content used for AI training — but it should be a deliberate choice, not an accidental side effect of a wildcard rule. Blocking training crawlers has no effect on whether you are cited in search results or real-time answers.
| User Agent | Company | Product |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training crawler |
| ClaudeBot | Anthropic | Claude training crawler |
| Google-Extended | Google | Gemini AI training |
| CloudVertexBot | Google | Cloud Vertex AI |
| Amazonbot | Amazon | Alexa AI training |
| Applebot-Extended | Apple | Apple Intelligence training |
| FacebookBot | Meta | Facebook AI crawler |
| CCBot | Common Crawl | Open web crawl data (used by AI models) |
Research tools
These bots power AI-driven research features that are triggered by user actions rather than running continuously. Blocking them means users of these tools cannot pull your content into their research sessions.
| User Agent | Company | Product |
|---|---|---|
| Gemini-Deep-Research | Google | Gemini deep research feature |
| Google-NotebookLM | Google | NotebookLM AI assistant |
Google crawlers
These are Google’s standard indexing crawlers. While not AI-specific, they are included because they feed Google’s search index and AI features (AI Overviews, SGE). Blocking Googlebot while allowing Google-Extended is a common misconfiguration — standard indexing has to work for AI features to include your content.
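To make the distinction concrete, here is a sketch of the two configurations as separate robots.txt files (comments use the `#` syntax):

```
# Misconfigured robots.txt: removes the site from Google's index,
# so AI features have no indexed content to draw on.
User-agent: Googlebot
Disallow: /
```

```
# Deliberate robots.txt: Googlebot stays allowed by default;
# only the AI-training token is blocked.
User-agent: Google-Extended
Disallow: /
```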
| User Agent | Company | Product |
|---|---|---|
| Googlebot | Google | Googlebot Desktop |
| Googlebot-Mobile | Google | Googlebot Smartphone |
| Googlebot-Image | Google | Google Image crawler |
| Googlebot-Video | Google | Google Video crawler |
| Googlebot-News | Google | Google News crawler |
| Storebot-Google | Google | Google StoreBot Desktop |
| Storebot-Google-Mobile | Google | Google StoreBot Mobile |
| GoogleOther | Google | GoogleOther Desktop |
| GoogleOther-Mobile | Google | GoogleOther Mobile |
| GoogleOther-Image | Google | GoogleOther Image crawler |
| GoogleOther-Video | Google | GoogleOther Video crawler |
How to allow or block bots in robots.txt
Allow all AI bots (default if no rules exist): No entry needed — bots not mentioned in robots.txt are allowed by default.
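If you prefer an explicit record over relying on the default, the conventional allow-all uses an empty Disallow value, which matches nothing:

```
User-agent: *
Disallow:
```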
Block only training crawlers, keep search agents:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```
Block all AI bots (intentional opt-out) — consecutive User-agent lines share the Disallow rule that follows them. This excerpt shows only a few bots; extend the list with every bot you want covered:

```
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Claude-SearchBot
Disallow: /
```
A wildcard block affects every bot including search agents:
```
User-agent: *
Disallow: /
```
This blocks everything. Avoid unless you intend to block all crawlers completely.
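The effect of any of the rule sets above can be sanity-checked before deploying with Python's standard-library robots.txt parser; a minimal sketch, where the rule set and URL are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: training crawlers blocked, search agents
# unmentioned (and therefore allowed by default).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Training crawlers report "blocked"; search agents report "allowed".
for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"):
    verdict = "allowed" if parser.can_fetch(bot, "https://example.com/page") else "blocked"
    print(f"{bot}: {verdict}")
```

Note that `can_fetch` applies the same longest-match-wins group selection real crawlers use, so it will also reveal when a wildcard group is silently capturing a search agent you meant to allow.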
For background on how the crawlability check works and what the mismatch finding means, see the Crawlability tab guide.