BeSeenByAI checks 33 AI crawlers across five categories. The category determines what blocking a bot actually means for your AI visibility.
Why categories matter
Not every AI bot does the same job. A blanket User-agent: * block in robots.txt — or a WAF rule targeting “AI scrapers” — typically takes down every category at once, including the search agents that would otherwise cite you in real-time answers.
Understanding which category a bot belongs to lets you make deliberate decisions: block training crawlers if you don’t want your content used for model training, while keeping search and browsing agents open.
Search agents
These bots fetch pages in real time to answer user queries. If you want to be cited when someone asks ChatGPT, Perplexity, or another AI assistant a question, these are the crawlers that need to reach you. Blocking search agents is almost never intentional — it is the most common unintended consequence of a wildcard robots.txt block or a WAF rule targeting AI user agents.
| User Agent | Company | Product |
|---|---|---|
| OAI-SearchBot | OpenAI | SearchGPT crawler |
| PerplexityBot | Perplexity | Perplexity AI search crawler |
| Claude-SearchBot | Anthropic | Claude search functionality |
| DuckAssistBot | DuckDuckGo | DuckDuckGo AI search |
Browsing agents
These bots fetch pages on behalf of a user in real time — they are the equivalent of a browser being driven by an AI assistant. When a user asks an AI to “look up” or “check” a page, one of these agents is typically sent.
| User Agent | Company | Product |
|---|---|---|
| ChatGPT-User | OpenAI | ChatGPT user-initiated browsing |
| Perplexity-User | Perplexity | Perplexity user browsing |
| Claude-User | Anthropic | Claude user-initiated browsing |
| GoogleAgent-Mariner | Google | AI agent browsing |
| MistralAI-User | Mistral | Mistral AI user browsing |
| facebookexternalhit | Meta | Meta external content fetcher |
| Meta-ExternalAgent | Meta | Meta AI external agent |
| meta-externalfetcher | Meta | Meta content fetcher |
Training crawlers
These bots collect web content to feed model training datasets. Blocking them is a legitimate choice if you do not want your content used for AI training — but it should be a deliberate choice, not an accidental side effect of a wildcard rule. Blocking training crawlers has no effect on whether you are cited in search results or real-time answers.
| User Agent | Company | Product |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training crawler |
| ClaudeBot | Anthropic | Claude training crawler |
| Google-Extended | Google | Gemini AI training |
| CloudVertexBot | Google | Cloud Vertex AI |
| Amazonbot | Amazon | Alexa AI training |
| Applebot-Extended | Apple | Apple Intelligence training |
| FacebookBot | Meta | Facebook AI crawler |
| CCBot | Common Crawl | Open web crawl data (used by AI models) |
Research tools
These bots power AI-driven research features that are triggered by user actions rather than running continuously. Blocking them means users of these tools cannot pull your content into their research sessions.
| User Agent | Company | Product |
|---|---|---|
| Gemini-Deep-Research | Google | Gemini deep research feature |
| Google-NotebookLM | Google | NotebookLM AI assistant |
Google crawlers
These are Google’s standard indexing crawlers. While not AI-specific, they are included because they feed Google’s search index and AI features (AI Overviews, SGE). Blocking Googlebot while allowing Google-Extended is a common misconfiguration — standard indexing has to work for AI features to include your content.
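To make the distinction concrete, here is a sketch of the two configurations as separate robots.txt files (comments use the `#` syntax):

```
# Misconfigured robots.txt: removes the site from Google's index,
# so AI features have no indexed content to draw on.
User-agent: Googlebot
Disallow: /
```

```
# Deliberate robots.txt: Googlebot stays allowed by default;
# only the AI-training token is blocked.
User-agent: Google-Extended
Disallow: /
```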
| User Agent | Company | Product |
|---|---|---|
| Googlebot | Google | Googlebot Desktop |
| Googlebot-Mobile | Google | Googlebot Smartphone |
| Googlebot-Image | Google | Google Image crawler |
| Googlebot-Video | Google | Google Video crawler |
| Googlebot-News | Google | Google News crawler |
| Storebot-Google | Google | Google StoreBot Desktop |
| Storebot-Google-Mobile | Google | Google StoreBot Mobile |
| GoogleOther | Google | GoogleOther Desktop |
| GoogleOther-Mobile | Google | GoogleOther Mobile |
| GoogleOther-Image | Google | GoogleOther Image crawler |
| GoogleOther-Video | Google | GoogleOther Video crawler |
How to allow or block bots in robots.txt
Allow all AI bots (default if no rules exist): No entry needed — bots not mentioned in robots.txt are allowed by default.
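If you prefer an explicit record over relying on the default, the conventional allow-all uses an empty Disallow value, which matches nothing:

```
User-agent: *
Disallow:
```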
Block only training crawlers, keep search agents:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```
Block all AI bots (intentional opt-out) — consecutive User-agent lines share the Disallow rule that follows them. This excerpt shows only a few bots; extend the list with every bot you want covered:

```
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Claude-SearchBot
Disallow: /
```
A wildcard block affects every bot including search agents:
```
User-agent: *
Disallow: /
```
This blocks everything. Avoid unless you intend to block all crawlers completely.
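The effect of any of the rule sets above can be sanity-checked before deploying with Python's standard-library robots.txt parser; a minimal sketch, where the rule set and URL are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: training crawlers blocked, search agents
# unmentioned (and therefore allowed by default).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Training crawlers report "blocked"; search agents report "allowed".
for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"):
    verdict = "allowed" if parser.can_fetch(bot, "https://example.com/page") else "blocked"
    print(f"{bot}: {verdict}")
```

Note that `can_fetch` applies the same longest-match-wins group selection real crawlers use, so it will also reveal when a wildcard group is silently capturing a search agent you meant to allow.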
For background on how the crawlability check works and what the mismatch finding means, see the Crawlability tab guide.