Top AI search crawlers & user-agents

A list of major AI crawler bots and user agents that collect website data.

Paul · Co-founder

Your website gets more than just human visitors these days. Check your server logs and you'll see unfamiliar bot names crawling your pages. These aren't the usual search engine bots: they're AI bots, and there are a lot of them.

Some collect content to train AI models. Others gather data to answer search questions in real time. Either way, they read your content, and it's up to you to decide whether that's a good thing.
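A quick way to see which of these bots hit your site is to grep the access log for their user-agent tokens. The sketch below uses a made-up three-line sample log in the common nginx/Apache combined format; the bot names are real tokens covered later in this list, but extend the pattern to match your own policy:

```shell
# Hypothetical excerpt of an access log in combined format
cat > access.log <<'EOF'
203.0.113.5 - - [10/Feb/2026:10:01:22 +0000] "GET / HTTP/1.1" 200 5123 "-" "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
198.51.100.7 - - [10/Feb/2026:10:02:01 +0000] "GET /blog HTTP/1.1" 200 8120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
192.0.2.9 - - [10/Feb/2026:10:03:45 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"
EOF

# Count hits per known AI bot, case-insensitively:
# extract matching tokens, then tally them
grep -oiE 'GPTBot|ClaudeBot|PerplexityBot|CCBot|Bytespider|Amazonbot' access.log \
  | sort | uniq -c | sort -rn
```

On a real server, point the pattern at your actual log path (often `/var/log/nginx/access.log`) instead of the sample file.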

OpenAI

ChatGPT-User

A user agent that browses websites and fetches information when a ChatGPT user asks for something that requires real-time web data.

More info: https://platform.openai.com/docs/bots

OAI-SearchBot

A user agent that browses websites and retrieves real-time information for ChatGPT's search features, so answers can link to up-to-date web content.

More info: https://platform.openai.com/docs/bots

GPTBot

A crawler that browses websites to collect data used to train OpenAI's generative AI models.

More info: https://platform.openai.com/docs/bots

Operator

An AI agent developed by OpenAI that autonomously performed tasks through web browser interactions. It launched in January 2025 and was deprecated in August 2025, with its capabilities folded into ChatGPT's agent mode.

More info: https://openai.com/index/introducing-operator/

Anthropic

ClaudeBot

A crawler to browse public websites to gather content for training its AI language models.

More info: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler

Claude-User

A user agent that visits websites when Claude users ask questions that require real-time information.

More info: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler

Claude-SearchBot

A crawler that browses the web to improve the quality of search results for users. It's unclear how its role differs from that of Claude-User.

More info: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler

anthropic-ai

An AI agent possibly used by Anthropic to download training data for its large language models that power AI products like Claude. The exact purpose and scope of this agent remain undocumented.

More info: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler

Claude-Web

A web crawler used by Anthropic to gather web content for Claude-related services. This agent enables Claude to reference and discuss web content in conversations.

More info: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler

Amazon

AmazonBot

A crawler used by Amazon to crawl and index web content. The data it gathers enhances services like Alexa, improving search results and the accuracy of spoken responses.

More info: https://developer.amazon.com/amazonbot

Apple

Applebot

A crawler that indexes web content for features like Siri, Spotlight, and Safari search. It also collects data to help train Apple's generative AI models.

More info: https://support.apple.com/en-us/119829

Applebot-Extended

A crawler specifically used to identify and collect web content for training Apple's generative AI models, including Apple Intelligence. This is separate from regular Applebot and can be blocked independently.

More info: https://support.apple.com/en-us/119829

TikTok

Bytespider

A crawler operated by ByteDance, TikTok's parent company, that collects web content for AI model training, including for Doubao, its ChatGPT-style assistant.

No official public documentation available.

Common Crawl

CCBot

A crawler that systematically archives the open web. Its massive dataset is publicly available and widely used for AI training, academic research, and data analysis.

More info: https://commoncrawl.org/ccbot

Perplexity AI

PerplexityBot

A crawler that indexes pages so Perplexity can surface and link to them in its answer citations. According to the company, it is not used to train foundation models.

More info: https://docs.perplexity.ai/guides/bots

Perplexity-User

A user agent that fetches individual pages on‑demand when a Perplexity user’s query requires direct access.

More info: https://docs.perplexity.ai/guides/bots

Meta

Meta-ExternalAgent

A crawler to harvest public web content to train Meta's generative‑AI systems (e.g., Llama, Meta AI).

More info: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/

Meta-ExternalFetcher

A user agent that performs user-initiated fetches of individual links for Meta's AI assistant features, making targeted, on-demand requests to retrieve current information.

More info: https://developers.facebook.com/docs/sharing/webmasters/crawler

FacebookBot

A crawler used by Meta to collect public web content that helps improve its AI models, such as its speech recognition technology. Rich link previews are generated by a separate agent, facebookexternalhit.

More info: https://developers.facebook.com/docs/sharing/webmasters/crawler

Google

Google-Extended

Not a standalone crawler but a robots.txt control token: it governs whether content crawled by Googlebot may be used to train Gemini (formerly Bard) and other Google generative-AI products.

More info: https://support.google.com/webmasters/answer/2723646#google-extended

Cohere

cohere-ai

An AI agent dispatched by Cohere's AI chat products in response to user prompts when it needs to retrieve content from the internet.

More info: https://darkvisitors.com/agents/cohere-ai

cohere-training-data-crawler

A crawler operated by Cohere to download training data for its large language models that power enterprise AI products.

More info: https://darkvisitors.com/agents/cohere-training-data-crawler

You.com

YouBot

A crawler used by You.com to index search results that allow their AI Assistant to answer user questions. The assistant's answers typically contain references to the website as inline sources.

More info: https://darkvisitors.com/agents/youbot

DuckDuckGo

DuckAssistBot

An AI assistant that crawls pages in real-time for DuckDuckGo's AI-assisted answers, which prominently cite their sources. This data is not used to train AI models.

More info: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot

How to manage these bots in robots.txt

  • Allow all bots

    User-agent: *
    Disallow:
  • Block a bot completely

    User-agent: <bot>
    Disallow: /
  • Allow only specific folders

    User-agent: <bot>
    Allow: /public/
    Disallow: /
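Putting the pieces together, a robots.txt that blocks the training crawlers listed above while leaving search and answer bots alone might look like the sketch below. The bot names are taken from this list; which ones you block is your own policy call:

    # Block crawlers that collect training data
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Opt out of Google generative-AI use via the control token
    User-agent: Google-Extended
    Disallow: /

    # Everyone else, including search/answer bots, is allowed
    User-agent: *
    Disallow: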

Remember that user‑initiated agents such as Claude-User and Perplexity-User may not honor robots.txt, since they fetch pages on a user's behalf; use rate limiting or IP blocking if needed.
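One common rate-limiting approach on nginx is to key a `limit_req` zone on the bot's user agent, so ordinary visitors are unaffected. A sketch, with illustrative bot names and rates (requests from agents that don't match get an empty key and are therefore not limited by this zone):

    # In the http {} block: map AI bot user agents to a per-IP key
    map $http_user_agent $ai_bot {
        default  "";
        ~*(GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot)  $binary_remote_addr;
    }

    # Allow matched bots 1 request/second, tracked in a 10 MB zone
    limit_req_zone $ai_bot zone=aibots:10m rate=1r/s;

    server {
        listen 80;
        location / {
            # Queue short bursts of up to 5 requests, reject the rest
            limit_req zone=aibots burst=5 nodelay;
        }
    }

Adjust the rate and burst to taste; aggressive crawlers that ignore both robots.txt and throttling are candidates for outright IP blocking.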

Published Jun 25, 2025

Updated Feb 17, 2026