AI crawler bots 🤖
An ever-growing list of major AI crawler bots and user agents that collect website data either to train large language models or to provide more accurate responses to user searches in real time.
Amazonbot
Crawler
Used by Amazon to crawl and index web content, enhancing services like Alexa by collecting publicly available data to improve search results and answer accuracy.
Applebot
Crawler
Used by Apple to crawl and index web content for powering features like Siri, Spotlight, and Safari search, and to collect data for training its generative AI models.
Bytespider
Crawler
Used by ByteDance to crawl and collect web content for training AI models powering TikTok and its ChatGPT competitor Doubao.
Source: no public documentation
CCBot
Crawler
Used by Common Crawl to systematically crawl and archive the open web, providing a free dataset widely used for AI training, academic research, and large-scale data analysis.
Source: https://commoncrawl.org/ccbot
ChatGPT-User
User agent
Used by OpenAI's ChatGPT to browse websites and fetch information when a user asks for something that requires real-time web data.
ClaudeBot
Crawler
Used by Anthropic to crawl and collect public web content for training its Claude AI models, enhancing their knowledge and performance.
Meta-ExternalAgent
Crawler
Used by Meta to crawl and index web content for training AI models and enhancing products by directly collecting publicly available data.
OAI-SearchBot
User agent
Used by OpenAI to browse websites and retrieve real-time information when users request data that requires up-to-date web content.
PerplexityBot
User agent / Crawler
Used by Perplexity either to conduct periodic search indexing or to check websites when a user asks something that requires real-time web data.
Perplexity-User
User agent
Used by Perplexity to browse websites and fetch information when a user asks for something that requires real-time web data.
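
To put the list to work, the names above can be matched against the User-Agent header of incoming requests, or turned into robots.txt rules. The following is a minimal Python sketch, not an official method from any of these vendors. The helper names classify_user_agent and robots_txt_block are hypothetical, and it assumes that a case-insensitive substring match on the bot name is good enough to recognize a request. Real User-Agent strings vary and can be spoofed.

```python
# Bot names and types taken from the list above.
# Assumption: each name appears somewhere in the bot's User-Agent header,
# so a case-insensitive substring match is used to detect it.
AI_BOTS = {
    "Amazonbot": "Crawler",
    "Applebot": "Crawler",
    "Bytespider": "Crawler",
    "CCBot": "Crawler",
    "ChatGPT-User": "User agent",
    "ClaudeBot": "Crawler",
    "Meta-ExternalAgent": "Crawler",
    "OAI-SearchBot": "User agent",
    "PerplexityBot": "User agent / Crawler",
    "Perplexity-User": "User agent",
}


def classify_user_agent(user_agent):
    """Return (name, type) for the first listed bot found in the header, else None."""
    lowered = user_agent.lower()
    for name, kind in AI_BOTS.items():
        if name.lower() in lowered:
            return name, kind
    return None


def robots_txt_block(names):
    """Build robots.txt rules that disallow the whole site for each given bot name."""
    rules = [f"User-agent: {name}\nDisallow: /" for name in names]
    return "\n\n".join(rules) + "\n"


if __name__ == "__main__":
    # Illustrative header only, not a verbatim string sent by any real bot.
    print(classify_user_agent("ExampleAgent/1.0 (compatible; ClaudeBot/1.0)"))
    # Opt out of the training-oriented crawlers while leaving the
    # user-triggered agents untouched (one possible policy, not a recommendation).
    print(robots_txt_block(["CCBot", "Bytespider"]))
```

Keeping the name-to-type mapping in one dictionary means a site can apply different policies to crawlers and to user-triggered agents without duplicating the list. Note that robots.txt is advisory: it only affects bots that choose to honor it.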