AI crawler bots 🤖

An ever-growing list of major AI crawler bots and user agents that collect website data either to train large language models or to provide more accurate responses to user searches in real time.

Amazonbot

Crawler

Used by Amazon to crawl and index web content, enhancing services like Alexa by collecting publicly available data to improve search results and answer accuracy.

Applebot

Crawler

Used by Apple to crawl and index web content for powering features like Siri, Spotlight, and Safari search, and to collect data for training its generative AI models.

Bytespider

Crawler

Used by ByteDance to crawl and collect web content for training AI models powering TikTok and its ChatGPT competitor Doubao.

Source: no public documentation

CCBot

Crawler

Used by Common Crawl to systematically crawl and archive the open web, providing a free dataset widely used for AI training, academic research, and large-scale data analysis.

ChatGPT-User

User agent

Used by OpenAI's ChatGPT to browse websites and fetch information when a user asks for something that requires real-time web data.

ClaudeBot

Crawler

Used by Anthropic to crawl and collect public web content for training its Claude AI models, enhancing their knowledge and performance.

Meta-ExternalAgent

Crawler

Used by Meta to crawl and index web content for training AI models and enhancing products by directly collecting publicly available data.

OAI-SearchBot

User agent

Used by OpenAI to browse websites and retrieve current information when a user's request requires up-to-date web content.

PerplexityBot

User agent / Crawler

Used by Perplexity either to conduct periodic search indexing or to check websites when a user asks something in Perplexity that requires real-time web data.

Perplexity-User

User agent

Used by Perplexity to browse websites and fetch information when a user asks for something that requires real-time web data.
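
To put the list to work, a site owner might want to tell which of these bots sent a given request. The Python sketch below is one possible way to do that, not an official method from any of the vendors. The names and types in `AI_BOTS` are copied from the entries above; the function name `classify_user_agent` is hypothetical, and the case-insensitive substring match is an assumption about how each bot identifies itself in the `User-Agent` header.

```python
# Bot names and types taken from the list above.
# Assumption: each bot includes its name somewhere in the User-Agent header.
AI_BOTS = {
    "Amazonbot": "crawler",
    "Applebot": "crawler",
    "Bytespider": "crawler",
    "CCBot": "crawler",
    "ChatGPT-User": "user agent",
    "ClaudeBot": "crawler",
    "Meta-ExternalAgent": "crawler",
    "OAI-SearchBot": "user agent",
    "PerplexityBot": "user agent / crawler",
    "Perplexity-User": "user agent",
}


def classify_user_agent(user_agent):
    """Return (name, type) for the first listed bot found in the header, else None.

    Hypothetical helper: matching is a case-insensitive substring check.
    """
    lowered = user_agent.lower()
    for name, kind in AI_BOTS.items():
        if name.lower() in lowered:
            return name, kind
    return None


if __name__ == "__main__":
    # Example header strings made up for illustration.
    for header in ("CCBot/2.0", "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"):
        print(header, "->", classify_user_agent(header))
```

A header can be spoofed, so a match here only says what the request claims to be.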