Your robots.txt file controls which crawlers can access your website—including AI crawlers that feed ChatGPT, Perplexity, and other AI systems. Proper configuration is essential for AI visibility.
This guide covers all major AI crawlers, their purposes, and how to configure robots.txt for optimal GEO.
Major AI Crawlers
| Crawler | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Collects training data for OpenAI models |
| ChatGPT-User | OpenAI | Real-time browsing when users ask ChatGPT to search |
| PerplexityBot | Perplexity | Indexing for Perplexity's search-first AI |
| ClaudeBot | Anthropic | Training and knowledge for Claude |
| Google-Extended | Google | AI training (Bard/Gemini), separate from search indexing |
| CCBot | Common Crawl | Open dataset used by many AI training efforts |
Recommended Configuration for GEO
For maximum AI visibility, allow all major AI crawlers access to your content:
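A minimal robots.txt along these lines (the crawler names match the table above; "Allow: /" grants access to the whole site):

```txt
# robots.txt — explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /
```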
Important: If your robots.txt doesn't mention a crawler, most crawlers treat your content as allowed by default. Explicit "Allow" rules are still clearer and prevent surprises if you later add restrictions.
Blocking AI Crawlers (Not Recommended for GEO)
Some sites choose to block AI crawlers for copyright or competitive reasons. If you need to block specific crawlers:
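A per-crawler block looks like this (a sketch; GPTBot and CCBot are shown as examples, and crawlers not listed remain unaffected):

```txt
# Block specific AI crawlers while leaving all others untouched
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```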
Warning: Blocking AI crawlers significantly reduces your AI visibility. ChatGPT and other systems will have less knowledge about your brand and be less likely to recommend you.
Selective Access
You can allow AI crawlers access to most content while blocking specific sections:
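For example (the "/internal/" and "/customers/" paths are placeholders; "Allow" and "Disallow" rules under the same User-agent combine, with the more specific path taking precedence for most crawlers):

```txt
# Allow GPTBot broadly, but keep private sections out
User-agent: GPTBot
Allow: /
Disallow: /internal/
Disallow: /customers/
```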
Verifying Your Configuration
Check Current robots.txt
View your current configuration at: yoursite.com/robots.txt
Test Crawler Access
Use Google's robots.txt Tester in Search Console or online tools to verify specific crawler rules.
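You can also test rules offline with Python's standard-library robots.txt parser — a quick sketch, using hypothetical rules and a placeholder domain:

```python
from urllib import robotparser

# Hypothetical rules to test (in practice, use rp.set_url(...) + rp.read()
# to load your live yoursite.com/robots.txt)
rules = """\
User-agent: GPTBot
Allow: /

User-agent: BadBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Check whether a given crawler may fetch a given URL
print(rp.can_fetch("GPTBot", "https://yoursite.com/blog/post"))   # allowed
print(rp.can_fetch("BadBot", "https://yoursite.com/blog/post"))   # blocked
```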
Monitor Crawler Activity
Review server logs to confirm AI crawlers are accessing your site as expected.
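A simple way to tally AI-crawler hits is to grep user-agent strings out of your access log. The sample log lines below are made up for illustration; in practice you would read your real log (e.g. an nginx or Apache access log) instead:

```shell
# Hypothetical sample access-log lines — replace with: cat /var/log/nginx/access.log
log='203.0.113.7 - - [10/May/2025] "GET / HTTP/1.1" 200 "-" "GPTBot/1.1"
198.51.100.9 - - [10/May/2025] "GET /blog HTTP/1.1" 200 "-" "PerplexityBot/1.0"
203.0.113.7 - - [11/May/2025] "GET /docs HTTP/1.1" 200 "-" "GPTBot/1.1"'

# Count requests per AI crawler, most active first
hits=$(printf '%s\n' "$log" \
  | grep -oE 'GPTBot|ChatGPT-User|PerplexityBot|ClaudeBot|CCBot' \
  | sort | uniq -c | sort -rn)
printf '%s\n' "$hits"
```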
Common Mistakes
Accidentally Blocking AI Crawlers
A broad "Disallow: /" under a catch-all "User-agent: *" blocks AI crawlers along with everything else. Be specific about which crawlers you restrict.
Inconsistent Rules
Different rules for different AI crawlers create unpredictable visibility. In general, treat all AI crawlers consistently.
Forgetting ChatGPT-User
GPTBot handles training data, while ChatGPT-User handles real-time browsing. Blocking one but not the other leaves you only partially visible to ChatGPT.
Meta Robots Tags
In addition to robots.txt, you can use meta tags for page-level control:
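For instance (the standard "robots" meta tag is widely honored; per-crawler meta names like "GPTBot" are shown as an illustration, and support for them varies by crawler):

```html
<!-- Keep this page out of all crawler indexes -->
<meta name="robots" content="noindex">

<!-- Target a single crawler by name (support varies) -->
<meta name="GPTBot" content="noindex">
```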
Is Your Site AI-Accessible?
Our audit checks your technical configuration for AI crawler access and other GEO factors.
Get Free AI Visibility Audit →