AI Crawler Visibility & llms.txt Audit
Can AI crawlers actually find and understand your site? We simulate real bot requests from GPTBot, ClaudeBot, PerplexityBot, and Google-Extended to verify crawler access, detect cloaking, analyze machine-readable signals, and assess llms.txt quality when present.
What We Test
- llms.txt file (emerging convention) — presence, HTTP status, markdown structure, broken links, and content quality when the file exists
- AI bot simulation — real HTTP requests with GPTBot, ClaudeBot, ChatGPT-User, OAI-SearchBot, PerplexityBot, and Google-Extended user agents
- robots.txt parsing — per-bot Disallow/Allow rule evaluation for training and search-tier crawlers
- Cloaking detection — comparing browser vs. bot responses for differences in status, content type, title, and body
- Sitemap accessibility — detecting the sitemap.xml location and checking robots.txt blocking per bot
- Schema.org structured data — JSON-LD validation, Organization/WebSite/FAQPage schemas, template placeholder detection
- Semantic HTML structure — heading hierarchy (H1/H2/H3), main landmark, FAQ sections
- Freshness signals — dateModified, article:published_time, Last-Modified header
- AI interpretation match — an AI model reads your llms.txt and compares its understanding with your meta description
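The cloaking check above can be sketched as a simple diff between a browser response and a bot response. The function name, the signals compared, and the 50% body-size threshold here are illustrative assumptions, not the audit's actual implementation:

```python
import re

def cloaking_signals(browser: dict, bot: dict) -> list[str]:
    """Compare a browser response and a bot response (dicts with
    'status', 'content_type', 'body') and list rough cloaking signals."""
    diffs = []
    if browser["status"] != bot["status"]:
        diffs.append(f"status: {browser['status']} vs {bot['status']}")
    if browser["content_type"] != bot["content_type"]:
        diffs.append("content-type differs")

    def title(html: str) -> str:
        m = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
        return m.group(1).strip() if m else ""

    if title(browser["body"]) != title(bot["body"]):
        diffs.append("title differs")
    # Large body-size gaps often indicate bot-specific rendering or blocking.
    longest = max(len(browser["body"]), 1)
    if abs(len(browser["body"]) - len(bot["body"])) > 0.5 * longest:
        diffs.append("body size differs by >50%")
    return diffs
```

In practice the two responses would come from fetching the same URL twice, once with a normal browser user agent and once with a bot user agent such as GPTBot's.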
Why This Matters for SEO & AI Discovery
AI assistants are becoming a meaningful discovery channel. If your site blocks GPTBot or serves different content to bots, you become harder to surface in AI-generated answers. llms.txt can help by giving models cleaner guidance, but it is an emerging convention rather than a mandatory web standard. We focus first on crawler access and machine-readable quality, then evaluate llms.txt as a supporting signal when available.
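As a concrete illustration (the paths are examples, not recommendations), a robots.txt that grants AI crawlers access while still protecting private routes might look like:

```
# Explicitly allow AI crawlers site-wide
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Default rules for all other crawlers
User-agent: *
Disallow: /admin/
```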
AI-Powered Analysis & Automated Fixes
When we detect AI visibility issues, we provide actionable fixes with context. For blocked bots, we identify the exact rule or status pattern and suggest allowlist changes. For cloaking detection, we flag differences between browser and bot responses. For weak machine-readable signals, we suggest schema and structure fixes. If llms.txt is missing, we can generate a starter template, but we do not treat that file as the only path to good AI visibility.
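A starter llms.txt is just a short markdown file: an H1 with the site name, a blockquote summary, and H2 sections of curated links. The content below is a hypothetical example, not a required schema:

```markdown
# Example Co

> Example Co builds project-tracking widgets for small teams.

## Docs

- [Getting started](https://example.com/docs/start): setup in five minutes
- [API reference](https://example.com/docs/api): endpoints and authentication

## Optional

- [Blog](https://example.com/blog): product updates and changelogs
```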
How We Grade
- Bots accessible, machine-readable signals strong, no cloaking; llms.txt optional but useful
- Mixed crawler access or weak machine-readable signals needing cleanup
- Multiple AI bots blocked or cloaking detected
The free audit includes per-check scores and an overall grade. Unlock full details, fixes, and deep AI analysis for $9 per report.
Explore Other Checks
- HSTS, CSP, DNSSEC, and mixed content detection
- Mobile viewport rendering and touch target validation
- OG tags and social card validation across platforms
- Console error detection with severity classification
- PageSpeed/Lighthouse lab metrics with separate CrUX data when available