Understanding robots.txt for SEO

The robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that instructs search engine crawlers which pages or sections of your site they should or shouldn't crawl. It's part of the Robots Exclusion Protocol, a voluntary standard that major crawlers including Googlebot, Bingbot, and others respect.
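A minimal robots.txt illustrating the format might look like this (the paths and sitemap URL are hypothetical examples, not recommendations for any particular site):

```
# Served at https://example.com/robots.txt
User-agent: *
Disallow: /admin/
Allow: /admin/public/

Sitemap: https://example.com/sitemap.xml
```

Directives are grouped under a User-agent line; a crawler obeys the most specific group that matches its name, falling back to the `*` group.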

A critical misunderstanding many site owners have is that Disallow in robots.txt prevents pages from being indexed. It does not: it only prevents crawling. A page that's linked to from other pages can still appear in search results even if it's disallowed in robots.txt. To prevent indexing, you need the noindex meta tag or X-Robots-Tag header, which requires the page to be crawlable in the first place.
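The two noindex mechanisms mentioned above look like this. In the page's HTML:

```
<!-- Inside <head>: keeps the page out of the index, but only if crawlers can fetch it -->
<meta name="robots" content="noindex">
```

Or, as an HTTP response header (useful for non-HTML resources like PDFs):

```
X-Robots-Tag: noindex
```

Remember that a robots.txt Disallow on the same URL would prevent crawlers from ever seeing either signal.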

Common legitimate uses of robots.txt include blocking admin areas, preventing duplicate content from parameter-based URLs, reducing crawl budget waste on thin pages like login pages and internal search results, and blocking development or staging content from being indexed prematurely.
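The uses above might translate into rules like these (hypothetical paths; note that wildcard `*` patterns in Disallow values are supported by major crawlers such as Googlebot and Bingbot, but are not part of the original standard):

```
User-agent: *
Disallow: /wp-admin/        # admin area
Disallow: /search           # internal search results
Disallow: /login            # thin utility page
Disallow: /*?sessionid=     # parameter-based duplicate URLs
```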

Robots.txt Best Practices

  • Always include your sitemap: Add a Sitemap: directive pointing to your XML sitemap. This helps crawlers discover all your important pages efficiently.
  • Don't block CSS/JS: Google needs to render your pages to assess them. Blocking CSS, JavaScript, or image resources prevents Google from seeing your pages as users do, which can harm rankings.
  • Crawl-delay is mostly ignored: Google ignores the Crawl-delay directive. Use Google Search Console to manage crawl rate instead.
  • Test before deploying: Always test your robots.txt changes against important URLs before publishing. Accidentally blocking your entire site (Disallow: /) is a common and devastating mistake.
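The "test before deploying" step can be automated with Python's standard-library robots.txt parser. This is a minimal sketch; the candidate rules and the list of important URLs are hypothetical placeholders you would swap for your own:

```python
# Verify that important URLs stay crawlable under a candidate robots.txt
# before publishing it. Uses only the Python standard library.
from urllib.robotparser import RobotFileParser

# URLs that must remain crawlable (hypothetical examples)
IMPORTANT_URLS = [
    "https://example.com/",
    "https://example.com/blog/some-post",
]

# The robots.txt you are about to deploy (hypothetical rules)
candidate_rules = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(candidate_rules.splitlines())

# Collect any important URL the new rules would block for all crawlers ("*")
blocked = [url for url in IMPORTANT_URLS if not parser.can_fetch("*", url)]

if blocked:
    print(f"WARNING: robots.txt would block: {blocked}")
else:
    print("All important URLs remain crawlable")
```

Running a check like this in CI is a cheap guard against the accidental `Disallow: /` scenario described above.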
