Understanding robots.txt for SEO
The robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that instructs search engine crawlers which pages or sections of your site they should or shouldn't crawl. It's part of the Robots Exclusion Protocol, a voluntary standard that major crawlers including Googlebot, Bingbot, and others respect.
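A minimal robots.txt illustrating the format (the paths and sitemap URL here are only examples):

```
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
```

Each `User-agent` group applies to the named crawler (`*` matches any), and `Disallow` paths are matched as prefixes against the URL path.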
A critical misunderstanding many site owners have is that Disallow in robots.txt prevents pages from being indexed. It does not; it only prevents crawling. A page that's linked to from other pages can still appear in search results even if it's disallowed in robots.txt. To prevent indexing, you need the noindex meta tag or X-Robots-Tag header, which requires the page to be crawlable in the first place.
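A noindex signal has to be served by the page itself so the crawler can see it. In HTML it looks like this:

```
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the HTTP response header `X-Robots-Tag: noindex`. Either way, the URL must not be disallowed in robots.txt, or the crawler will never fetch the page and never see the directive.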
Common legitimate uses of robots.txt include blocking admin areas, preventing duplicate content from parameter-based URLs, reducing crawl budget waste on thin pages like login pages and internal search results, and keeping crawlers out of development or staging environments.
Robots.txt Best Practices
- Always include your sitemap: add a Sitemap: directive pointing to your XML sitemap. This helps crawlers discover all your important pages efficiently.
- Don't block CSS/JS: Google needs to render your pages to assess them. Blocking CSS, JavaScript, or image resources prevents Google from seeing your pages as users do, which can harm rankings.
- Crawl-delay is mostly ignored: Google ignores the Crawl-delay directive. Use Google Search Console to manage crawl rate instead.
- Test before deploying: always test your robots.txt changes against important URLs before publishing. Accidentally blocking your entire site (Disallow: /) is a common and devastating mistake.
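One way to sanity-check a ruleset before deploying it is Python's standard-library robots.txt parser. This is a minimal sketch; the rules and URLs below are hypothetical examples, not a real site's file:

```python
# Check candidate URLs against a draft robots.txt before publishing it,
# using the standard-library parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

draft_rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(draft_rules.splitlines())

# Verify that important pages stay crawlable and private areas do not.
for url in ("https://example.com/", "https://example.com/admin/login"):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

Running a check like this against your homepage, key landing pages, and the areas you intend to block catches the "Disallow: /" class of mistake before it reaches production. Note that `urllib.robotparser` matches rules in file order, which can differ from Google's longest-match behavior on files that mix overlapping Allow and Disallow rules.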