Reference

robots.txt

robots.txt is a plain-text file at the root of a site (example.com/robots.txt) that tells crawlers which paths they may or may not request. It controls crawling, not indexing — a blocked page can still appear in search if other pages link to it.

Also known as: robots file, robots exclusion, crawler rules

  • Plain-text file at the site root: /robots.txt
  • Controls crawling, not indexing
  • Respected by major crawlers, but not a security tool

How robots.txt works

The file lives at the root of the host it applies to (each subdomain needs its own file) and groups Allow and Disallow rules under User-agent lines to tell each crawler which paths to skip. It is commonly used to keep bots out of admin areas, internal search results, and duplicate or low-value paths.
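As a rough illustration of how these rules are evaluated, the sketch below feeds a small robots.txt to Python's standard-library `urllib.robotparser` and asks whether a given crawler may fetch a URL. The rules, the paths, and the crawler names `AnyCrawler` and `ExampleBot` are assumptions made up for this example, and real crawlers may resolve overlapping rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; paths and bot names are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" group.
print(parser.can_fetch("AnyCrawler", "https://example.com/admin/login"))  # False
print(parser.can_fetch("AnyCrawler", "https://example.com/blog/post"))    # True

# ExampleBot has its own group, which blocks everything.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))    # False
```

A path that matches no rule is allowed by default, which is why the blog URL is fetchable for the generic crawler.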

Crucially, robots.txt governs crawling, not indexing. Disallowing a URL stops compliant crawlers from fetching it, but if other sites link to that URL it can still show up in search results without a description. To keep a page out of the index, use a `noindex` meta tag (or an `X-Robots-Tag: noindex` response header) and leave the page crawlable: if robots.txt blocks it, the crawler never fetches the page and so never sees the noindex.
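The conflict described above can be checked mechanically. The following is a minimal sketch, assuming you already have the robots.txt text and the page's HTML as strings; the helper name `noindex_is_hidden`, the class `RobotsMetaFinder`, and the sample URLs are hypothetical. It only inspects `<meta name="robots">` tags, not response headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def noindex_is_hidden(robots_txt: str, url: str, html: str, agent: str = "*") -> bool:
    """True when the page asks for noindex but robots.txt stops crawlers from seeing it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives and not rules.can_fetch(agent, url)


# Hypothetical inputs for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
BLOCKING = "User-agent: *\nDisallow: /private/\n"
NON_BLOCKING = "User-agent: *\nDisallow: /admin/\n"

print(noindex_is_hidden(BLOCKING, "https://example.com/private/page", PAGE))      # True
print(noindex_is_hidden(NON_BLOCKING, "https://example.com/private/page", PAGE))  # False
```

A `True` result means the noindex can never take effect for compliant crawlers, because they are told not to fetch the page that carries it.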

Common uses and risks

A robots.txt file often also includes a `Sitemap:` line giving the full URL of the XML sitemap, so crawlers can find the full URL list. Keep the rules deliberate: a single overly broad `Disallow: /` under `User-agent: *` blocks crawling of the entire site and can accidentally hide it from search.
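Both points can be checked before a file is deployed. The sketch below, again using the standard-library `urllib.robotparser`, lists the declared sitemaps and flags a file that blocks the site root. The function name `audit` and the sample content are assumptions for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def audit(text: str, agent: str = "*") -> dict:
    """Report declared sitemaps and whether the site root is blocked for `agent`."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {
        # site_maps() returns None when no Sitemap line is present.
        "sitemaps": parser.site_maps() or [],
        # If the root path cannot be fetched, the whole site is effectively blocked.
        "blocks_entire_site": not parser.can_fetch(agent, "https://example.com/"),
    }


print(audit(ROBOTS_TXT))
# {'sitemaps': ['https://example.com/sitemap.xml'], 'blocks_entire_site': False}

print(audit("User-agent: *\nDisallow: /\n"))
# {'sitemaps': [], 'blocks_entire_site': True}
```

Running a check like this in a deployment pipeline is one way to catch a stray `Disallow: /` carried over from a staging environment.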

Well-behaved search crawlers respect robots.txt, but it is a request, not a security control. Malicious bots can ignore it, and because the file is publicly readable, listing a sensitive path in it also advertises that path. Never rely on it to protect private content.
