robots.txt
Also known as: robots file, robots exclusion, crawler rules
robots.txt is a plain-text file at the root of a site (example.com/robots.txt) that tells crawlers which paths they may or may not request. It controls crawling, not indexing — a blocked page can still appear in search if other pages link to it.
- Plain-text file at the site root: /robots.txt
- Controls crawling, not indexing
- Respected by major crawlers, but not a security tool
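A crawler does not search a site for the file. It derives the location from any page URL by keeping the scheme, host, and port and replacing the rest with `/robots.txt` (RFC 9309 requires the file at that top-level path). The following is a minimal Python sketch using only `urllib.parse`. `robots_url` is a hypothetical helper name, not part of any library:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL whose rules apply to page_url.

    Only the scheme and network location (host plus optional port) are kept;
    the page's path, query string, and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/post?id=1#top"))
    print(robots_url("https://shop.example.com/cart"))
```

A subdomain such as shop.example.com is a separate host. Its own robots.txt governs it, not the file on example.com.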
How robots.txt works
The file lives at the root of the host and groups its rules by User-agent. Within each group, Allow and Disallow lines list path prefixes that the matching crawler may or may not fetch. It is commonly used to keep bots out of admin areas, internal search results, and duplicate or low-value paths.
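You can try the rule matching with Python's standard-library `urllib.robotparser`, which can parse robots.txt content passed as a list of lines. The file below is a hypothetical example for example.com. The `/admin/`, `/admin/help/`, and `/search` paths and the `BadBot` agent name are illustrative assumptions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the admin area (except
# its public help pages) and internal search results, and shut out one bot.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

User-agent: BadBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://example.com/admin/settings"),
        ("Googlebot", "https://example.com/admin/help/faq"),
        ("Googlebot", "https://example.com/search?q=shoes"),
        ("Googlebot", "https://example.com/blog/post-1"),
        ("BadBot", "https://example.com/blog/post-1"),
    ]
    for agent, url in checks:
        verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
        print(f"{agent:<10} {verdict:<8} {url}")
```

Googlebot has no group of its own here, so it falls back to the `User-agent: *` group. One caveat applies: `urllib.robotparser` uses the first rule in a group that matches, whereas RFC 9309 (and Google) use the most specific, longest matching rule. Listing the narrower `Allow: /admin/help/` before the broader `Disallow: /admin/` makes both interpretations agree.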
Crucially, robots.txt governs crawling, not indexing. Disallowing a URL stops compliant crawlers from fetching it, but if other sites link to that URL, it can still show up in search results without a description. To keep a page out of the index, use a `noindex` robots meta tag (or the equivalent `X-Robots-Tag: noindex` HTTP header) instead. Do not also block that page in robots.txt: a crawler that is not allowed to fetch the page never sees the noindex directive.
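You can check for the noindex trap described above mechanically. A noindex only takes effect if the crawler may fetch the page and then finds the directive. The following is a minimal Python sketch. It assumes the page HTML is already available as strings (no network access) and looks only at the robots meta tag, not the `X-Robots-Tag` header. `RobotsMetaFinder`, `has_noindex`, and `unseen_noindex_pages` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Sets self.noindex if the page has <meta name="robots" content="...noindex...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values keep their case.
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in values.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.noindex


def unseen_noindex_pages(robots_txt: str, pages: dict[str, str],
                         agent: str = "Googlebot") -> list[str]:
    """Return URLs that carry noindex but are disallowed for `agent`.

    A compliant crawler never fetches these pages, so it never reads the
    noindex directive, and the URL can still be indexed via external links.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url, html in pages.items()
            if has_noindex(html) and not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    pages = {
        "https://example.com/private/old-offer": '<meta name="robots" content="noindex">',
        "https://example.com/drafts/page": '<meta name="robots" content="noindex, follow">',
    }
    print(unseen_noindex_pages(robots, pages))
```

Only the page under `/private/` is reported. The drafts page is crawlable, so its noindex will be seen and honored.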
Common uses and risks
A robots.txt file often also includes a `Sitemap:` line with the absolute URL of the XML sitemap, so crawlers can discover the full URL list. Keep the rules deliberate: a single overly broad `Disallow: /` can accidentally block compliant crawlers from the entire site.
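You can smoke-test both points before a robots.txt change ships. The sketch below again uses Python's `urllib.robotparser` (`site_maps()` requires Python 3.8 or later). It lists any `Sitemap:` URLs and checks whether the site root is still crawlable. The `audit_robots` name, the default agent, and the idea of treating a blocked root as a red flag are assumptions for illustration. This is not a complete audit:

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, site_root: str = "https://example.com/",
                 agent: str = "Googlebot") -> dict:
    """Report declared sitemaps and whether `agent` may crawl the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # site_maps() returns None when the file has no Sitemap: lines.
        "sitemaps": parser.site_maps() or [],
        # A blanket "Disallow: /" also blocks the root URL, so this catches it.
        "root_crawlable": parser.can_fetch(agent, site_root),
    }


if __name__ == "__main__":
    intended = "User-agent: *\nDisallow: /admin/\n\nSitemap: https://example.com/sitemap.xml\n"
    accidental = "User-agent: *\nDisallow: /\n"
    print(audit_robots(intended))
    print(audit_robots(accidental))
```

A check like this fits naturally into a deployment pipeline, where a `root_crawlable` value of `False` can stop an accidental site-wide block before it goes live.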
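Well-behaved search crawlers respect robots.txt, but it is a request, not a security control. Malicious bots can simply ignore it, and because the file is publicly readable, listing a private path in it can even advertise that path. Never rely on it to protect private content; use authentication instead.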