Robots.txt
A file that instructs search engine crawlers which pages or sections of a website they should not crawl.
💡 Think of it like this: Imagine Google is a postman who can only deliver to certain streets. Robots.txt tells the postman which streets he is allowed to visit and which he must skip.
How Robots.txt Works
The robots.txt file is a plain text file placed in the root directory of a website that provides instructions to search engine crawlers about which pages, directories, or file types they are permitted or forbidden to crawl. Following the Robots Exclusion Protocol, the file uses “User-agent” directives to target specific crawlers and “Disallow” directives to specify paths that should not be accessed. Most major search engines, including Google and Bing, respect these instructions by default.
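A minimal robots.txt showing these directives might look like this (the paths below are hypothetical examples, not recommendations for any particular site):

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /search

# Applies only to Google's crawler
User-agent: Googlebot
Disallow: /staging/

# Optional: point crawlers to the sitemap
Sitemap: https://www.example.com/sitemap.xml
```

The file must be served from the site root, e.g. https://www.example.com/robots.txt; crawlers will not look for it anywhere else.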
Why Robots.txt Matters for SEO
Common uses for robots.txt include blocking crawlers from accessing staging environments, admin areas, duplicate content, search result pages, and resource-heavy files that consume crawl budget without adding SEO value. However, a critical distinction must be understood: robots.txt blocks crawling but does not prevent indexing. A URL blocked in robots.txt can still be indexed if other sites link to it. To prevent indexing, a noindex meta tag is required, and the page must be crawlable so that the tag can be seen. If you’re unsure how robots.txt is impacting your site, working with an experienced SEO consultant can help you identify the problem and fix it efficiently.
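You can check how a crawler that follows the Robots Exclusion Protocol would interpret a given file using Python's standard-library urllib.robotparser. This is a small sketch with hypothetical rules, not a model of any specific search engine:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied as a list of lines
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# A compliant crawler may not fetch anything under /admin/ ...
print(parser.can_fetch("*", "https://www.example.com/admin/settings"))  # False

# ... but other paths remain crawlable.
print(parser.can_fetch("*", "https://www.example.com/blog/post"))  # True
```

Keep in mind this models crawling only: a URL that can_fetch reports as blocked can still end up indexed if other sites link to it.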
Common Robots.txt Mistakes
Errors in robots.txt, such as accidentally blocking critical pages or the entire site, can cause catastrophic ranking losses. Regularly auditing the file and testing it with Google Search Console’s robots.txt tester prevents unintended crawl restrictions.
If you remember one thing: consider how robots.txt affects your users first, then optimise for search engines second.