
Robots.txt

A file that instructs search engine crawlers which pages or sections of a website they should not crawl.

Niraj Raut · 2 min read · Technical SEO

💡 Think of it like this: Imagine Google is a postman who can only deliver to certain streets. Robots.txt determines which streets the postman is allowed to visit.

Quick Facts: Robots.txt
Category: Technical SEO
Difficulty Level: Beginner
Affects: Crawlability, Indexing, Site Speed
Tools to Measure: Screaming Frog, Google Search Console, Ahrefs
Related Terms: Crawl Budget, Noindex Tag, XML Sitemap

How Robots.txt Works

The robots.txt file is a plain text file placed in the root directory of a website that provides instructions to search engine crawlers about which pages, directories, or file types they are permitted or forbidden to crawl. Following the Robots Exclusion Protocol, the file uses “User-agent” directives to target specific crawlers and “Disallow” directives to specify paths that should not be accessed. Most major search engines, including Google and Bing, respect these instructions by default.
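
For illustration, here is a minimal sketch of such a file. The domain and paths are hypothetical placeholders, but the directives themselves (User-agent, Disallow, Allow, Sitemap) are standard Robots Exclusion Protocol syntax:

# Applies to all crawlers
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.example.com/sitemap.xml

The User-agent line names which crawler the group applies to (* matches all), and each Disallow path prefix marks a section that crawler should skip.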

Why Robots.txt Matters for SEO

Common uses for robots.txt include blocking crawlers from accessing staging environments, admin areas, duplicate content, search result pages, and resource-heavy files that consume crawl budget without adding SEO value. However, a critical distinction must be understood: robots.txt blocks crawling but does not prevent indexing. A URL blocked in robots.txt can still be indexed if other sites link to it. To prevent indexing, a noindex meta tag is required, and the page must remain crawlable so that crawlers can actually see the tag. If you're unsure how robots.txt is impacting your site, working with an experienced SEO consultant can help you identify the problem and fix it efficiently.
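
For example, a minimal sketch of keeping a page out of the index: place a meta robots tag in the page's head and leave the URL crawlable so crawlers can read it.

<!-- In the page's <head>; the page must NOT be blocked in robots.txt -->
<meta name="robots" content="noindex">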

Common Robots.txt Mistakes

Errors in robots.txt, such as accidentally blocking critical pages or the entire site, can cause catastrophic ranking losses. Regularly auditing the file and testing it with Google Search Console’s robots.txt tester prevents unintended crawl restrictions.
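
The classic worst case is a single slash. A sketch of what accidental full-site blocking looks like:

# These two lines tell every crawler to stay away from the entire site
User-agent: *
Disallow: /

By contrast, a Disallow line with an empty value permits everything, so one character separates normal crawling from a site-wide block.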

Do’s and Don’ts: Robots.txt

✅ Do This / ❌ Don't Do This

✅ Submit your sitemap.xml to Google Search Console and keep it clean
❌ Ignore crawl errors; they waste crawl budget on dead pages
✅ Set canonical tags on duplicate and near-duplicate pages
❌ Leave both HTTP and HTTPS versions accessible without redirects
✅ Test your robots.txt before deploying to prevent blocking key pages (see the Python sketch after this list)
❌ Block JavaScript or CSS files in robots.txt; it breaks Google's rendering
✅ Monitor Core Web Vitals monthly and fix regressions quickly
❌ Ignore page speed issues; slow pages lose rankings and conversions
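
As a quick pre-deployment check, Python's standard-library urllib.robotparser can evaluate a robots.txt file the way a compliant crawler would. A minimal sketch, with a hypothetical domain and paths:

from urllib.robotparser import RobotFileParser

# Point the parser at the live (or staged) robots.txt
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# Key pages should stay crawlable; blocked areas should stay blocked
print(parser.can_fetch("Googlebot", "https://www.example.com/"))           # expect: True
print(parser.can_fetch("Googlebot", "https://www.example.com/wp-admin/"))  # expect: False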


TL;DR: A file that instructs search engine crawlers which pages or sections of a website they should not crawl.

If you remember one thing: focus on how robots.txt affects your users first, then optimise for search engines second.

Frequently Asked Questions

What is robots.txt?
A file that instructs search engine crawlers which pages or sections of a website they should not crawl.

How does robots.txt affect SEO?
Robots.txt shapes which pages search engines can crawl, which in turn affects how your site is indexed and ranked. Websites that get this right tend to see stronger organic visibility, better crawl efficiency, and more consistent traffic growth over time.

How do I fix robots.txt issues?
Start by auditing your current setup using tools like Google Search Console, Screaming Frog, or Ahrefs. Identify the gaps, prioritise by impact, and apply fixes methodically. Working with an experienced SEO consultant can help you cut through complexity and see results faster.
Niraj Raut
SEO Consultant & Strategist

SEO consultant helping service businesses in Nepal and beyond grow through organic search. I write about technical SEO, content strategy, and building durable search presence without the fluff.
