What is Robots.txt?

The robots exclusion protocol (REP), commonly known as robots.txt, is a text file webmasters create to instruct robots (typically search engine crawlers) which parts of their website may be crawled and indexed.

Cheat Sheet

Block all web crawlers from all content

User-agent: *
Disallow: /
Block a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /no-google/
Block a specific web crawler from a specific web page

User-agent: Googlebot
Disallow: /no-google/blocked-page.html
Sitemap Parameter

User-agent: *
Disallow:
Sitemap: http://www.example.com/non-standard-location/sitemap.xml
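The rules above can be checked programmatically before relying on them. A minimal sketch using Python's standard-library `urllib.robotparser`, with a hypothetical rule set mirroring the "block a specific crawler from a specific folder" entry:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, matching the cheat sheet example above
rules = """\
User-agent: Googlebot
Disallow: /no-google/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is blocked from the folder; other agents fall back to the default (allow)
print(parser.can_fetch("Googlebot", "http://www.example.com/no-google/blocked-page.html"))
print(parser.can_fetch("Googlebot", "http://www.example.com/public/page.html"))
print(parser.can_fetch("Bingbot", "http://www.example.com/no-google/blocked-page.html"))
```

Running this prints `False`, `True`, `True`: only Googlebot is excluded, and only under /no-google/.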
Optimal Format

Robots.txt must be placed in the top-level (root) directory of a web server to have any effect; crawlers only look for it there. Example: http://www.example.com/robots.txt
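Because crawlers always look at the host root, the robots.txt location can be derived from any page URL by discarding the path. A small sketch (the function name `robots_url` is illustrative, not a standard API):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # robots.txt lives at the root of the host, regardless of the page's path
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/deep/nested/page.html"))
# http://www.example.com/robots.txt
```

Note that a file at a deeper path such as /blog/robots.txt would simply be ignored by crawlers.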