How to use a robots.txt search engine file
For your website, the robots.txt file is a must. Search engine robots (also called web crawlers or spiders) come looking for this file, and when a robot fails to find it, it's not a happy spider crawling your site. :-)
If you don't have one, each request for it shows up in your logs as a 404 (page not found) error, a situation you can easily avoid.
Inviting the search engine robots into your site
To ALLOW ALL robots complete access, simply copy and paste this code into Notepad, save it as robots.txt, and upload it so that it is reachable at http://www.yourdomain.com/robots.txt:
User-agent: *
Disallow:
To EXCLUDE ALL robots from the entire server, simply paste this code into Notepad, save it as robots.txt, and upload it so that it is reachable at http://www.yourdomain.com/robots.txt:
User-agent: *
Disallow: /
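If you want to see how a crawler would read the two files above before uploading anything, here is a minimal sketch using the robots.txt parser from Python's standard library. The helper name allowed, the agent name ExampleBot, and the page URL are made up for illustration; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from the article, as plain strings.
ALLOW_ALL = """User-agent: *
Disallow:
"""

BLOCK_ALL = """User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `agent` is a hypothetical crawler name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text locally, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.yourdomain.com/page.html"
    print("allow-all file:", allowed(ALLOW_ALL, page))
    print("block-all file:", allowed(BLOCK_ALL, page))
```

An empty Disallow: line blocks nothing, while Disallow: / blocks every path on the site, so the first check should come back allowed and the second blocked.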
NOTE: the robots.txt file must be at the root level, as shown in the sample URLs above. You can then check it with a robots.txt validator.
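To make the root-level rule concrete, the sketch below works out where a robot would look for robots.txt given any page address on a site. The function name robots_url and the sample page address are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that hosts `page_url`."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.yourdomain.com/blog/post.html?page=2"))
```

Whatever folder the page lives in, the result points at the top of the site, which is why a robots.txt saved inside a subfolder is not picked up.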
There is also a tag you can put into your <head>:
<meta name="robots" content="index, follow">
Some question whether this tag has any value, but it can't hurt you. You can also use noindex, nofollow, or a combination of the values.
INDEX
All robots are welcome to include this page in search services.
NOINDEX
This page may not be indexed by a search service.
FOLLOW
Robots are welcome to follow links from this page to find other pages.
NOFOLLOW
Robots are not to follow links from this page.
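As a rough illustration of how these four values combine, the sketch below reads the robots meta tag out of a page with Python's standard HTML parser. The class name RobotsMetaParser and the function robots_directives are invented for this example, and it assumes that a missing tag or a missing value means index and follow, which is the usual convention but is not guaranteed for every robot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into clean lowercase tokens.
            self.directives.update(
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            )


def robots_directives(html: str) -> dict:
    """Return whether the page may be indexed and whether its links may be followed.

    Assumption: anything not explicitly forbidden is allowed.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {
        "index": "noindex" not in found,
        "follow": "nofollow" not in found,
    }


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(robots_directives(page))
```

For the sample page, the result says the page itself should stay out of the index while its links may still be followed.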