HOWTO: Create and Use a Robots.txt File
The robots.txt file is a mysterious little document for the uninitiated. Its purpose is to provide a broad set of instructions for search engine spiders when they arrive at your site. In general, the file is used to tell the spiders which parts of your site they should not crawl (visit).
It’s worth noting that search engine robots don’t have to obey the robots.txt file, so it’s a guideline at best. The other important factor is that the file has to be publicly readable for the robots to use it, which means anyone else can read it too. Don’t use it to hide anything: listing a private folder in the file only advertises where it is.
Creating your robots.txt file
If you are using a CMS, there are likely to be plugins that you can use to automatically create a robots.txt file and add it to your site.
Regardless, it is still important to understand what is going on under the hood, so that you can modify the file if you need to.
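To get started, open up your favourite text editor, like Notepad. Don’t use a word processor! It will add formatting that makes life unnecessarily complicated. Save the file as robots.txt in the top-level (root) folder of your site, so that it can be reached at yoursite.com/robots.txt.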
The main syntax that you will use is:
User-agent:
Disallow:
The “User-agent” field specifies the search robot, and the “Disallow” field specifies the part of the site you want the robots to leave alone.
Here are some examples:
To disallow all robots from the entire server:
User-agent: *
Disallow: /
To exclude all robots from a part of your server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
This will instruct the robots to ignore anything in the yoursite.com/cgi-bin/ and yoursite.com/admin/ folders.
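If you want to check how a well-behaved robot would read these rules before you upload the file, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot" and the sample paths are made up for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may crawl `path` under RULES.

    "ExampleBot" is a hypothetical robot name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, "http://yoursite.com" + path)


# Hypothetical paths: two inside the blocked folders, one outside them.
for path in ("/cgi-bin/search.pl", "/admin/", "/blog/my-post.html"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

The parser matches each Disallow value as a prefix of the requested path, which is why everything inside the two folders is blocked while the rest of the site stays open.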
On this site, for example, I have the following:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
This instructs the robots to ignore the WordPress-related admin and system folders and the login pages of my site.
To disallow a particular robot:
User-agent: SpamBot
Disallow: /
To allow a single robot:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
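The per-robot rules are easy to get backwards, so here is a small sketch that checks both cases with Python's standard urllib.robotparser module. The helper name can_crawl is my own, and the results only describe robots that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

# Block one named robot, leave everyone else alone.
BLOCK_SPAMBOT = """\
User-agent: SpamBot
Disallow: /
"""

# Allow one named robot, block everyone else.
# An empty Disallow value means "nothing is disallowed".
ALLOW_ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(rules, agent, path="/"):
    """Return True if `agent` may crawl `path` under the given rules text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "http://yoursite.com" + path)


for name, rules in (("block SpamBot", BLOCK_SPAMBOT),
                    ("allow only Googlebot", ALLOW_ONLY_GOOGLEBOT)):
    for agent in ("Googlebot", "SpamBot"):
        print(name, "|", agent, "|", can_crawl(rules, agent))
```

Note the blank line between the two records in the second file: each record starts with its own User-agent line, and the record that names a robot takes priority over the catch-all * record for that robot.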
The other useful thing you can do is specify the location of your XML sitemap:
Sitemap: http://yoursite.com/sitemap.xml
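To confirm that the Sitemap line is picked up, the following sketch reads it back with Python's standard urllib.robotparser module. It assumes Python 3.8 or later, which is when the site_maps() method was added. The surrounding rules are just sample content.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content with a Sitemap line at the end.
RULES = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns None when the file has no Sitemap lines,
# so fall back to an empty list.
sitemaps = parser.site_maps() or []
print(sitemaps)
```

The Sitemap line is independent of the User-agent records, so it applies to every robot regardless of where it appears in the file.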
What about Blocking Only the Bad Robots?
It would be nice to be able to block spam bots from your site altogether. In principle this can be done: as long as you know the name of the bot and it obeys the robots.txt file, you can block it by name using the format shown above. In practice, most bad robots ignore the robots.txt file altogether, so trying to block them this way doesn’t work.
Try our template!
If writing out a long text file scares you, or you just don’t want to bother, you can download a template from our Facebook fan page. Just click on the link in the bar at the bottom of the page to become a fan. You will find the robots.txt template and other great resources under the “SEO Resources” tab.
Got questions? Comment away!
