Contexta Digital

HOWTO: Create and Use a Robots.txt File

3 March, 2010 - HOWTO, Uncategorized

The robots.txt file is a mysterious little document for the uninitiated. Its purpose is to provide a broad set of instructions for search engine spiders when they arrive at your site. In general, the file is used to tell the spiders which parts of your site they should not crawl (visit).

It’s worth noting that search engine robots don’t have to obey the robots.txt file, so it’s a guideline at best. The other important factor is that the file has to be publicly readable for the robots to use it, which means anyone else can read it too. Don’t try to hide anything in the file.

Creating your robots.txt file

If you are using a CMS, there are likely to be plugins that you can use to automatically create a robots.txt file and add it to your site.

Regardless, it is still important to understand what is going on under the hood, so that you can modify the file if you need to.

To get started, open up your favourite text editor, like Notepad. Don’t use a word processor! It adds formatting that a plain text file doesn’t need and will make life unnecessarily complicated.

The main syntax that you will use is:

User-agent:
Disallow:

The “User-agent” field specifies the search robot, and the “Disallow” field specifies the part of the site you want the robots to leave alone.
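If you would rather generate the file than type it by hand, the two fields can be written out by a short script. This is only a minimal sketch: the function name `build_robots_txt` and the dictionary layout are assumptions made for illustration, and the paths are placeholders.

```python
def build_robots_txt(rules):
    """Render {user_agent: [disallowed paths]} as robots.txt text.

    Hypothetical helper for illustration; not part of any library.
    """
    lines = []
    for user_agent, paths in rules.items():
        lines.append(f"User-agent: {user_agent}")
        # An empty list produces a bare "Disallow:" line, which blocks nothing.
        for path in paths or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        lines.append("")  # a blank line separates one robot's record from the next
    return "\n".join(lines)


# Placeholder paths; swap in the folders you want robots to leave alone.
robots_txt = build_robots_txt({"*": ["/cgi-bin/", "/admin/"]})
print(robots_txt)
```

Save the printed text as a plain text file named robots.txt.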

Here are some examples.

To disallow all robots from the entire server:

User-agent: *
Disallow: /

To exclude all robots from a part of your server:

User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/

This will instruct the robots to ignore anything in the yoursite.com/cgi-bin/ and yoursite.com/admin/ folders.
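One way to check how a well-behaved robot would read these rules is Python's standard `urllib.robotparser` module. The sketch below feeds it the rules as text instead of fetching them from a live site; the robot name `AnyBot` and the page URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() returns True when the named robot may visit the URL.
admin_ok = parser.can_fetch("AnyBot", "http://yoursite.com/admin/login")
blog_ok = parser.can_fetch("AnyBot", "http://yoursite.com/blog/")

print("admin page allowed:", admin_ok)
print("blog page allowed:", blog_ok)
```

This only shows what a compliant robot should do. It says nothing about robots that ignore the file.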

On this site, for example, I have the following:

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php

This instructs the robots to ignore all of the WordPress-related admin and system folders and files on my site.

To disallow a particular robot:

User-agent: SpamBot
Disallow: /

To allow a single robot:

User-agent: Googlebot
Disallow: 

User-agent: *
Disallow: /
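The same standard-library parser can confirm that this pair of records behaves as intended. Again, this is a sketch under assumptions: `SomeOtherBot` is an invented name standing in for any robot other than Googlebot.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow line blocks nothing for the robot it is addressed to.
googlebot_ok = parser.can_fetch("Googlebot", "http://yoursite.com/")
# Every other robot falls through to the "*" record, which blocks everything.
other_ok = parser.can_fetch("SomeOtherBot", "http://yoursite.com/")

print("Googlebot allowed:", googlebot_ok)
print("SomeOtherBot allowed:", other_ok)
```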

The other useful thing you can do is specify the location of your XML sitemap:

Sitemap: http://yoursite.com/sitemap.xml
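To see that the Sitemap line is picked up alongside the usual rules, the sketch below reads it back with `RobotFileParser.site_maps()`, which is assumed to be available (it needs Python 3.8 or later). The surrounding rules are illustrative only.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemaps = parser.site_maps()
print(sitemaps)
```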

What about Blocking Only the Bad Robots?

It would be nice to be able to block spam bots from your site altogether. In principle this can be done: as long as you know the name of the bot and it obeys the robots.txt file, you can name it using the format above. In practice, most bad robots ignore the robots.txt file altogether, so trying to block them this way doesn’t work.

Try our template!

If writing out a long text file scares you, or you just don’t want to bother, you can download a template from our Facebook fan page. Just click on the link in the bar at the bottom of the page to become a fan. You will find the robots.txt template and other great resources under the “SEO Resources” tab.

Got questions?  Comment away!
