Search Engine Optimization Tips and Tricks – Learn how to increase visibility in Search Engines – Part 23 – Understanding the robots.txt file

The previous post (All about Crawler) covered what a search engine crawler is, what it looks for, and why you must welcome this crawler to your site. Let us move further and understand what a robots.txt file is, and what it has to do with search engines and the crawler.

What is a robots.txt file?

A robots.txt file has nothing to do with the robots you read about in science fiction; it deals with the script robots (bots) described in the previous post. Under the Robots Exclusion Protocol, robots.txt is a file that you place in the root directory of your site, with instructions on which parts of the site the bots may read (not all search engine crawlers obey it, but the major ones do). It is a very simple text file that you can create in Notepad or any other text editor (so if you are a vi fanatic, use that by all means). You can use this file to ask the bots to stay out of certain sections of your site, which generally keeps those sections from showing up in search engines. Why would you want this? One example is preventing duplicate reading of your site: if you want the archive section of your site to be read, but not the other paths that lead to the same content (to avoid the duplicate content penalty), then robots.txt is a good way to do this (and the All-In-One SEO Plugin does indeed do that).
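To make the duplicate-content example concrete, here is a minimal sketch of how a well-behaved bot reads such a file, using Python's standard urllib.robotparser module. The file contents, the /tag/ and /category/ paths, the example.com URLs, and the "ExampleBot" name are all assumptions made up for illustration; your own site's paths will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot may read the site, except the tag and
# category paths that lead to the same posts as the archive section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The archive path is not listed, so it may be crawled.
print(parser.can_fetch("ExampleBot", "https://example.com/archive/2010/05/"))  # True
# The tag path matches a Disallow rule, so a compliant bot skips it.
print(parser.can_fetch("ExampleBot", "https://example.com/tag/seo/"))  # False
```

Keep in mind that this only shows how a bot that honors the protocol would interpret the rules; a bot that ignores robots.txt is not stopped by it.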

Some scenarios in which you want to use the robots.txt file

– It is good practice to have a robots.txt file, since search engines look for this file when they visit a site (but having one is not a necessary condition).
– If you are in the habit of building different doorway pages for a site, with a customized version of the doorway page for each search engine or platform (a different doorway page for the iPhone, for example), then you would want to control which search engines enter which doorways (for example, if the Google search bot found its way to all of these doorway pages, Google might conclude that you are trying to hoodwink users). Alert: search engines now treat doorway pages with suspicion, so you should consider your policies and intent in this area carefully.
– If your site has many sections that are still under construction, then you would not want the search engines to crawl and show these not-ready sections. In such cases, you can ask the spiders not to read those sections, so that they are not shown in search results.
– There are some sections of your site that are meant for specific people, and you are not really interested in making them available for wider reading. Other people can still reach these sections if they go through the whole structure of your site, but there is no reason to show them in search engine results.
– You have created a print friendly version of your articles, but really don’t want search engines to go there (to avoid the duplicate content penalty and also to keep users from landing directly on the print friendly versions from a search result). You can ask for this in the robots.txt file.
– Keep the robots away from content that has no value for them, such as directories containing Flash content, JavaScript, and so on.
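Several of the scenarios above come down to listing the sections you want bots to skip. The sketch below, again in Python, writes such a list out as robots.txt text and then checks it with the standard urllib.robotparser module. The render_robots_txt helper, the directory names, the example.com URLs, and the "ExampleBot" name are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(rules: dict) -> str:
    """Render {user-agent: [disallowed path prefixes]} as robots.txt text."""
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    # Groups for different user agents are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical sections, one per scenario from the list above.
RULES = {
    "*": [
        "/under-construction/",  # sections that are not ready yet
        "/members/",             # pages meant for specific people
        "/print/",               # print friendly copies of articles
        "/flash/",               # content with no value for the bots
    ],
}

robots_txt = render_robots_txt(RULES)
print(robots_txt)

# Check the result the way a compliant crawler would read it.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in (
    "https://example.com/articles/robots-txt/",
    "https://example.com/print/robots-txt/",
    "https://example.com/members/profile/",
):
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "allowed" if allowed else "blocked")
```

The generated text would be saved as robots.txt in the root directory of the site. Since the file is only a request, sections that must stay truly private still need proper access control.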

