Filed under Blog by #

There is a lot of bad information out there about SEO and internet marketing.

Whether you are a savvy internet marketer or a rookie, I am confident that you have heard of the robots.txt file.

You have probably heard myths, conflicting information, and advice on how to use it.

The robots.txt file was designed to tell bots how to behave on your site: which content they may fetch and which they may not. It’s a simple text file that is very easy to create once you understand the proper format.

An important point to remember is to create your robots.txt file in Notepad or another plain-text editor. DO NOT, under any circumstances, create your robots.txt file in an HTML editor like Dreamweaver, GoLive or FrontPage. FTP clients usually convert the file to Unix line endings when uploading it, but occasionally that conversion fails. Do not take the chance; create it in Notepad instead.

User-agent
The User-agent line specifies the robot that the rules below it apply to. For example:
User-agent: googlebot
You may also use the wildcard character ‘*’ to specify all robots:
User-agent: *
You can find user-agent names in your own server logs by looking for requests for robots.txt. Most major search engines have names for their spiders. Here is a partial list:

* Googlebot
* MSN Robot
* Yahoo! Slurp (recently renamed)
* Google AdSense Robot
* Noxtrumbot
* Xenu Link Sleuth
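To see how a crawler picks the group of rules that applies to it, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. It is one parser's reading of the format, and individual search engines may differ in the details. The robots.txt content, the `/drafts/` folder and the www.example.com URLs are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: one group for Googlebot, one for every other bot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text held in memory, no network access

print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))     # True
print(parser.can_fetch("Googlebot", "http://www.example.com/drafts/a.html"))  # False
print(parser.can_fetch("Slurp", "http://www.example.com/index.html"))         # False
```

A bot that matches a named group (here Googlebot) follows only that group. Every other bot falls through to the `User-agent: *` group, which in this made-up file blocks the whole site.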

The second part of a robots.txt file consists of Disallow: lines. A Disallow line blocks only the files or directories it lists; its presence does not mean the bot is banned from the whole site. Paths are matched from the root of the site, so they should begin with a slash. For example, if you want to instruct spiders not to download private.htm, you would enter:
Disallow: /private.htm

You can also specify directories:
Disallow: /cgi-bin/
This blocks spiders from everything inside your cgi-bin directory.
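Disallow values are matched as prefixes of the URL path. The sketch below, again assuming Python's `urllib.robotparser` as a stand-in for a real crawler, checks the two rules above and shows why the leading slash matters: in this parser, a rule written as `private.htm` never matches the path `/private.htm`. The helper `build_parser` and the bot name `AnyBot` are hypothetical, made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# The rules from the text, written with leading slashes.
good = build_parser("User-agent: *\nDisallow: /private.htm\nDisallow: /cgi-bin/\n")
# The same file rule written without the leading slash.
bad = build_parser("User-agent: *\nDisallow: private.htm\n")

site = "http://www.example.com"
print(good.can_fetch("AnyBot", site + "/private.htm"))      # False: the path is listed
print(good.can_fetch("AnyBot", site + "/private.html"))     # False: "/private.htm" is a prefix of it
print(good.can_fetch("AnyBot", site + "/cgi-bin/form.pl"))  # False: inside /cgi-bin/
print(good.can_fetch("AnyBot", site + "/index.html"))       # True: no rule matches
print(bad.can_fetch("AnyBot", site + "/private.htm"))       # True: "private.htm" is not a prefix of "/private.htm"
```

The second check is a side effect worth knowing: because matching is by prefix, `Disallow: /private.htm` also blocks `/private.html`.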
Common Questions about the Robots.txt File

Q: Why should I use it when I can use the meta-robots tag instead?
A: First of all, search engines do not support the meta-robots tag as consistently as they support robots.txt. All the major engines and most of the minor ones look for robots.txt and do their best to obey it; the same is not true of the meta-robots tag. Also, if you do use the meta-robots tag, don’t use the "index,follow" value. That is what a search bot does by default, so there is no need to spell it out.

Q: Where do I place the robots.txt file?
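A: The file should be placed in the root directory of your server, in the same place as the index.html file for your home page. Crawlers only request it from the root of the site (for example, http://www.example.com/robots.txt); a robots.txt file in a subdirectory is ignored.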
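Because crawlers only request robots.txt from the root of a host, its address can be derived from any page on the site. Here is a small sketch with Python's standard `urllib.parse.urljoin`; the function name `robots_url` and the example URL are hypothetical.

```python
from urllib.parse import urljoin

def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    # "/robots.txt" is an absolute path, so urljoin keeps the scheme and host
    # of page_url but replaces its path (and drops any query string).
    return urljoin(page_url, "/robots.txt")

print(robots_url("http://www.example.com/blog/2008/some-post.html"))
# http://www.example.com/robots.txt
```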

Q: What are some things I might want to exclude from robots?
A: Here are a few examples:
* Any folder that is off limits to the public and that you have not (for whatever reason) password protected.
* Printer-friendly versions of pages, to avoid the duplicate content filter.
* Images, to protect them and to avoid spidering problems.
* CGI-BIN (programming code).
* Spiders you don't want on your site: review your web server logs, find them, and deny them.
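Putting those suggestions together, a robots.txt file might look like the made-up example below. The folder names (`/private/`, `/print/`, `/images/`) and the bot name `BadBot` are hypothetical placeholders for whatever you find on your own server and in your own logs. The Python check underneath, which uses the standard-library `urllib.robotparser`, is just one way to confirm that the rules do what you intended before you upload the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folder and bot names; substitute the ones from your own site.
ROBOTS_TXT = """\
# Keep one unwanted crawler (hypothetical name) out entirely.
User-agent: BadBot
Disallow: /

# Rules for every other crawler.
User-agent: *
Disallow: /private/
Disallow: /print/
Disallow: /images/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "http://www.example.com"
checks = [
    ("Googlebot", "/articles/robots-guide.html"),  # ordinary page: allowed
    ("Googlebot", "/print/robots-guide.html"),     # printer-friendly copy: blocked
    ("Googlebot", "/images/logo.gif"),             # image folder: blocked
    ("BadBot", "/articles/robots-guide.html"),     # unwanted bot: blocked everywhere
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, site + path))
```

One caution: robots.txt is publicly readable, so listing a folder there also tells anyone who looks that the folder exists.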

Q: I exclude bots from indexing my site in the robots.txt file, but they come and crawl anyway. What am I doing wrong?
A: First, make sure you validate your robots.txt file; I prefer the validator from Search Engine World. Another possibility is that you have encountered an "evil" bot that wants to harvest your content or your email addresses for spam. Evil bots deliberately ignore robots.txt, so instead you will need to block them with your .htaccess file (on an Apache server).
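If you would like a quick local check before reaching for an online validator, the rough sketch below flags a few common mistakes: HTML markup in the file, lines without a `field: value` shape, unknown field names, Disallow or Allow lines that appear before any User-agent line, and paths that do not start with a slash. The function `lint_robots_txt` and its set of known fields are assumptions made for this illustration, not the rules used by Search Engine World's validator or by any search engine.

```python
# Assumed set of field names; real crawlers may accept more or fewer.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for some common robots.txt mistakes (a rough sketch)."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("<"):
            warnings.append(f"line {number}: looks like HTML markup; save the file as plain text")
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep:
            warnings.append(f"line {number}: missing ':' separator")
        elif field not in KNOWN_FIELDS:
            warnings.append(f"line {number}: unknown field {field!r}")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("disallow", "allow"):
            if not seen_user_agent:
                warnings.append(f"line {number}: {field} appears before any User-agent line")
            # An empty value is legal ("Disallow:" allows everything).
            if value and not value.startswith("/"):
                warnings.append(f"line {number}: path {value!r} should start with '/'")
    return warnings

sample = "Disallow: /tmp/\nUser-agent: *\nDisallow: private.htm\n"
for warning in lint_robots_txt(sample):
    print(warning)
```

For the sample, this reports that line 1 comes before any User-agent line and that the path on line 3 is missing its leading slash.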

Conclusion
Creating your robots.txt file is not complicated and will take less than seven minutes if you follow these steps:
1. Copy the robots.txt file from our blog and paste it into Notepad

2. FTP to your web server and write down the folders you want to exclude

3. Modify the Disallow lines in the robots.txt to reflect the folders you targeted

4. Save the file

5. Upload to your server

6. Validate the file

7. If you need to make changes, do so and then repeat steps 4-6

In about two weeks, you will begin to see improved spidering, a greater depth of indexing, and maybe even a rise in your rankings.
