How to use Robots.txt

Apr 21st, 2009 Blog Tools 0 Comment

A robots.txt file is a plain text file placed in the root directory of your site that gives instructions to the spiders visiting it. Depending on the rules it contains, certain pages or sections will be “crawled” or left uncrawled.

Using a Robots File Effectively

Generally we want as much exposure as possible for our sites, but there is some content that you don’t want indexed and listed on search engines. This is where a robots.txt file can be used effectively.

Definitions

User-agent: names the bot that the rules following it apply to. * is a wildcard that matches all bots; a specific name such as Googlebot targets only Google's crawler.
Disallow: defines which folders or files are excluded from crawling. An empty value excludes nothing, / excludes everything, and /foldername/ or /filename excludes that specific folder or file.
Allow: works as the opposite of Disallow. It lists the content that may be crawled, typically as an exception inside a disallowed folder. * is a wildcard.
Request-rate: defines the crawl rate as pages/seconds. For example, 1/20 means 1 page every 20 seconds. Like Crawl-delay and Visit-time, this is a non-standard extension that not every bot honors.
Crawl-delay: defines how many seconds to wait after each successful crawl.
Visit-time: defines the hours between which you want your pages to be crawled.
Sitemap: gives the location of your sitemap file (you must use the complete URL of the file).
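
To see how a crawler might read these parameters, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the bot name ExampleBot are all made up for illustration. Visit-time is left out because this parser does not read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration only.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
Crawl-delay: 10
Request-rate: 1/20
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up bot name; it falls under the "*" rules.
allowed_home = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked_private = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")

# Python's parser applies the first rule that matches, so the Allow line
# is placed before the broader Disallow line it makes an exception to.
allowed_exception = parser.can_fetch("ExampleBot", "http://www.example.com/private/public-page.html")

delay = parser.crawl_delay("ExampleBot")    # seconds to wait between fetches
rate = parser.request_rate("ExampleBot")    # named tuple with requests and seconds
sitemaps = parser.site_maps()               # list of sitemap URLs (Python 3.8+)

print(allowed_home, blocked_private, allowed_exception)
print(delay, rate.requests, rate.seconds, sitemaps)
```

Other crawlers may resolve overlapping Allow and Disallow rules differently, so treat the rule order here as a property of this parser rather than of robots.txt in general.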

Example

This is the robots.txt I use on my site:


User-agent: *
Disallow: /cms/feed/
Disallow: */feed/*
Disallow: /feed
Disallow: /cms/wp-content/
Disallow: /cms/wp-plugins/
Disallow: */wp-content/*
Disallow: /cms/wp-content/plugins/
Disallow: /cms/index.php
Sitemap: http://www.bestblogs.asia/sitemap.xml
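
To check which paths a file like this blocks, a rough sketch in Python follows. It assumes the wildcard convention used by the major search engines, where * matches any run of characters and a trailing $ anchors the end of the path. The helper names pattern_to_regex and is_blocked and the sample paths are hypothetical. A real crawler may treat patterns that do not start with / differently.

```python
import re

# Disallow patterns copied from the robots.txt above.
DISALLOW = [
    "/cms/feed/",
    "*/feed/*",
    "/feed",
    "/cms/wp-content/",
    "/cms/wp-plugins/",
    "*/wp-content/*",
    "/cms/wp-content/plugins/",
    "/cms/index.php",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_blocked(path):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(rule.match(path) for rule in RULES)


# Sample paths, made up for illustration.
for path in ["/about/", "/2009/04/post/feed/", "/cms/wp-content/themes/style.css"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Because rules match by prefix, /feed also blocks a path such as /feedback, and /cms/wp-content/plugins/ is already covered by /cms/wp-content/.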
