Sitemap:
A Sitemap is a file that lists the URLs of a site. The Sitemaps protocol
allows a webmaster to inform search engines about the URLs on a website
that are available for crawling. It also lets webmasters include extra
information about each URL, such as when it was last updated and how
important it is relative to other URLs on the site. Sitemap files are
limited to 50,000 URLs and 10 megabytes per file (the size limit was later
raised to 50 megabytes uncompressed). Google first introduced Sitemaps 0.84
in June 2005. In November 2006, Google, MSN and Yahoo announced joint
support for the Sitemaps protocol.
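As a rough illustration of what such a file contains, here is a minimal sketch that builds a small Sitemap with Python's standard library. The function name build_sitemap, the page URLs, the dates and the priority values are all made up for this example; only the element names (urlset, url, loc, lastmod, priority) and the namespace come from the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return Sitemap XML as a string.

    `entries` is an iterable of (loc, lastmod, priority) tuples.
    This helper is hypothetical and only meant as an illustration.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc            # page address
        ET.SubElement(url, "lastmod").text = lastmod    # last update date
        ET.SubElement(url, "priority").text = f"{priority:.1f}"  # 0.0 to 1.0
    return ET.tostring(urlset, encoding="unicode")


# Example pages and dates are invented for the sketch.
sitemap_xml = build_sitemap([
    ("http://www.site.com/", "2013-01-15", 1.0),
    ("http://www.site.com/about.html", "2012-11-02", 0.5),
])
print(sitemap_xml)
```

A real generator would also need to split the output into several files once it approaches the URL or size limit.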
RSS:
RSS (Rich Site Summary) is a format that provides information about
regularly changing web content such as the latest news headlines, blog
entries, audio and video. An RSS document includes summarized text and
metadata such as publishing dates and authorship. It allows you to stay
informed easily by retrieving the latest content from the sites you follow.
RSS can be read using software called an RSS reader, feed reader or
aggregator. This software collects RSS feeds from various sites and displays
them for you to read and use.
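To give an idea of what a feed reader does with an RSS document, the following sketch parses a small RSS 2.0 feed and lists its items. The feed text, the site URLs and the function name read_items are invented for the example; a real reader would download the feed from the site instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed used only for this illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.site.com/</link>
    <description>Latest headlines</description>
    <item>
      <title>First headline</title>
      <link>http://www.site.com/news/1</link>
      <pubDate>Mon, 07 Jan 2013 09:00:00 GMT</pubDate>
      <description>Short summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.site.com/news/2</link>
      <pubDate>Tue, 08 Jan 2013 09:00:00 GMT</pubDate>
      <description>Short summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of dicts, one per <item> in the feed (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "summary": item.findtext("description"),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["published"], "-", entry["title"])
```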
Robots.txt File:
Robots.txt is a convention that asks cooperating web crawlers and other web
robots not to access all or part of a website that is otherwise publicly
viewable. Website owners use the robots.txt file to give instructions about
their site to web robots.
It works like this: if a robot wants to visit a website URL, say
http://www.site.com, it first checks for http://www.site.com/robots.txt
and finds:
User-agent: *
Disallow: /
The "User-agent: *" line means this section applies to all robots. The
"Disallow: /" line tells the robot not to visit any pages on the site.
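The check a cooperating robot performs can be sketched with Python's standard urllib.robotparser module. The helper name is_allowed, the crawler name "MyCrawler", the page paths and the second rule set (Disallow: /private/) are assumptions added for illustration; the rules are passed in directly so the example runs without any network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules.

    `robots_lines` is the robots.txt content as a list of lines.
    This helper is hypothetical and only wraps the standard parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# The rules from the text: every robot is kept out of the whole site.
block_all = ["User-agent: *", "Disallow: /"]
print(is_allowed(block_all, "MyCrawler", "http://www.site.com/index.html"))

# An assumed variant that only blocks one directory.
block_private = ["User-agent: *", "Disallow: /private/"]
print(is_allowed(block_private, "MyCrawler", "http://www.site.com/index.html"))
print(is_allowed(block_private, "MyCrawler", "http://www.site.com/private/page.html"))
```

With the first rule set nothing on the site may be fetched; with the second, only URLs under /private/ are refused. A live crawler would typically call set_url() and read() on the parser to load the file from the site instead of passing the lines in by hand.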
There are two important considerations when using robots.txt:
• Robots can ignore your robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
• The robots.txt file is publicly available. Anyone can see which sections of your server you do not want robots to use.