Monday, June 23, 2008

Google, Microsoft and Yahoo! Unite.

The Big 3 Search Engines Unite as They Agree to Support a Standard Sitemap Protocol.

It's been a couple of years since these three major players in the internet industry teamed up on Sitemaps, the protocol Google introduced in 2005. Now the three rivals have crossed paths a second time, agreeing to support a common standard for the directives that tell their crawlers how to treat a website, including where to find its Sitemaps (the files that list a site's pages for crawling).

With a shared standard, webmasters no longer need to make separate code adjustments for each search engine.


How is this possible?

Heard about the Robots Exclusion Protocol, or REP?

The Robots Exclusion Protocol (REP) is a set of directives that tells web spiders and other web robots which parts of a website they may access, whether to follow its links, and which content to gather for search engine indexing. It has been around for years and is supported by all the major search engines, including Google, MSN, and Yahoo.
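To see the protocol from a crawler's side, here's a minimal Python sketch using the standard library's urllib.robotparser module. The robots.txt content, the example.com URLs, and the "MyCrawler" user-agent are made up for illustration; a real crawler would fetch the file from the site's root with set_url() and read() instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, written inline so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/index.html", "https://example.com/private/report.html"):
    # can_fetch(useragent, url) answers: may this robot crawl this URL?
    print(url, parser.can_fetch("MyCrawler", url))
```

The first URL comes back allowed and the second disallowed, because "User-agent: *" applies the Disallow rule to every robot that has no group of its own.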


Common REP Directives

Here's a list of the common REP Directives that are currently implemented by Google, MSN, and Yahoo.

1. Robots.txt Directives

Disallow
IMPACT: Tells a crawler not to crawl the specified parts of your site.
USE: This directive prevents specific paths of a site from being crawled.

Allow
IMPACT: Tells a crawler which specific pages on your site you want to get indexed.
USE: It can be used in conjunction with Disallow clauses, where a large section of a site is disallowed except for a small section within it.
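When Allow and Disallow overlap, the crawler has to decide which rule wins. The sketch below assumes longest-match precedence (the most specific path wins, and Allow wins a tie), which is how Google documents it today and how RFC 9309 later standardized it; it is not necessarily what every engine did in 2008. Python's urllib.robotparser instead applies the first matching rule in file order, so the is_allowed() helper here is a hypothetical hand-written one. Wildcards are left out.

```python
# Hypothetical helper: decide a path against (directive, path-prefix) rules
# using longest-match precedence; on a tie in length, "allow" wins.
def is_allowed(rules, path):
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            n = len(prefix)
            if n > best_len or (n == best_len and directive == "allow"):
                best_len = n
                allowed = directive == "allow"
    return allowed

# Disallow a large section, but allow one small section inside it.
RULES = [
    ("disallow", "/archive/"),
    ("allow", "/archive/2008/"),
]

print(is_allowed(RULES, "/archive/2007/post.html"))  # False
print(is_allowed(RULES, "/archive/2008/post.html"))  # True
print(is_allowed(RULES, "/about.html"))              # True
```

This is the "disallow a large section except for a small part of it" case: /archive/2008/ is longer than /archive/, so it takes precedence for pages under it.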

Sitemaps Location
IMPACT: Tells a crawler where to find your Sitemaps.
USE: Point to other locations where feeds exist to help crawlers find URLs on a site.
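Sitemap lines are not tied to any User-agent group, so a crawler can collect them no matter where they appear in the file. Python's urllib.robotparser exposes them through site_maps() (Python 3.8 and later); the file and the example.com URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt with two Sitemap locations.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed URLs in file order, or None if there are none.
print(parser.site_maps())
```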

$ Wildcard Support
IMPACT: Tells a crawler that a pattern must match the end of a URL, so one rule can cover matching files across many directories without listing specific pages.
USE: Targets files that follow a specific pattern, e.g., files of a certain type that always share an extension, such as .pdf.
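Plain prefix matching (and urllib.robotparser) doesn't understand the $ anchor, so here is a hand-rolled sketch of how a crawler might match such a rule. The matches() helper is hypothetical; it also treats * as "any sequence of characters", since a $ rule such as /*.pdf$ is normally paired with it, and it assumes rules are matched from the start of the URL path, query string included.

```python
import re

def matches(pattern, path):
    """Hypothetical helper: match a robots.txt path pattern using '*' and a trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with ".*" where '*' stood.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    # With '$' the whole path must match; without it the rule is a prefix.
    m = regex.fullmatch(path) if anchored else regex.match(path)
    return m is not None

rule = "/*.pdf$"
for path in ("/docs/report.pdf", "/docs/report.pdf?download=1", "/files/pdf-guide.html"):
    print(path, matches(rule, path))
```

Only /docs/report.pdf matches: the $ requires the URL to end in .pdf, so the version with a query string and the .html page are not covered by the rule.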


2. HTML META Directives

NOINDEX META Tag
IMPACT: Tells a crawler not to index a given page.
USE: This allows pages that are crawled to be kept out of the index.

NOFOLLOW META Tag
IMPACT: Tells a crawler not to follow links to other content on a given page.
USE: Lets the robot know that you are discounting all outgoing links from this page, which helps discourage spamming.

NOSNIPPET META Tag
IMPACT: Tells a crawler not to display snippets in the search results for a given page.
USE: Shows no snippet for the page in search results.

NOARCHIVE META Tag
IMPACT: Tells a search engine not to show a "cached" link for a given page.
USE: Keeps the search engine from offering users a cached copy of the page.

NOODP META Tag
IMPACT: Tells a crawler not to use a title and snippet from the Open Directory Project for a given page.
USE: Keeps the ODP title and abstract from being used for this page in search results.
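These META directives live in the page itself, typically as a tag like <meta name="robots" content="noindex, nofollow"> in the page's <head>. The sketch below uses Python's built-in html.parser to pull them out of a page; the RobotsMetaParser class and the sample page are illustrative, and the sketch only looks at tags named "robots".

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Illustrative parser: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

PAGE = """<html><head>
<title>Members area</title>
<meta name="robots" content="NOINDEX, NOFOLLOW, NOARCHIVE">
</head><body>...</body></html>"""

parser = RobotsMetaParser()
parser.feed(PAGE)
print(sorted(parser.directives))  # ['noarchive', 'nofollow', 'noindex']
```

A crawler would then skip indexing the page if "noindex" is present, ignore its links if "nofollow" is present, and so on for the other directives.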




***
sources:
Google Webmaster Central Blog
Yahoo! Search Blog
Live Search


