Using XML to Assist Search Engines

If you don't remember CompuServe, it was one of the earliest online communities and data service providers. It offered discussion forums, news, and primitive versions of the features we now associate with Web portals such as Yahoo!. Although CompuServe certainly paled in comparison to the modern Web experience, it nonetheless provided a fairly interesting online community back before AOL or Yahoo! even existed.

I should know: I was a CompuServe regular for several years! The quote above is a bit critical of Google, comparing its ability to perform semantic searches to CompuServe's primitive online service. That may be a valid argument, but CompuServe was ahead of its time in its day, and so is Google. Google has set the standard for searching the Web. It has numerous projects underway to add context to Web searches and to provide a more accurate means of mining the world's electronic information.

This tutorial explores a feature of Google's search engine called Google Sitemaps. It allows Web developers to automatically notify Google of changes in web page content. As a result, Google knows that the pages exist and can index them more frequently. Of course, Google Sitemaps uses XML, or I wouldn't be bothering to tell you about it. It represents a clever and efficient use of XML and is an important tool in any Web developer's toolkit.

In this tutorial, you'll learn:

  • The basics of web crawling and why it's important
  • What Google Sitemaps is and how it can help your web site
  • About the XML-based Google Sitemaps protocol and how to develop Sitemap documents for your own web pages
  • How to validate, submit, and automatically generate Google Sitemaps