Agents: Web Scanning, Mirroring, and Background Tasks
- Retrieving Specific Documents from the Web
- Generating Web Indexes
- Mirroring Remote Sites
This chapter focuses on agents that use the Web's protocol, HTTP, to perform automated tasks. Many Webmaster responsibilities, such as detecting stale links, generating usage reports, building search indexes, and mirroring sites, are easily automated using Perl. In addition to these server-related background tasks, consider the usefulness of client-side automation, such as retrieving up-to-the-minute information like news headlines or stock quotes.
This chapter shows you how to leverage existing Perl modules to make these automated tasks even easier. The examples here are only a starting point; you can apply the same techniques to tasks specific to your own needs.
Retrieving Specific Documents from the Web
Retrieving documents is what everyone does when they surf the Web. The Web browser provides a nice front-end navigation tool for this type of interactive retrieval. You can also retrieve documents in an automated way by speaking HTTP from within a Perl script. A familiar example of this is retrieving stock quotes. You can think of Web servers as the information providers and the user agents as the information retrievers. Suppose a Web server provides up-to-the-minute news, sports scores, stock quotes, and so on. You can write a fairly simple script in Perl to monitor such Web sites and provide you with that up-to-date information.
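As a rough illustration of automated retrieval, the following minimal sketch fetches a single document and prints its body. It assumes the HTTP::Tiny module that ships with modern Perl distributions, which is a substitution made here for convenience; the libwww-perl modules would serve equally well. The function name fetch_document and the 30-second timeout are hypothetical choices for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # assumption: a core HTTP client; LWP would also work

# Fetch one document over HTTP.
# Returns the response body, or undef if the request failed.
# (fetch_document is a hypothetical helper name for this sketch.)
sub fetch_document {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 30)->get($url);
    unless ($response->{success}) {
        warn "Could not retrieve $url: "
           . "$response->{status} $response->{reason}\n";
        return;
    }
    return $response->{content};
}

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    my $body = fetch_document($ARGV[0]);
    print $body if defined $body;
}
```

You might run it as "perl fetch.pl http://www.example.com/" and redirect the output to a file for later processing.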
Stock Quotes on the Hour
Stock quotes are, of course, the most obvious application for retrieving information from the Web. Public Web pages are available from which you can get the latest stock prices at the click of a button. This example shows you how to write your own customized Perl script to tell you the current stock price every hour on the hour. You can simply feed it stock symbols, and it retrieves the information, parses it, and displays only what you are interested in.
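The "every hour on the hour" timing can be sketched separately from the retrieval itself. The following is one possible approach, not necessarily the one used later in this chapter: compute how many seconds remain until the next full hour of local time, sleep for that long, and then do the work. The name seconds_until_next_hour is hypothetical, and the loop body only prints a placeholder message where the quote retrieval would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the number of seconds from the given epoch time until the
# next full hour of local time. At exactly the top of the hour, the
# result is 3600, that is, the wait for the following hour.
sub seconds_until_next_hour {
    my ($now) = @_;
    my ($sec, $min) = (localtime($now))[0, 1];
    return 3600 - ($min * 60 + $sec);
}

# Run the loop only when stock symbols are given on the command line.
if (@ARGV) {
    while (1) {
        sleep seconds_until_next_hour(time);
        # Placeholder: the real script would fetch and parse quotes here.
        print scalar(localtime), ": time to fetch quotes for @ARGV\n";
    }
}
```

On a UNIX system, an hourly cron entry that runs a one-shot script is a common alternative to a long-running loop like this one.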
One Web site that provides stock quotes is the Security APL Quote Server. The URL for obtaining quotes is http://qs.secapl.com/cgi-bin/qso. Once you have worked out the format of the data coming back, it's easy to come up with regular expressions for extracting the price, percent change, date, and time. To specify a list of quotes to retrieve, append the string "?tick=symbol1+symbol2" to the URL. This string contains the parameter list that is passed to the quote-serving CGI script. This particular site allows you to specify up to five stock symbols at a time. In the data coming back, the quotes are separated by a horizontal rule tag, <HR>, and each quote begins and ends with the preformatted text tags, <PRE> and </PRE>. The HTML in Listing 9.1 is a sample returned by the quote server for two stock symbols. Figure 9.1 shows this page in Netscape.
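The URL format and the page layout just described can be turned into two small helper routines. This is a minimal sketch under stated assumptions: the names build_quote_url and extract_quote_blocks are hypothetical, and the routines rely only on the details given above (the "?tick=" parameter, the five-symbol limit, the <HR> separators, and the <PRE> wrappers). Extracting the individual fields from each block is left out, because that depends on the exact text the server returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $QUOTE_SERVER = 'http://qs.secapl.com/cgi-bin/qso';

# Build the request URL for one to five stock symbols, joined by "+".
sub build_quote_url {
    my (@symbols) = @_;
    die "Need at least one stock symbol\n" unless @symbols;
    die "This server accepts at most five symbols\n" if @symbols > 5;
    return $QUOTE_SERVER . '?tick=' . join('+', @symbols);
}

# Split the returned HTML on <HR> tags and return the text found
# between <PRE> and </PRE> in each piece, trimmed of outer whitespace.
sub extract_quote_blocks {
    my ($html) = @_;
    my @blocks;
    for my $chunk (split /<HR[^>]*>/i, $html) {
        next unless $chunk =~ m{<PRE>(.*?)</PRE>}is;
        my $text = $1;
        $text =~ s/^\s+|\s+$//g;
        push @blocks, $text;
    }
    return @blocks;
}

# Print the URL for any symbols given on the command line.
print build_quote_url(@ARGV), "\n" if @ARGV;
```

Each element returned by extract_quote_blocks is the raw text of one quote, ready to be matched against regular expressions for the price, percent change, date, and time.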