Converting HTML to Other Formats: CGI and Perl

Many other tools are available for converting to and from formats not discussed here; again, these tools are available at http://www.w3.org/pub/WWW/Tools/.

Making Existing Archives Available via HTTP

As I mentioned early in this chapter, the process of sharing documents has evolved with the Internet. Earlier tools and protocols, FTP in particular, are still in wide use, especially for larger archive formats such as .tar.gz or .zip files.

Serving the same archive from an HTTP server and an FTP server simultaneously can give an unsavory individual an easy opening to break into your system. Carefully read the considerations and issues discussed in Chapter 3 before doing this. In particular, if you have an upload area for FTP clients, make absolutely sure that the HTTP server can't get to it, or at least can't read anything in it, and especially can't execute anything in that directory as CGI or SSI.
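One way to audit an upload area along these lines is a small Perl check like the following. It is only a sketch under stated assumptions: it assumes a Unix-style filesystem and that the HTTP server runs as a user who is neither the owner nor in the group of the uploaded files, so the "other" permission bits decide what the server can read. The subroutine name risky_entries is hypothetical, and the check is no substitute for the server configuration issues covered in Chapter 3.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Fcntl ':mode';

# Return a list of "path: problem" strings for everything under $dir
# that is readable by "other" users or carries any execute bit.
# ASSUMPTION: the HTTP server accesses these files as "other".
sub risky_entries {
    my ($dir) = @_;
    my @problems;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            my @st   = lstat($path) or return;   # skip entries we cannot stat
            my $mode = $st[2];
            return if S_ISLNK($mode);            # link modes are not meaningful
            push @problems, "$path: world-readable"
                if $mode & S_IROTH;
            # Directories need execute bits for traversal, so only
            # flag execute permission on non-directories.
            push @problems, "$path: executable"
                if !S_ISDIR($mode)
                && ($mode & (S_IXUSR | S_IXGRP | S_IXOTH));
        },
    }, $dir);
    return @problems;
}
```

You might run it from cron with something like print "$_\n" for risky_entries('/home/ftp/incoming'); where the path is only a placeholder for your own upload directory. A clean report doesn't prove the area is safe; it only catches the most obvious permission mistakes.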

Once you've made things secure, probably the most important aspect to consider when serving your documents by a means other than HTTP is the naming convention you use for them. You need to keep the appropriate extension for MIME typing, of course, but FTP clients tend to rely more on filenames that describe the content. Another nice touch is to provide a simple text representation of the index.html in each directory of your hierarchy so that FTP clients can retrieve a description that isn't marked up as HTML.
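The text representation of each index.html could be generated with a short Perl script, sketched here. Several things are assumptions of mine rather than anything prescribed above: the output name index.txt, the subroutine names html_to_text and write_text_indexes, and the regex-based tag stripping, which is crude and only adequate for simple index pages. For anything more elaborate, a real HTML parser or one of the conversion tools mentioned earlier would be a better choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Crude HTML-to-text conversion; a sketch, not a full parser.
sub html_to_text {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;                       # drop comments
    $html =~ s/<(script|style)\b.*?<\/\1\s*>//gis;   # drop script/style bodies
    # Turn the ends of common block elements into line breaks.
    $html =~ s/<(?:br|\/p|\/h[1-6]|\/li|\/tr|\/title)\b[^>]*>/\n/gi;
    $html =~ s/<[^>]*>//g;                           # strip remaining tags
    my %ent = (amp => '&', lt => '<', gt => '>', quot => '"', nbsp => ' ');
    $html =~ s/&(amp|lt|gt|quot|nbsp);/$ent{$1}/g;   # decode a few entities
    $html =~ s/[ \t]+/ /g;                           # collapse runs of blanks
    $html =~ s/^ +| +$//mg;                          # trim each line
    $html =~ s/\n{3,}/\n\n/g;                        # at most one blank line
    $html =~ s/\A\s+|\s+\z//g;                       # trim the whole text
    return "$html\n";
}

# Write index.txt (a hypothetical name) beside every index.html
# found under $root; return the list of files written.
sub write_text_indexes {
    my ($root) = @_;
    my @written;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $src = $File::Find::name;
            return unless $src =~ m{(?:^|/)index\.html\z};
            (my $dst = $src) =~ s/\.html\z/.txt/;
            open(my $in, '<', $src)
                or do { warn "skipping $src: $!\n"; return };
            my $html = do { local $/; <$in> };       # slurp the whole file
            close $in;
            open(my $out, '>', $dst)
                or do { warn "cannot write $dst: $!\n"; return };
            print {$out} html_to_text($html);
            close $out;
            push @written, $dst;
        },
    }, $root);
    return @written;
}
```

Calling write_text_indexes on the top of your archive whenever the HTML indexes change keeps the two versions in step. If your FTP users expect a different name, such as README, change the substitution that builds $dst.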

Many sites prefer not to make their documentation available by any means other than the WWW, and this is perfectly okay, but letting people obtain your documents via other protocols could increase the rate of their distribution. If that is desirable, then when you set things up, consider the users of these other protocols and their limited ability to browse your archive.

Summary

We've covered a lot of ground in this chapter, and I hope I've given you, the Webmaster, a better feel for the many other duties you must perform and how to handle them using Perl. We've looked at some of the most important issues arising out of configuration management, as well as some of the most common tasks and projects the Webmaster may have to perform.

Some of the important topics I've covered here are:

What to plan for when starting out a new archive hierarchy.
Motivation for using revision control and some pointers to existing commercial and noncommercial implementations.
Techniques for parsing and summarizing various server log files.
Converting between HTML and other document formats.

Again, I stress that this chapter is not comprehensive regarding the additional duties that the Webmaster must perform or their solutions.