CGI and Perl

Data Security

Each element of a Web archive may be intended for a different audience. The contents of your archive may be meant entirely for public access, with no constraints at all, or you may wish to set specific permissions on each and every element. In this section, we'll take a look at how you can do either.

General Considerations

Global versus Local Configuration of Permissions

If you don't care who accesses your archive's contents, you won't need to worry much about data security beyond making sure that general permissions are set correctly. If your archive provides documents intended for a specific audience, such as registered users or paying customers, then you'll need to use the additional features of the httpd server to limit access to those documents to that audience. You may also use CGI scripts, Netscape cookies, hidden variables, or other state-retention mechanisms to ensure proper access. These topics will be discussed in Chapter 16, "Advanced CGI/HTML."

Most HTTP servers let you grant access to specific files and directories on an individual basis, using one of two methods: a global configuration file or a per-directory file. The files you'll be configuring differ in format depending on which server you're running, but for the Apache and NCSA servers they are generally

conf/access.conf    Global access configuration file
.htaccess           Directory-specific access file

The examples that follow assume you're using one or the other of these servers. The other Web server, from CERN, won't be discussed or used in the examples here, for the sake of brevity.
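As a rough sketch of how the two methods compare, the fragment below restricts one directory to authenticated users, first as an entry in the global conf/access.conf and then as the equivalent per-directory file. The directory path, the realm name Members, and the password file location are hypothetical placeholders, and the exact set of directives your server accepts may differ by version.

```apache
# In conf/access.conf (global): protect one directory.
# The path, realm name, and password file below are made-up examples.
<Directory /usr/local/etc/httpd/htdocs/members>
AuthType Basic
AuthName Members
AuthUserFile /usr/local/etc/httpd/conf/members.passwd
<Limit GET POST>
require valid-user
</Limit>
</Directory>

# The same restriction as a per-directory file placed inside
# htdocs/members: identical directives, without the <Directory> wrapper.
AuthType Basic
AuthName Members
AuthUserFile /usr/local/etc/httpd/conf/members.passwd
<Limit GET POST>
require valid-user
</Limit>
```

Note that the per-directory form takes effect only if the global configuration permits it; on these servers that is governed by the AllowOverride directive for the directory in question.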

For simplicity and portability, the per-directory .htaccess files are probably easier to use. The global access-control file is generally preferred, however, presumably because it lets you control all rights and permissions from one place. If you do decide to use the per-directory method, be aware that the name of the per-directory file is not fixed: it can be set to whatever you like in the conf/srm.conf file. It is recommended that you choose a name other than .htaccess, so that clients cannot simply guess the name and retrieve your access-configuration files themselves. (There is a bug in some servers that allows the client to fetch your access-configuration files directly.)
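On the Apache and NCSA servers, the per-directory filename is set with the AccessFileName directive. A minimal sketch follows; the name .siteacl is only an illustration, so pick your own.

```apache
# In conf/srm.conf: use a nondefault name for per-directory access files.
# ".siteacl" is a made-up example name, not a convention.
AccessFileName .siteacl
```

After this change the server looks for .siteacl rather than .htaccess in each directory, so any existing per-directory files need to be renamed to match.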