Displaying the Results

All you need to do now is display the results in a meaningful format. What you're aiming for is an ordered list of likely candidates for what the user is trying to find. You have an array of pages and an associative array of hit counts. What you need first is a sort routine to rearrange the array in the correct order. The following sort routine should work just fine:

@foundList = sort sortByHitCount @foundList;

sub sortByHitCount {
    return $hitCounts{$b} - $hitCounts{$a};
}

The first line in this code is the call to sort(), using the subroutine sortByHitCount(). The $a and $b variables are package global variables that sort() uses to tell the sorting routine which items to compare. The items being compared in this case are filenames, which are keys into the %hitCounts associative array. Returning a negative value indicates that $a is less than $b, returning a positive value indicates that $a is greater than $b, and returning 0 indicates that the two values are equal. What you are actually comparing in sortByHitCount() is the hit count of each page; because the routine subtracts $a's count from $b's, the pages with the most hits sort to the front of the list.
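
As an aside, Perl's numeric comparison operator <=> performs this same three-way test directly; an equivalent, slightly more idiomatic version of the routine looks like this:

sub sortByHitCount {
    # <=> returns -1, 0, or 1; putting $b first sorts in descending order.
    return $hitCounts{$b} <=> $hitCounts{$a};
}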

Remember that in the previous example, the %hitCounts associative array must be within the scope of the sortByHitCount() function. If you ever moved sortByHitCount() into a different package, it would silently refer to that package's own (empty) %hitCounts, and the resulting do-nothing sort could be a very difficult problem to debug.
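
One way to sidestep this pitfall is to use an anonymous sort block right where the data lives, so there is no separate subroutine to relocate:

@foundList = sort { $hitCounts{$b} <=> $hitCounts{$a} } @foundList;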

Now you have a sorted list of files that need converting to URLs. To do this, you simply chop off the first n characters, where n is the length of the $serverRoot variable. This can be done with the following line:

$url = substr($file, length($serverRoot));
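
For example, with $serverRoot set as in Listing 7.10 and a hypothetical page beneath it, the conversion works like this:

$serverRoot = "/user/bdeng/Web/docs";
$file = "/user/bdeng/Web/docs/products/index.html";   # hypothetical page
$url  = substr($file, length($serverRoot));           # $url is now "/products/index.html"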

You can now format the string as a link by adding the <A> tag around the $url. The final main code appears in Listing 7.10.

Listing 7.10. A simple CGI searching program.

#!/public/bin/perl5
use CGI::Form;
use File::Find;

# Variables for storing the search criteria/results.
@words     = ();
@foundList = ();
%hitCounts = ();

$q = new CGI::Form;
$serverRoot = "/user/bdeng/Web/docs";

if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {
   &searchForm($q);
} else {
   @words = split(' ', $q->param('SearchString'));
   print $q->header;
   print $q->start_html("Search Results");
   print "<H1>Search Results</H1>\n<HR>\n";
   if (@words > 0) {
      finddepth(\&wanted, $serverRoot);
      @foundList = sort sortByHitCount @foundList;
      if (@foundList > 0) {
         foreach $file (@foundList) {
            $item = substr($file, length($serverRoot));
            print "<A HREF=\"$item\">$item</A> has ";
            print "$hitCounts{$file} occurrences.<BR>\n";
         }
      } else {
         print "<P>Sorry, I didn't find anything based on your criteria. ";
      }
   } else {
      &searchForm($q);
      print "<HR><P>Please enter your search criteria. ";
   }
   print $q->end_html();
}

This example is provided simply to show you the capability of Perl for text processing. If you have a very large Web site with a lot of files to search through, it would make much more sense to run an index-generating program over your data, perhaps on a nightly basis, and then use that index from your CGI script. The script in Listing 7.10 can easily be modified to search an index rather than your entire Web site. A good indexing package called Isearch can be found at http://cnidr.org/isearch.html.
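
To illustrate the indexing idea, here is a minimal sketch of such a nightly index builder, using Perl's built-in dbmopen() to keep word counts on disk. The script name, index path, and crude word-splitting rule are all assumptions for illustration, not part of Listing 7.10:

#!/public/bin/perl5
# build-index.pl -- rebuild the search index (run nightly, e.g. from cron).
use File::Find;

$serverRoot = "/user/bdeng/Web/docs";
dbmopen(%index, "/user/bdeng/Web/search-index", 0644) || die "dbmopen: $!";
%index = ();                             # discard last night's index

sub wanted {
    return unless /\.html$/i;            # index only HTML files
    open(HTML, $File::Find::name) || return;
    my %count;
    while (<HTML>) {
        $count{lc($1)}++ while /(\w+)/g; # crude word extraction
    }
    close(HTML);
    my $url = substr($File::Find::name, length($serverRoot));
    foreach my $word (keys %count) {
        $index{$word} .= "$url:$count{$word} ";   # append "url:count"
    }
}

finddepth(\&wanted, $serverRoot);
dbmclose(%index);

With an index like this in place, the CGI script would replace its finddepth() call with a simple lookup in %index, splitting each word's entry back into its "url:count" pairs.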

Review

This example is pretty basic. You can certainly take it and extend it to suit your needs. One important concept in this example is that you should use existing libraries wherever possible. Some things, such as case sensitivity, scope limitation, and filename filters, can be made optional by adding them to the search form. This example was limited to a case-insensitive search on all HTML files within the root directory tree of the server. You can also consider extracting the titles of the Web pages that you search by scanning for the <TITLE> tag, because you've already read the entire file into an array; these titles can be stored in another associative array and displayed on the results page as the labels of your links, as sketched below.
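
Here is a minimal sketch of that title extraction, assuming it runs inside wanted() after the page has been read into @lines, and that a new %titles associative array is added alongside %hitCounts (both names are assumptions):

$text = join('', @lines);
if ($text =~ m#<TITLE>(.*?)</TITLE>#is) {
    $titles{$File::Find::name} = $1;    # remember this page's title
}

In the results loop of Listing 7.10, the link label then becomes the stored title when one was found, falling back to the URL otherwise:

$label = $titles{$file} || $item;
print "<A HREF=\"$item\">$label</A> has ";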

Again, it might be wise to look into existing indexing programs for a more efficient searching capability. This is especially true if you are managing a large site with a lot of large HTML files. You might also have other types of files on your site, such as PDF files, for which you can also create indexes to provide optimized searches.