Assuming you have extracted the list of words to search for, you simply write a function that checks each file it is given against that word list. Let the File::Find module pass you the files, as shown in Listing 7.9.
Listing 7.9. Subroutine to search for a list of words.
sub wanted {
    # This line gets rid of all Unix-type hidden files/directories.
    return if $File::Find::name=~/\/\./;
    # Only look at HTML files.
    if ($File::Find::name=~/^.*\.html$/) {
        if (!open(IN, "< $File::Find::name")) {
            # This error message will appear in your error_log file.
            warn "Cannot open file: $File::Find::name...$!\n";
            return;
        }
        my(@lines)=<IN>;
        close(IN);
        my($count)=0;
        foreach (@words) {
            # Make the search case-insensitive.
            $word="(?i)$_";
            $count+=grep(/$word/,@lines);
        }
        if ($count>0) {
            # Add this page to the list of found items.
            push(@foundList,"$File::Find::name");
            # Store the hit count in an associative array
            # with the page as the key.
            $hitCounts{"$File::Find::name"}=$count;
        }
    }
}
Note:
If you are running on a UNIX system where the egrep command is available, you should consider replacing the majority of this Perl code with a call to egrep, as follows:
@hitList=`egrep -ci '(word1|word2|word3)' $File::Find::name`;
This would be more efficient in terms of memory requirements and processor use.
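The counting loop in Listing 7.9 can also be pulled out into a small function that takes the word list and the file's lines as arguments, which makes it easy to try on its own. What follows is only a sketch: count_hits() is a hypothetical name, and the use of \Q...\E to treat each word as literal text is an assumption that differs from the listing, where each word is used as a pattern.

```perl
use strict;
use warnings;

# count_hits() is a hypothetical helper, not part of Listing 7.9.
# It takes a reference to the word list and a reference to the lines
# of one file. It returns the number of matching lines, counted once
# per word, just as the loop in wanted() does.
sub count_hits {
    my ($words, $lines) = @_;
    my $count = 0;
    foreach my $word (@$words) {
        # Assumption: \Q...\E makes the word literal text, so a user
        # typing "a.c" does not match "abc". The /i flag makes the
        # match case-insensitive.
        $count += grep { /\Q$word\E/i } @$lines;
    }
    return $count;
}

my @lines = ("Perl is fun\n", "CGI and perl\n", "Nothing here\n");
my $hits  = count_hits(['perl', 'cgi'], \@lines);
print "hits: $hits\n";
```

Because a line is counted once for every word it contains, a line that matches two words adds two to the total, which is the same behavior as the listing.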
File::Find contains a function called finddepth(), which takes at least two arguments: a filter function and one or more directory names to recurse. The filter function you are using is the one above called wanted(). finddepth() calls wanted() for each file and directory that it comes across. The filename is contained in the variable $_. The path of the directory that holds it is contained in the variable $File::Find::dir. You have used the variable $File::Find::name, which is the combination of the other two variables, with a path separator stuck in between. By using the functionality provided by File::Find, all you need to do is add in your search filter and not worry about recursion and figuring out what's a file and what's a directory.
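To see the three File::Find variables side by side, the following sketch builds a small throwaway directory tree and records what the callback receives. The show() callback, the @seen array, and the docs/sub layout are all made up for illustration; only finddepth(), $_, $File::Find::dir, and $File::Find::name come from the module itself.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a temporary tree: docs/index.html and docs/sub/page.html.
# The layout is hypothetical and is removed when the script exits.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/docs"     or die "mkdir failed: $!";
mkdir "$root/docs/sub" or die "mkdir failed: $!";
for my $file ("$root/docs/index.html", "$root/docs/sub/page.html") {
    open(my $fh, '>', $file) or die "Cannot create $file: $!";
    print $fh "<html></html>\n";
    close($fh);
}

my @seen;

# show() is a hypothetical callback that records the three variables
# File::Find sets before each call.
sub show {
    push @seen, [$_, $File::Find::dir, $File::Find::name];
}

finddepth(\&show, "$root/docs");

for my $entry (@seen) {
    my ($base, $dir, $full) = @$entry;
    print "name in \$_: $base\n  dir:  $dir\n  full: $full\n";
}
```

The callback is invoked for the two directories as well as the two files, which is why wanted() filters on the .html suffix before opening anything.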
The code used to initiate the search looks like this:
@words=split(/ /,$q->param('SearchString'));
if (@words>0) {
    finddepth(\&wanted,"/user/bdeng/Web/docs");
}
It's probably a good idea to check that the @words array contains at least one value. There is no need to make finddepth() do all that work if you have nothing to search for. In this particular case, you might emit some HTML that politely reminds the user to specify something to search for.
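One way to handle the empty case is sketched below. Both search_words() and reminder_html() are hypothetical helpers, the wording of the reminder is invented, and the raw search string is passed in directly rather than read from the CGI object so that the example runs on its own. The sketch splits on whitespace with split ' ' instead of split(/ /, ...), an assumption made so that leading or repeated spaces do not produce empty words.

```perl
use strict;
use warnings;

# search_words() is a hypothetical helper that turns the raw search
# string into a list of words. split ' ' skips leading whitespace and
# treats runs of whitespace as one separator.
sub search_words {
    my ($raw) = @_;
    return () unless defined $raw;
    return split ' ', $raw;
}

# reminder_html() is a hypothetical helper; the markup is illustrative.
sub reminder_html {
    return "<p>Please enter at least one word to search for.</p>\n";
}

# A search string made up only of spaces yields no words.
my @words = search_words("   ");
if (@words > 0) {
    # This is where the call to finddepth() would go.
    print "Searching for: @words\n";
} else {
    print reminder_html();
}
```

With a string such as " perl  cgi ", the same function returns the two words perl and cgi, and the search branch runs instead of the reminder.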