CGI and Perl

Listing 9.10. Modified crawlIt() function for mirroring a site.

use LWP::UserAgent;     # provides the user agent and HTTP::Request
use HTML::Parse;        # provides parse_html()
use URI::URL;

sub crawlIt {
    my($ua,$urlStr,$urlLog,$visitedAlready)=@_;
    my($request) = new HTTP::Request 'GET', $urlStr;
    my($response) = $ua->request($request);
    if ($response->is_success) {
       my($html) = parse_html($response->content());
       foreach (@{$html->extract_links(qw(a img))}) {
          my($link,$linkelement)=@$_;
          # Resolve the link relative to the page it came from.
          my($url) = new URI::URL $link, $response->base();
          my($escapedURL) = quotemeta($url->abs());
          if ($linkelement->tag() eq 'a') {
             if ($url ne "") {
                # Mirror and recurse into each page only once.
                if (eval "grep(/$escapedURL/,\@\$visitedAlready)" == 0) {
                   push(@$visitedAlready, $url->abs());
                   mirrorFile($ua, $url->abs());
                   crawlIt($ua, $url->abs(), $urlLog, $visitedAlready);
                }
             }
          } elsif ($linkelement->tag() eq 'img') {
             if ($url ne "") {
                mirrorFile($ua, $url->abs());   # images are mirrored, not crawled
             }
          }
       }
    }
 }

 sub searchForTitle {
    my($node,$startflag,$depth)=@_;
    if (ref($node)) {
       my($lwr_tag) = lc($node->tag());
       if ($lwr_tag eq 'title') {
          foreach (@{$node->content()}) {
             $title .= $_;    # accumulate the document title
          }
          return 0;           # title found; stop traversing
       }
    }
    return 1;                 # keep traversing
 }

 sub mirrorFile {
    my($ua,$urlStr)=@_;
    my($url)=new URI::URL $urlStr;
    my($localpath) = $mirrorRoot;       # base directory of the local copy (assumed global)
    $localpath .= $url->path();
    $ua->mirror($urlStr, $localpath);   # fetch only if newer than the local copy
 }
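A short driver script shows how the routines in the listing might be wired together. The user-agent string, starting URL, and log filename here are illustrative assumptions, not part of the listing:

```perl
#!/usr/bin/perl
# Hypothetical driver for crawlIt(); the names below are assumptions.
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent("SiteMirror/0.1");      # identify the crawler to servers

my @visitedAlready = ();           # absolute URLs already mirrored
crawlIt($ua, "http://www.example.com/", "mirror.log", \@visitedAlready);
```

Passing `\@visitedAlready` by reference lets every recursive call share one list, so a page reached by two different paths is fetched only once.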

This example of mirroring remote sites might be useful for simple sites containing only HTML files. If you need a more sophisticated remote mirroring system, it is best to use a UNIX-based replication tool such as rdist. If you are running a Windows NT server, comparable replication tools are available for that platform as well.
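As a sketch of the rdist approach, a minimal Distfile might push a document tree to a mirror host. The host name and directory path here are assumptions; substitute your own:

```
# Distfile: replicate the web document tree to a mirror host
HOSTS = ( mirror.example.com )
FILES = ( /usr/local/etc/httpd/htdocs )

${FILES} -> ${HOSTS}
        install ;
```

Running `rdist -f Distfile` then copies only the files that have changed since the last run, which is exactly the incremental behavior the Perl mirror above approximates with `$ua->mirror()`.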