
Parsing XML with SAX

$sax = xml_parser_create();

SAX is an approach to parsing XML documents; it does not validate them. It works unchanged in both PHP 4 and PHP 5. In PHP 4, SAX parsing is available on all platforms out of the box, so no separate installation is necessary.

Parsing XML with SAX

<?php
  // ...
  $sax = xml_parser_create();
  xml_parser_set_option($sax, XML_OPTION_CASE_FOLDING,
    false);
  xml_parser_set_option($sax, XML_OPTION_SKIP_WHITE,
    true);
  xml_set_element_handler($sax, 'sax_start',
    'sax_end');
  xml_set_character_data_handler($sax, 'sax_cdata');
  xml_parse($sax, file_get_contents('quotes.xml'),
    true);
  xml_parser_free($sax);
?>
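
The listing ignores the return value of xml_parse(). SAX does not validate, but it does stop at the first well-formedness error, so it can be worth checking for one. The following is a minimal sketch of such a check; the helper name sax_check() and the wording of the message are assumptions made for illustration, not part of the original listing.

```php
<?php
// Hypothetical helper (not part of the original listing): parses a string
// and returns 'ok' or a description of the first well-formedness error.
function sax_check($xml) {
  $sax = xml_parser_create();
  $result = 'ok';
  // xml_parse() returns 1 on success and 0 on failure.
  if (!xml_parse($sax, $xml, true)) {
    $result = sprintf('XML error in line %d: %s',
      xml_get_current_line_number($sax),
      xml_error_string(xml_get_error_code($sax)));
  }
  xml_parser_free($sax);
  return $result;
}

echo sax_check('<quotes><quote>text</quote></quotes>'), "\n"; // well-formed
echo sax_check('<quotes><quote>text</quotes>'), "\n";         // mismatched tag
```

The exact error text depends on the underlying XML library, so the sketch only reports whatever xml_error_string() returns.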

You create a SAX (Simple API for XML) parser using xml_parser_create(). This parser reads an XML document and reacts to various events as it encounters them. The following three events are the most important ones:

  • Beginning of an element

  • End of an element

  • Character data (the text inside elements, including CDATA sections)
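
To see the order in which these three events arrive, a small tracing sketch can help. It registers one handler per event and records what the parser reports for a tiny inline document. The names trace_log(), trace_start(), trace_end(), and trace_cdata() and the sample document are assumptions made for this illustration.

```php
<?php
// Hypothetical accumulator: stores an entry when one is passed,
// and always returns everything recorded so far.
function trace_log($entry = null) {
  static $log = array();
  if ($entry !== null) {
    $log[] = $entry;
  }
  return $log;
}

// One handler per event type; each only records what it was called with.
function trace_start($sax, $tag, $attr) {
  trace_log('start ' . $tag);
}
function trace_end($sax, $tag) {
  trace_log('end ' . $tag);
}
function trace_cdata($sax, $data) {
  trace_log('text ' . $data);
}

$sax = xml_parser_create();
// Keep tag names as written instead of converting them to uppercase.
xml_parser_set_option($sax, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($sax, 'trace_start', 'trace_end');
xml_set_character_data_handler($sax, 'trace_cdata');
xml_parse($sax, '<a><b>hi</b></a>', true);
xml_parser_free($sax);

echo implode("\n", trace_log()), "\n";
```

The start and end events nest in the same way as the elements do, which is what makes it possible to emit matching opening and closing HTML tags from the handlers.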

You can then define handler functions for these events and use them to transform the XML into something else, for instance Hypertext Markup Language (HTML). The preceding listing does this and outputs the contents of the XML file as a bulleted HTML list, as shown in the figure. The function xml_set_element_handler() sets the handlers for the beginning and end of an element, whereas xml_set_character_data_handler() sets the handler for character data. With xml_parser_set_option(), you can configure the parser, for instance to ignore whitespace and to treat tag names as case sensitive (so that tag names are not automatically converted to uppercase). The following code defines the handler functions:

function sax_start($sax, $tag, $attr) {
  if ($tag == 'quotes') {
    echo '<ul>';
  } elseif ($tag == 'quote') {
    echo '<li>' . htmlspecialchars($attr['year']) .
      ': ';
  } elseif ($tag == 'coding') {
    echo '"';
  } elseif ($tag == 'author') {
    echo ' (';
  }
}
function sax_end($sax, $tag) {
  if ($tag == 'quotes') {
    echo '</ul>';
  } elseif ($tag == 'quote') {
    echo '</li>';
  } elseif ($tag == 'coding') {
    echo '"';
  } elseif ($tag == 'author') {
    echo ') ';
  }
}
function sax_cdata($sax, $data) {
  echo htmlspecialchars($data);
}

HTML created from XML.
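
To try the transformation without quotes.xml at hand, the following sketch repeats the three handler functions and feeds them an inline document. The structure of that document (quotes, quote with a year attribute, coding, author) is inferred from the handlers, and its content is placeholder text, not the real file. Capturing the output with ob_start() is likewise just a convenience of this sketch.

```php
<?php
// Handler functions as described in the text, repeated so the sketch
// runs on its own.
function sax_start($sax, $tag, $attr) {
  if ($tag == 'quotes') {
    echo '<ul>';
  } elseif ($tag == 'quote') {
    echo '<li>' . htmlspecialchars($attr['year']) . ': ';
  } elseif ($tag == 'coding') {
    echo '"';
  } elseif ($tag == 'author') {
    echo ' (';
  }
}
function sax_end($sax, $tag) {
  if ($tag == 'quotes') {
    echo '</ul>';
  } elseif ($tag == 'quote') {
    echo '</li>';
  } elseif ($tag == 'coding') {
    echo '"';
  } elseif ($tag == 'author') {
    echo ') ';
  }
}
function sax_cdata($sax, $data) {
  echo htmlspecialchars($data);
}

// Placeholder document; element names are assumed from the handlers above.
$xml = '<quotes><quote year="1999"><coding>Placeholder quote</coding>'
     . '<author>Placeholder Author</author></quote></quotes>';

$sax = xml_parser_create();
xml_parser_set_option($sax, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($sax, 'sax_start', 'sax_end');
xml_set_character_data_handler($sax, 'sax_cdata');

// Collect everything the handlers echo into a string.
ob_start();
xml_parse($sax, $xml, true);
$html = ob_get_clean();
xml_parser_free($sax);

echo $html, "\n";
// prints: <ul><li>1999: "Placeholder quote" (Placeholder Author) </li></ul>
```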


PHP 5.1 comes with XMLReader included by default. It is a wrapper around libxml2 and mimics the application programming interface (API) of XmlTextReader, the C#/.NET class for reading XML. It is much faster than SAX and easier to use. As of this writing, it is not very stable yet, but it looks quite promising. The associated module XMLWriter, which provides write access to XML, is not part of the standard PHP 5.1 distribution but is available in PECL. More information about both modules is available in a presentation by their author at http://php5.bitflux.org/xmloncrack/.
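
For comparison, here is a rough sketch of the same bulleted-list transformation with XMLReader. Instead of registering handlers, the code asks the reader for the next node in a loop and checks its type. The inline document is placeholder data with element names assumed from the SAX handlers, and the sketch is only one possible way to use the class.

```php
<?php
// Placeholder document; element names are assumed from the SAX handlers.
$xml = '<quotes><quote year="1999"><coding>Placeholder quote</coding>'
     . '<author>Placeholder Author</author></quote></quotes>';

$reader = new XMLReader();
$reader->XML($xml);

$html = '';
// read() advances to the next node and returns false at the end.
while ($reader->read()) {
  $name = $reader->name;
  if ($reader->nodeType == XMLReader::ELEMENT) {
    if ($name == 'quotes') {
      $html .= '<ul>';
    } elseif ($name == 'quote') {
      $html .= '<li>' . htmlspecialchars($reader->getAttribute('year')) . ': ';
    } elseif ($name == 'coding') {
      $html .= '"';
    } elseif ($name == 'author') {
      $html .= ' (';
    }
  } elseif ($reader->nodeType == XMLReader::TEXT) {
    // Text nodes carry their content in the value property.
    $html .= htmlspecialchars($reader->value);
  } elseif ($reader->nodeType == XMLReader::END_ELEMENT) {
    if ($name == 'quotes') {
      $html .= '</ul>';
    } elseif ($name == 'quote') {
      $html .= '</li>';
    } elseif ($name == 'coding') {
      $html .= '"';
    } elseif ($name == 'author') {
      $html .= ') ';
    }
  }
}
$reader->close();

echo $html, "\n";
// prints: <ul><li>1999: "Placeholder quote" (Placeholder Author) </li></ul>
```

Because the loop pulls nodes on demand, all state lives in ordinary local variables rather than in separate callback functions.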
