

XML as a Hierarchy/Tree


A tree-based processor translates the XML document into an internal tree structure and allows an application to navigate that tree.

In the case of your musicians.xml file, a possible resulting tree structure is shown in Figure 14.4.

Trees can be more or less extensive depending on whether attributes, entities, and so on need to be stored as separate nodes.
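As a rough illustration of how the size of the tree depends on what is stored as a node, the following sketch counts the nodes of a small document under three policies. It uses Python's standard xml.dom.minidom module as one example of a tree-based processor. The SAMPLE document and the count_nodes helper are assumptions made up for this illustration; they are not the actual musicians.xml file or part of any standard API.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical stand-in for musicians.xml; the real file is not reproduced here.
SAMPLE = """<musicians>
  <musician experience="firsttime">
    <name>Ann</name>
  </musician>
</musicians>"""


def count_nodes(node, include_text=True, include_attrs=False):
    """Count the nodes below and including `node` under a storage policy.

    include_text:  count text nodes (including whitespace-only ones)
    include_attrs: count each attribute as a node of its own
    """
    if node.nodeType == Node.TEXT_NODE and not include_text:
        return 0
    total = 1
    if include_attrs and node.nodeType == Node.ELEMENT_NODE:
        total += node.attributes.length
    for child in node.childNodes:
        total += count_nodes(child, include_text, include_attrs)
    return total


doc = minidom.parseString(SAMPLE)
root = doc.documentElement

elements_only = count_nodes(root, include_text=False)
with_text = count_nodes(root)
with_text_and_attrs = count_nodes(root, include_attrs=True)

print("elements only:          ", elements_only)
print("plus text nodes:        ", with_text)
print("plus attributes as nodes:", with_text_and_attrs)
```

The same document yields a larger tree as more kinds of information are kept as separate nodes, which is the trade-off described above.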

Once the tree has been built in memory, the application can navigate it. Note that processing therefore takes two passes:

Pass 1: Parsing and tree-building
Pass 2: The data processing itself
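The two passes can be sketched as follows, again assuming Python's xml.dom.minidom as the tree-based processor. The SAMPLE document is a made-up stand-in, not the real musicians.xml.

```python
from xml.dom import minidom

# Hypothetical stand-in for musicians.xml.
SAMPLE = (
    "<musicians>"
    "<musician><name>Ann</name></musician>"
    "<musician><name>Bob</name></musician>"
    "</musicians>"
)

# Pass 1: parsing and tree-building. The whole document is read
# and turned into an in-memory tree before any processing starts.
doc = minidom.parseString(SAMPLE)

# Pass 2: the data processing itself, done by walking the finished tree.
names = [name.firstChild.data for name in doc.getElementsByTagName("name")]

print(names)  # ['Ann', 'Bob']
```

Nothing in pass 2 touches the original text; it works only on the tree produced by pass 1.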

This makes it possible to answer questions that require looking ahead in the document:

Is an element the last child element of its parent element?
Does this element have an element below it that has an attribute with the name experience and the value firsttime?

Because you have access to the full document (the complete tree), you have access to all information required.
Figure 14.4 A tree starting from musicians.xml.
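The two look-ahead questions listed earlier can be answered with a few lines of tree navigation. This is a minimal sketch using Python's xml.dom.minidom. The SAMPLE document, the gig element, and both helper functions are assumptions invented for this example; only the attribute name experience and the value firsttime come from the text.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical stand-in for musicians.xml.
SAMPLE = """<musicians>
  <musician>
    <name>Ann</name>
    <gig experience="firsttime">trio</gig>
  </musician>
  <musician>
    <name>Bob</name>
  </musician>
</musicians>"""


def is_last_child_element(element):
    """Return True if no element sibling follows `element` under its parent."""
    sibling = element.nextSibling
    while sibling is not None:
        if sibling.nodeType == Node.ELEMENT_NODE:
            return False
        sibling = sibling.nextSibling
    return True


def has_descendant_with_attribute(element, name, value):
    """Return True if any element below `element` has the attribute name="value"."""
    for descendant in element.getElementsByTagName("*"):
        # getAttribute returns an empty string when the attribute is absent.
        if descendant.getAttribute(name) == value:
            return True
    return False


doc = minidom.parseString(SAMPLE)
ann, bob = doc.getElementsByTagName("musician")

print(is_last_child_element(ann))  # False
print(is_last_child_element(bob))  # True
print(has_descendant_with_attribute(ann, "experience", "firsttime"))  # True
print(has_descendant_with_attribute(bob, "experience", "firsttime"))  # False
```

Both helpers look at nodes that come after the current element in document order, which is only possible because the complete tree is already in memory.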

The advantage of this approach is that it gives access to the whole document, so it's easy to look ahead. The disadvantages are that building a tree first and then navigating it is more involved than the event-driven approach, the tree takes more memory, and processing is slower because it requires two passes.

The World Wide Web Consortium (W3C) developed a standard tree-based API for XML and HTML. It is called the Document Object Model (DOM) and has been a W3C Recommendation since October 1, 1998. The specification can be found at http://www.w3.org/TR/REC-DOM-Level-1/.

The DOM will be implemented in version 5 of both Internet Explorer and Mozilla (Netscape). It's covered extensively on Day 16, "Programming with the Document Object Model."

There is also software that offers the best of both worlds by mixing the event-driven approach and the tree-based approach. Balise, from the company AIS, is well known for this in the SGML community and includes a non-validating XML parser in its latest version. Unfortunately, no free version is available.
