XML

Using SAX with Java

The sample program in this chapter is written in Java and uses the Xerces SAX parser, which I mentioned earlier. If you're a Java programmer, you'll be right at home. If you have no interest in Java, much of the remainder of this chapter probably won't be to your liking. However, the purpose of this chapter is to explain how SAX works, and although SAX parsers are available for many languages, SAX itself started out in the Java world. Even if you have no interest in digesting the upcoming Java code, you can still experiment with the sample program by running it on your own XML documents and analyzing the results. The program is relatively simple, and I've commented the code to make it as clear as possible.

To see the output of the sample program on your own computer, you'll need Sun's Java Development Kit (JDK) and the Xerces library mentioned previously. I already explained how to download and install Xerces; to get the JDK, go to http://java.sun.com/j2se/.

You'll need to download the J2SE (Java 2 Standard Edition) SDK and install it. Once it's installed, you can compile and run the sample program. Put the sample program's .java source code file in the directory where you put xercesImpl.jar and xml-apis.jar (you can put it anywhere you like, but this route is probably easiest), open a command prompt in that directory, and type the following to compile the program:

javac -classpath xercesImpl.jar;xml-apis.jar;. DocumentPrinter.java

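Alternatively, you can copy the xercesImpl.jar and xml-apis.jar files to the location of the sample program and then compile and run the program from there. The main point is that both the compiler and the Java runtime must be able to find the JAR files. Note that the semicolons in the -classpath value are the path separator on Windows; on Linux, Mac OS X, and other UNIX-like systems, use colons instead.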

If your copy of DocumentPrinter.java is correct and xercesImpl.jar and xml-apis.jar really are in the current directory, the compiler produces a file called DocumentPrinter.class. To run the program, use the following command:

java -classpath xercesImpl.jar;xml-apis.jar;. DocumentPrinter file.xml

Replace file.xml with the name of the XML file that you want to process. As an example, here's how you would run the DocumentPrinter sample program on the vehicles XML file from the earlier lesson, "Transforming XML with XSLT":

java -classpath xercesImpl.jar;xml-apis.jar;. DocumentPrinter vehicles.xml

Listing 17.1 shows part of the output produced by running the DocumentPrinter SAX sample program on the vehicles.xml document.

Listing 17.1. The Document Printer Sample Program Uses a SAX Parser to Display Detailed Information About the vehicles.xml Document
 1: Start document.
 2: Received processing instruction:
 3: Target: xml-stylesheet
 4: Data: href="vehicles.xsl" type="text/xsl"
 5: Start element: vehicles
 6: Start element: vehicle
 7: Start element: mileage
 8: Received characters: 13495
 9: End of element: mileage
10: Start element: color
11: Received characters: green
12: End of element: color
13: Start element: price
14: Received characters: 33900
15: End of element: price
16: End of element: vehicle
17: ...
18: Start element: vehicle
19: Start element: mileage
20: Received characters: 48405
21: End of element: mileage
22: Start element: color
23: Received characters: gold
24: End of element: color
25: Start element: price
26: Received characters: 22995
27: End of element: price
28: End of element: vehicle
29: End of element: vehicles
30: End of document reached.
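
Each line in Listing 17.1 corresponds to an event that the SAX parser reports to a handler object as it reads the document. The following is a minimal sketch of such a handler, not the DocumentPrinter program itself. The class name SaxEventTrace, its trace() method, and the one-vehicle sample document are all hypothetical, and the sketch assumes the SAX parser bundled with the JDK rather than the Xerces JAR files so that it compiles without a special classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; this is not the chapter's DocumentPrinter class.
public class SaxEventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("Start document.");
    }

    @Override
    public void endDocument() {
        events.add("End of document reached.");
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("Received processing instruction:");
        events.add("Target: " + target);
        events.add("Data: " + data);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("Start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End of element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip the whitespace-only text that sits between elements.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("Received characters: " + text);
        }
    }

    // Parses the XML text and returns the events in the order they were reported.
    public static List<String> trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: a cut-down, one-vehicle document modeled on vehicles.xml.
        String xml =
              "<?xml version=\"1.0\"?>\n"
            + "<?xml-stylesheet href=\"vehicles.xsl\" type=\"text/xsl\"?>\n"
            + "<vehicles>\n"
            + "  <vehicle year=\"2004\" make=\"Acura\" model=\"3.2TL\">\n"
            + "    <mileage>13495</mileage>\n"
            + "    <color>green</color>\n"
            + "    <price>33900</price>\n"
            + "  </vehicle>\n"
            + "</vehicles>\n";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

Because this file needs no external JAR files, it can be compiled with javac SaxEventTrace.java and run with java SaxEventTrace. The printed events should resemble the listing, minus the line numbers. One caveat of the sketch: a SAX parser is allowed to deliver an element's text in more than one characters() call, so a more careful handler would buffer the text until the matching endElement() call.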

Just to refresh your memory, following is a brief code excerpt from the vehicles.xml document:

<vehicle year="2004" make="Acura" model="3.2TL">
  <mileage>13495</mileage>
  <color>green</color>
  <price>33900</price>
</vehicle>

This excerpt describes the first vehicle in the document, and it corresponds to the output on lines 6 through 16 of Listing 17.1. If you carefully compare the XML code with the listing, you'll see that the program reported the start, the character content, and the end of every element in the document. This is the kind of fine-grained control you have at your disposal when using a tool such as a SAX parser.
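
The year, make, and model attributes in the excerpt do not appear in the lines of Listing 17.1 shown here. In SAX, attributes are not reported as separate events; they arrive through the Attributes argument of the startElement() callback. The sketch below shows one way to read them. It is an illustration only: the class name AttributeTrace and its attributesOf() method are hypothetical, and it again assumes the SAX parser bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; this is not part of the chapter's sample program.
public class AttributeTrace extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Attributes are available only during this callback, one entry per attribute.
        for (int i = 0; i < atts.getLength(); i++) {
            lines.add(qName + " @" + atts.getQName(i) + " = " + atts.getValue(i));
        }
    }

    // Parses the XML text and returns one line per attribute found.
    public static List<String> attributesOf(String xml) throws Exception {
        AttributeTrace handler = new AttributeTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        // The vehicle element from the excerpt above.
        String xml =
              "<vehicle year=\"2004\" make=\"Acura\" model=\"3.2TL\">\n"
            + "  <mileage>13495</mileage>\n"
            + "  <color>green</color>\n"
            + "  <price>33900</price>\n"
            + "</vehicle>\n";
        for (String line : attributesOf(xml)) {
            System.out.println(line);
        }
    }
}
```

SAX does not guarantee the order in which attributes are listed, so code that needs a particular attribute is usually better off asking for it by name with atts.getValue("year") than relying on its position.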