Practical XML Querying with XQuery and Saxon

I promised earlier that I would pull together everything you've learned about XQuery and show you a couple of practical examples. The remainder of the lesson focuses on a couple of sample queries that operate on the same XML data. This data is stored in the familiar training log XML document that you've seen in earlier lessons. A partial listing of this document is shown in Listing 18.2.

Listing 18.2. A Partial Listing of the Training Log XML Document
<?xml version="1.0"?>
<!DOCTYPE trainlog SYSTEM "etml.dtd">
<trainlog>
  <!-- This session was part of the marathon training group run. -->
  <session date="11/19/05" type="running" heartrate="158">
    <duration units="minutes">45</duration>
    <distance units="miles">5.5</distance>
    <location>Warner Park</location>
    <comments>Mid-morning run, a little winded throughout.</comments>
  </session>
  <session date="11/21/05" type="cycling" heartrate="153">
    <duration units="hours">2.5</duration>
    <distance units="miles">37.0</distance>
    <location>Natchez Trace Parkway</location>
    <comments>Hilly ride, felt strong as an ox.</comments>
  </session>
  ...
</trainlog>

The first sample query I want to show you involves plucking out a certain type of training session and then transforming it into a different XML format. This might be useful in a situation where you are interfacing two applications that don't share the same data format. Listing 18.3 contains the code for the query, which is stored in the file trainlog1.xq.

Listing 18.3. A Query to Retrieve and Transform Running Sessions
xquery version "1.0";
(: Select the running sessions and rewrap each one as a <location> element :)
<runsessions>
  { for $s in //trainlog/session[@type="running"]
    order by $s/@date
    return <location>{$s/@date} {data($s/location)} ({data($s/distance)}
    {data($s/distance/@units)})</location>
  }
</runsessions>

To run this query against the trainlog.xml document using Saxon, issue the following command from within the main Saxon folder:

java net.sf.saxon.Query -s trainlog.xml trainlog1.xq >output1.xml

If you run the sample query from within the main Saxon folder as I've suggested, make sure to copy the sample files into that folder so that Saxon can access them. Otherwise, you can add Saxon to your Java classpath and run the command from anywhere.

This command executes the query and writes the results to the file output1.xml, which is shown in Listing 18.4.

Listing 18.4. The XQuery Results of the Running Query
<?xml version="1.0" encoding="UTF-8"?>
<runsessions>
  <location date="11/19/05">Warner Park (5.5miles)</location>
  <location date="11/24/05">Warner Park (8.5miles)</location>
  <location date="11/26/05">Metro Center (7.5miles)</location>
  <location date="11/29/05">Warner Park (10.0miles)</location>
  <location date="11/31/05">Warner Park (12.5miles)</location>
  <location date="12/04/05">Warner Park (13.5miles)</location>
</runsessions>

As the listing reveals, only the running training sessions are returned, and they are formatted into a new XML structure that is somewhat different from the original training log. The <location> tag may look familiar, but it now carries the date attribute, which was previously a part of the <session> tag. The query also combines the location, distance, and distance units into the content of the <location> tag. Finally, the individual location elements are packaged within a new root element named runsessions.
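You may have noticed that the distance and its units run together in Listing 18.4 (5.5miles). That happens because whitespace sitting by itself between two enclosed expressions is treated as boundary whitespace, which XQuery strips by default. Here is a minimal sketch of one way around it, assuming you want a space between the two values. It relies on the standard concat() function to build the element text as a single string. This variation is my own illustration, not one of the lesson's sample files.

```xquery
xquery version "1.0";

(: Variation on Listing 18.3: keep a space between the distance and its units. :)
<runsessions>
  { for $s in //trainlog/session[@type="running"]
    order by $s/@date
    return
      <location>{$s/@date}{
        (: concat() returns one string, so the spaces inside the quotes survive :)
        concat(data($s/location), " (", data($s/distance), " ",
               data($s/distance/@units), ")")
      }</location>
  }
</runsessions>
```

Run against the training log, this should give entries such as <location date="11/19/05">Warner Park (5.5 miles)</location>. Note that the comment sits inside the curly braces; placed directly in the element content, it would be copied to the output as literal text.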

Although the previous example is certainly interesting in terms of how it transforms XML data, it doesn't give you anything remarkable to look at. It would be even better to see the results of a query in a web browser. Of course, this requires transforming the query results into HTML code. Listing 18.5 contains a query that grabs every training log session and transforms it into an HTML document with a carefully formatted table row for each session.

Listing 18.5. A Query to Format Training Sessions into an HTML Document
 1: xquery version "1.0";
 3: <html>
 4:   <head>
 5:     <title>Training Sessions</title>
 6:   </head>
 8:   <body style="text-align:center">
 9:     <h1>Training Sessions</h1>
10:     <table border="1px">
11:       <tr>
12:         <th>Date</th>
13:         <th>Type</th>
14:         <th>Heart Rate</th>
15:         <th>Location</th>
16:         <th>Duration</th>
17:         <th>Distance</th>
18:       </tr>
19:       { for $s in //session
20:         return <tr> <td>{data($s/@date)}</td> <td>{data($s/@type)}</td>
21:         <td>{data($s/@heartrate)}</td>
22:         <td>{data($s/location)}</td>
23:         <td>{data($s/duration)} {data($s/duration/@units)}</td>
24:         <td>{data($s/distance)} {data($s/distance/@units)}</td> </tr>
25:       }
26:     </table>
27:   </body>
28: </html>

This query is certainly more involved than anything you've seen thus far in this lesson, but it really isn't very complicated; most of the code is just HTML wrapper code to format the query results for display. Pay particular attention to how each piece of XML data is carefully wrapped in a <td> element so that it is arranged within an HTML table (lines 20 through 24).
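One detail about lines 23 and 24 of Listing 18.5: the space between the two enclosed expressions in each cell is boundary whitespace, which XQuery strips by default, so a value and its units come out joined (45minutes). If you would rather keep the space, one option is a boundary-space declaration in the query prolog. The following is only a sketch of that idea, trimmed down to three columns. It assumes a processor that implements the final XQuery 1.0 prolog syntax, which may not match every Saxon release.

```xquery
xquery version "1.0";

(: Sketch: keep the whitespace that appears between enclosed expressions.
   Assumes the final XQuery 1.0 syntax for this declaration. :)
declare boundary-space preserve;

<table border="1px">
  { for $s in //session
    return
      <tr>
        <td>{data($s/@date)}</td>
        <td>{data($s/duration)} {data($s/duration/@units)}</td>
        <td>{data($s/distance)} {data($s/distance/@units)}</td>
      </tr>
  }
</table>
```

With the declaration in place, the line breaks and indentation inside the <table> and <tr> elements are preserved as well. That is harmless in HTML, but it is worth knowing if you reuse the idea for XML output where whitespace matters.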

The following command is all it takes to generate an HTML document using the query in Listing 18.5, which is stored in the file trainlog2.xq:

java net.sf.saxon.Query -s trainlog.xml trainlog2.xq >output2.html

Listing 18.6 contains the transformed HTML (XHTML) document that results from this Saxon command.

Listing 18.6. The Partial XQuery Results of the Training Session Query
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>Training Sessions</title>
  </head>
  <body style="text-align:center">
    <h1>Training Sessions</h1>
    <table border="1px">
      <tr>
        <th>Date</th>
        <th>Type</th>
        <th>Heart Rate</th>
        <th>Location</th>
        <th>Duration</th>
        <th>Distance</th>
      </tr>
      <tr>
        <td>11/19/05</td>
        <td>running</td>
        <td>158</td>
        <td>Warner Park</td>
        <td>45minutes</td>
        <td>5.5miles</td>
      </tr>
      ...
    </table>
  </body>
</html>

No surprises here, just a basic HTML document with a table full of training log information. I've deliberately shown only partial results because the table is fairly long, given the number of sessions in the training log. Figure 18.1 shows this web page as viewed in Internet Explorer.

Figure 18.1. The training session HTML query result document as viewed in Internet Explorer.

Finally, some visible results from XQuery! XQuery is a powerful technology that makes it possible to drill down into the inner depths of XML code and extract data with a great deal of precision. This tutorial and the two examples you just saw truly only scratch the surface of XQuery.