Using XPath Functions

Back in "Access Your iTunes Music Library via XML," you learned about some of the more commonly used XPath functions and how they can be used to create expressions for XSLT stylesheets. I'd like to revisit the standard XPath functions and go into a little more detail regarding their use in creating expressions. Before getting into the specifics, it's worth taking a look at how the functions are organized. The functions supported by XPath can be roughly divided along the lines of the data types on which they operate:

  • Node functions

  • String functions

  • Boolean functions

  • Number functions

The next few sections explore the functions in each of these categories in more detail. For a complete XPath function reference, please visit the XPath page at the W3C web site at http://www.w3.org/TR/xpath#corelib.

Node Functions

Node functions are XPath functions that relate to the node tree. Although all of XPath technically relates to the node tree, node functions are very direct in that they allow you to ascertain the position of nodes in a node set, as well as how many nodes are in a set. Following are the most common XPath node functions:

  • position() Determine the numeric position of a node

  • last() Determine the position of the last node in a node set

  • count() Determine the number of nodes in a node set

Although these node functions might seem somewhat abstract, keep in mind that they can be used to carry out some interesting tasks when used in the context of a broader expression. For example, the following code shows how to use the count() function to count the distance elements in the training log document that are recorded in miles. Note that count() returns the number of matching nodes, not the sum of their values:

count(*/distance[@units='miles'])
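To make the difference between counting nodes and adding up their values concrete, here is a minimal Python sketch. It assumes a small stand-in for the training log (the values are invented for illustration) and uses the standard library's ElementTree, which understands the path and the attribute predicate but not count() itself, so len() plays that role here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the training log; element and attribute names
# follow the text, but the values are invented for illustration.
TRAINLOG = """
<trainlog>
  <session><distance units="miles">5.5</distance></session>
  <session><distance units="kilometers">10</distance></session>
  <session><distance units="miles">6.5</distance></session>
</trainlog>
"""

root = ET.fromstring(TRAINLOG)

# Same path and predicate as the XPath expression, evaluated from <trainlog>.
mile_distances = root.findall("*/distance[@units='miles']")

# len() stands in for count(): how many nodes matched.
mile_count = len(mile_distances)

# A total distance requires adding the values, which is the job of XPath's sum().
mile_total = sum(float(d.text) for d in mile_distances)

print(mile_count, mile_total)  # 2 12.0
```

In XPath itself, the total would come from sum(*/distance[@units='miles']) rather than count().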

Following is another example that shows how to reference a child node based solely upon its position within a document:

child::item[position()=3]

Assuming there are several child elements of type item, this code references the third child item element of the current context. To reference the last child item, you use the last() function instead of an actual number, like this:

child::item[position()=last()]
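As a rough illustration, the following Python sketch selects the same nodes from a made-up list of item elements. It relies on ElementTree's limited XPath support, which accepts only the abbreviated forms item[3] and item[last()]; in full XPath these are equivalent to child::item[position()=3] and child::item[position()=last()].

```python
import xml.etree.ElementTree as ET

# Hypothetical document with four <item> children.
LIST = "<list><item>a</item><item>b</item><item>c</item><item>d</item></list>"
root = ET.fromstring(LIST)

# Abbreviated form of child::item[position()=3]; positions start at 1.
third_item = root.find("item[3]")

# Abbreviated form of child::item[position()=last()].
last_item = root.find("item[last()]")

print(third_item.text, last_item.text)  # c d
```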

String Functions

The XPath string functions are used to manipulate strings of text. With the string functions you can concatenate strings, slice them up into substrings, and determine their length. Following are the most popular string functions in XPath:

  • concat() Concatenate two or more strings together

  • starts-with() Determine if a string begins with another string

  • contains() Determine if a string contains another string

  • substring-before() Retrieve a substring that appears before another string

  • substring-after() Retrieve a substring that appears after another string

  • substring() Retrieve a substring of a specified length starting at a position (counted from 1) within another string

  • string-length() Determine the length of a string

These XPath string functions can come in quite handy when it comes to building expressions, especially when you consider that XML content is always specified as raw text. In other words, it is possible to manipulate most XML content as a string, regardless of whether the underlying value of the content is numeric or some other data type. Following is an example that demonstrates how to extract the month of a training session from a date attribute in the training log document:

substring-after(/session[1]/@date, '/')

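In this example, the substring-after() function is called and passed the date attribute of the first session element. Because a forward slash (/) is passed as the second argument to the function, it is used as the basis for finding the substring: the function returns everything that follows the first forward slash in the attribute value. If you look back at one of the date attributes in the document (line 6, for example), you'll notice that the month appears just after the first forward slash. Keep in mind that the result is the entire remainder of the string, so if the value contains a second slash, you need to wrap the call in substring-before() to isolate a single field. As a comparison, you could extract the year as a substring by providing the same arguments but instead using the substring-before() function, which returns everything that precedes the first forward slash: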

substring-before(/session[1]/@date, '/')
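Both functions split on the first occurrence of the separator only, which is easy to overlook. The sketch below emulates them in Python with str.partition(), because ElementTree cannot evaluate XPath string functions. The helper names and the date value are assumptions made for illustration; the value simply has three slash-separated fields.

```python
def substring_before(text, sep):
    """Emulate XPath substring-before(): the text before the first sep,
    or an empty string if sep does not occur. Assumes sep is non-empty."""
    before, found, _ = text.partition(sep)
    return before if found else ""


def substring_after(text, sep):
    """Emulate XPath substring-after(): the text after the first sep,
    or an empty string if sep does not occur. Assumes sep is non-empty."""
    _, found, after = text.partition(sep)
    return after if found else ""


date = "11/19/05"  # hypothetical attribute value

first_field = substring_before(date, "/")         # "11"
remainder = substring_after(date, "/")            # "19/05"
middle_field = substring_before(remainder, "/")   # "19"

print(first_field, remainder, middle_field)
```

The nested call that produces middle_field corresponds to the XPath expression substring-before(substring-after(/session[1]/@date, '/'), '/').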

Another use of the string functions is finding nodes that contain a particular substring. For example, if you wanted to analyze your training data and look for training sessions where you felt strong, you could use the contains() function to select session elements where the comments child element contains the word "strong":

*/session[contains(comments, 'strong')]

In this example, the second and third session elements would be selected because they both contain the word "strong" in their comments child elements (lines 17 and 24).
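The same selection can be sketched in Python. ElementTree does not support contains() in predicates, so this example filters the session elements with Python's in operator, which, like contains(), is case sensitive. The document and its comments are invented for illustration and only mirror the structure described in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical training log with one <comments> child per session.
TRAINLOG = """
<trainlog>
  <session><comments>Legs were heavy the whole way.</comments></session>
  <session><comments>Felt strong on the hills.</comments></session>
  <session><comments>Finished strong despite the wind.</comments></session>
</trainlog>
"""

root = ET.fromstring(TRAINLOG)
sessions = root.findall("session")

# Positions (counted from 1, as in XPath) of sessions whose comments
# contain the substring "strong".
strong_positions = [
    position
    for position, session in enumerate(sessions, start=1)
    if "strong" in session.findtext("comments", default="")
]

print(strong_positions)  # [2, 3]
```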

Boolean Functions

Boolean functions are pretty simple in that they always return a Boolean (true/false) value. Following are the two primary Boolean functions that you may find useful in XPath expressions:

  • not() Negate a Boolean value

  • lang() Determine if a certain language is being used

The not() function is pretty straightforward in that it simply reverses a Boolean value: true becomes false and false becomes true. The lang() function is a little more interesting because it actually queries the context node to see what language it uses. As an example, many English-language XML documents set the xml:lang attribute to en in the root element. Although this value typically cascades down to all elements within the document, it's possible for a document to use multiple languages. The lang() function allows you to check the language setting for any node by looking at the xml:lang attribute of the node or its nearest ancestor that has one. The comparison ignores case and also matches sublanguages, so lang("en") is true for a node marked en-US. Following is an example of how to use the not() and lang() functions to determine if the English language is not being used for the context node:

not(lang("en"))
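To show how the language setting is inherited, here is a minimal Python sketch that emulates not(lang("en")) for every element in a small made-up document. ElementTree has no lang() function, so the helpers lang_matches() and not_in_language() are hypothetical names written for this example; they carry the inherited xml:lang value down the tree and apply the same case-insensitive, sublanguage-aware comparison.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document that mixes languages.
DOC = """
<notes xml:lang="en">
  <note>Felt strong today.</note>
  <note xml:lang="fr">Bonne sortie.</note>
  <note xml:lang="en-US">Easy run.</note>
</notes>
"""


def lang_matches(declared, wanted):
    """Emulate lang(): true for an exact match or a sublanguage, ignoring case."""
    if declared is None:
        return False
    declared, wanted = declared.lower(), wanted.lower()
    return declared == wanted or declared.startswith(wanted + "-")


def not_in_language(root, wanted):
    """Return the elements for which not(lang(wanted)) would be true."""
    result = []

    def walk(elem, inherited):
        # An element's own xml:lang overrides the value inherited from above.
        declared = elem.get(XML_LANG, inherited)
        if not lang_matches(declared, wanted):
            result.append(elem)
        for child in elem:
            walk(child, declared)

    walk(root, None)
    return result


root = ET.fromstring(DOC)
non_english = [elem.text for elem in not_in_language(root, "en")]
print(non_english)  # ['Bonne sortie.']
```

Only the French note is reported: the first note inherits en from the root element, and en-US counts as a sublanguage of en.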