The Pieces and Parts of XSL

The XSL processor carries out two fundamental tasks:

Construct a result tree from a transformation of a source document tree
Interpret the result tree for formatting purposes

The first task addressed by the XSL processor is known as tree transformation and involves transforming a source tree of document content into a result tree. Tree transformation converts XML content from one XML language into another and relies on XSLT. The second task involves examining the result tree for formatting information and formatting the content of each node accordingly. This task requires XSL-FO, which web browsers currently support poorly. Even so, it is a critical part of XSL that will likely play a significant role in the future of XML.

Although it certainly seems convenient to break up XSL processing into two tasks, there is a much more important reason for doing so than mere convenience. One way to understand this significance is to consider CSS, which supports only the formatting of XML content. The limitations of CSS are obvious when you consider that a source document can't really be modified in any way for display purposes. On the other hand, with XSL you have complete freedom to massage the source document at will during the transformation part of the document processing. The one-two punch of transformation followed by formatting provides an incredible degree of flexibility for rendering XML documents for display.

The two fundamental tasks taken on by the XSL processor directly correspond to two XSL technologies: XSLT and XSL-FO. Additionally, there is a third XSL technology, XPath, which factors heavily into XSLT. XSLT and XSL-FO are both implemented as XML languages, which makes their syntax familiar. This also means that style sheets created from them are XML documents. The interesting thing about these two components of XSL is that they can be used together or separately. You can use XSLT to transform documents without any concern over how the documents are formatted. Similarly, you can use XSL Formatting Objects to format XML documents without necessarily performing any transformation on them.

By the Way

Keep in mind that while web browsers have been slow to adopt XSL-FO, there are plenty of tools available for formatting XML code using XSL-FO. Later in Tutorial 14 you find out how to use one of these tools to convert an XML document into a PDF document via XSL-FO.

The important thing to keep in mind regarding the structure of XSL is the fact that XSL is really three languages, not one. XSLT is the XSL transformation language that is used to transform XML documents from one vocabulary to another. XSL-FO is the XSL formatting language that is used to apply formatting styles to XML documents for presentation purposes. And finally, XPath is a special non-XML expression language used to address parts of an XML document.

By the Way

Although you learn the basics of XPath in this tutorial and the next, you aren't formally introduced to it until Tutorial 22, "Addressing and Linking XML Documents." In that tutorial you learn the details of how to address portions of an XML document using XPath.

XSL Transformation

XSL Transformation (XSLT) is the transformation component of the XSL style sheet technology. XSLT consists of an XML-based markup language that is used to create style sheets for transforming XML documents. These style sheets operate on parsed XML data in tree form and output a result tree consisting of the transformed data. XSLT uses a powerful pattern-matching mechanism to select portions of an XML document for transformation. When a pattern is matched for a portion of a tree, a template is used to determine how that portion of the tree is transformed. You learn more about how templates and patterns are used to transform XML documents a little later in this tutorial.
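As a rough illustration of patterns and templates, the following minimal sketch assumes a hypothetical source document whose root element is contacts and whose contact elements each contain name and phone children. The name and phone elements are assumptions made for this example only. Each match attribute holds a pattern, and the body of the template describes what is written to the result tree when a node matches, in this case simple HTML.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The pattern "/" matches the root of the source tree -->
  <xsl:template match="/">
    <html>
      <body>
        <ul>
          <!-- Hand each contact element to whichever template matches it -->
          <xsl:apply-templates select="contacts/contact"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- The pattern "contact" matches every contact element.
       The name and phone children are hypothetical. -->
  <xsl:template match="contact">
    <li>
      <xsl:value-of select="name"/>: <xsl:value-of select="phone"/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

Run through an XSLT 1.0 processor, a style sheet along these lines would produce one list item per contact in the result tree.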

An integral part of XSLT is a technology known as XPath, which is used to select nodes for processing and generating text. The next section examines XPath in more detail. The remainder of this tutorial and the next tackle XSLT in greater detail.

XPath

XPath is a non-XML expression language that is used to address parts of an XML document. XPath is different from its other XSL counterparts (XSLT and XSL-FO) in that it isn't implemented as an XML language. This is due to the fact that XPath expressions are used in situations where XML markup isn't really applicable, such as within attribute values. As you know, attribute values are simple text and therefore can't contain additional XML markup. So, although XPath expressions are used within XML markup, they don't directly use familiar XML tags and attributes themselves.

The central function of XPath is to provide an abstract means of addressing XML document parts; for this reason, XPath forms the basis for document addressing in XSLT. The syntax used by XPath is designed for use in URIs and XML attribute values, which requires it to be extremely concise. The name XPath is based on the notion of using a path notation to address XML documents, much as you might use a path in a file system to describe the location of a file.

Similar to XSLT, XPath operates under the assumption that a document has been parsed into a tree of nodes. XPath defines different node types that are used to describe the nodes that appear within a tree. There is always a single root node that serves as the root of an XPath tree, and that appears as the first node in the tree. Every element in a document has a corresponding element node that appears in the tree under the root node. Within an element node there are other types of nodes that correspond to the element's content. Element nodes may have a unique identifier associated with them, which is used to reference the node with XPath.
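To make the tree model a little more concrete, here is a small hypothetical document followed by a comment listing a few location paths and the nodes they would address. The element names, the id attribute, and the sample values are all assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sample document; all names and values are assumptions -->
<contacts>
  <contact id="c1">
    <name>Ada Example</name>
  </contact>
  <contact id="c2">
    <name>Grace Example</name>
  </contact>
</contacts>
<!--
  XPath view of this document:
    /                         the root node
    /contacts                 the element node for the document element
    /contacts/contact         both contact element nodes
    /contacts/contact/@id     the attribute nodes named id
    /contacts/contact/name    the name element nodes, each containing a text node
-->
```

Notice that the root node (/) is not the same thing as the contacts element; the root node sits above the document element in the tree.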

Following is an example of a simple XPath expression, which demonstrates how XPath expressions are used in attribute values:

<xsl:for-each select="contacts/contact">

This code shows how an XPath expression is used within an XSLT element (xsl:for-each) to reference elements named contact that are children of an element named contacts. Although it isn't important for you to understand the implications of this code in an XSLT style sheet, it is important to realize that XPath is used to address certain nodes (elements) within a document.
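For context, the following sketch places the same xsl:for-each element inside a complete style sheet. It assumes a hypothetical source document in which each contact element has a name child; the name element and the choice of plain-text output are assumptions made for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output keeps the example small -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Evaluated relative to the root node: selects each contact child of contacts -->
    <xsl:for-each select="contacts/contact">
      <!-- Inside the loop, "name" is resolved relative to the current contact -->
      <xsl:value-of select="name"/>
      <!-- Write a line break after each name -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The expression name is shorter than contacts/contact/name because xsl:for-each makes each selected contact element the starting point for expressions in its body.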

When an XPath expression is used in an XSLT style sheet, the evaluation of the expression results in a data object of a specific type, such as a Boolean (true/false) or a number. The manner in which an XPath expression is evaluated is entirely dependent upon the context of the expression, which isn't determined by XPath. The context of an XPath expression is determined by XSLT, which in turn determines how expressions are evaluated. This is the abstract nature of XPath that allows it to be used as a helper technology alongside XSLT to address parts of documents.
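The sketch below shows how the surrounding XSLT element influences what an XPath expression yields. In XPath 1.0 an expression evaluates to one of four types: a node set, a Boolean, a number, or a string. The source document is again assumed to have a contacts root element with contact children that each contain a name element; those names are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- XSLT makes the contacts element the context node for this template -->
  <xsl:template match="/contacts">
    <!-- Number: count() returns how many nodes the path selects -->
    <xsl:value-of select="count(contact)"/>
    <xsl:text> contact(s)&#10;</xsl:text>

    <!-- Boolean: the test attribute treats the comparison as true or false -->
    <xsl:if test="count(contact) &gt; 1">
      <xsl:text>More than one contact&#10;</xsl:text>
    </xsl:if>

    <!-- Node set: apply-templates receives the selected contact nodes -->
    <xsl:apply-templates select="contact"/>
  </xsl:template>

  <!-- Here the context node is a single contact element -->
  <xsl:template match="contact">
    <!-- String: value-of writes the string value of the first selected node -->
    <xsl:value-of select="name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The relative path contact means "contact children of the context node," and it is the enclosing template, not XPath itself, that decides which node that is.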

By the Way

XPath's role in XSL doesn't end with XSLT; XPath is also used with XLink and XPointer, which you learn about in Tutorial 22.

XSL Formatting Objects

XSL Formatting Objects (XSL-FO) represents the formatting component of the XSL style sheet technology and is designed to be a functional superset of CSS. This means that XSL-FO contains all of the functionality of CSS, even though it uses its own XML-based syntax. Similar to XSLT, XSL-FO is implemented as an XML language, which is beneficial for both minimizing the learning curve for XML developers and easing its integration into existing XML tools. Also like XSLT, XSL-FO operates on a tree of XML data, which can either be parsed directly from a document or transformed from a document using XSLT. For formatting purposes, XSL-FO treats every node in the tree as a formatting object, with each node supporting a wide range of presentation styles. You can apply styles by setting attributes on a given element (node) in the tree.

There are formatting objects that correspond to different aspects of document formatting such as layout, pagination, and content styling. Every formatting object has properties that are used to somehow describe the object. Some properties directly specify a formatted result, such as a color or font, whereas other properties establish constraints on a set of possible formatted results. Following is perhaps the simplest possible example of XSL-FO, which sets the font family and font size for a block of text:

<fo:block font-family="Arial" font-size="16pt">
  This text has been styled with XSL-FO!
</fo:block>

As you can see, this code performs a similar function to CSS in establishing the font family and font size of a block of text. XSL-FO actually goes further than CSS in allowing you to control the formatting of XML content in extreme detail. The layout model employed by XSL-FO is described in terms of rectangular areas and spaces, which isn't too surprising considering that this approach is employed by most desktop publishing applications. Rectangular areas in XSL-FO are not objects themselves, however; it is up to formatting objects to establish rectangular areas and the relationships between them. This is somewhat similar to rectangular areas in CSS, where you establish the size of an area (box) by setting the width and height of a paragraph of text. XSL-FO also offers a very high degree of control over print-specific page attributes such as page margins.
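A formatter generally expects a complete XSL-FO document rather than a lone fo:block. The following minimal sketch wraps the block in the page structure that XSL-FO defines, including explicit page margins. The master name main, the page size, and the one-inch margins are illustrative assumptions.

```xml
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Page layout: a letter-size page with one-inch margins (illustrative values) -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="main"
      page-width="8.5in" page-height="11in"
      margin-top="1in" margin-bottom="1in"
      margin-left="1in" margin-right="1in">
      <!-- The body region is the rectangular area that receives the main content -->
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content: flows into the body region of pages built from the "main" master -->
  <fo:page-sequence master-reference="main">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="Arial" font-size="16pt">
        This text has been styled with XSL-FO!
      </fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

The page master describes the rectangular areas, and the page sequence supplies the formatting objects that fill them.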

The XSL processor is heavily involved in carrying out the functionality in XSL-FO style sheets. When the XSL processor processes a formatting object within a style sheet, the object is mapped into a rectangular area on the display surface. The properties of the object determine how it is formatted, along with the parameters of the area into which it is mapped.
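The two XSL tasks can also be combined by having an XSLT style sheet produce XSL-FO as its result tree. The sketch below assumes a hypothetical source document containing at least one contact element with a name child; the element names, the master name main, and the page dimensions are assumptions for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Task 1 (transformation): build a result tree made of formatting objects -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="main"
          page-width="8.5in" page-height="11in"
          margin-top="1in" margin-bottom="1in"
          margin-left="1in" margin-right="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="main">
        <fo:flow flow-name="xsl-region-body">
          <!-- Assumes at least one contact; an empty fo:flow is not valid XSL-FO -->
          <xsl:apply-templates select="contacts/contact"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each contact becomes one block; task 2 (formatting) is left to the formatter -->
  <xsl:template match="contact">
    <fo:block font-family="Arial" font-size="12pt">
      <xsl:value-of select="name"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The output of this transformation is an XSL-FO document that a formatting tool can then map onto rectangular areas on the page.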

By the Way

The immediate downside to XSL-FO is that there is little support for it in major web browsers. For this reason, coverage of XSL-FO here focuses solely on formatting XML data for print purposes (Tutorial 14).