Working with the XSL-FO Language

Before you actually get into the guts of XSL-FO code, you might be curious as to how you go about manipulating and using XSL-FO. Because web browsers aren't really a viable option just yet, you have to look to other tools for XSL-FO document processing. One popular XSL-FO tool is called FOP, which stands for Formatting Objects Processor. FOP is a freely available open source tool that you can download from the Apache XML Graphics Project at http://xmlgraphics.apache.org/fop/. A good commercial option for XSL-FO processing is XEP, which is available for purchase from the RenderX web site at http://www.renderx.com/. You learn more about both of these tools, especially FOP, later in the tutorial.

Stylus Studio is another popular commercial tool that supports XSL-FO. Check it out at http://www.stylusstudio.com/xsl_fo_processing.html.

The remainder of this section introduces you to the tags and attributes that make up the XSL-FO language, and ultimately give you the power to format and style XML content to your heart's desire.

The Core XSL-FO Document Structure

Strangely enough, the W3C hasn't made available an official DTD for XSL-FO, so I can't just show you a DTD in order to explain the language. And in fact, even if such an official DTD existed, it would be far too complicated to make out the language in one sitting, or 10 sittings for that matter! It turns out that XSL-FO is a very "deep" language, supporting numerous objects and options. All of the inner workings of the XSL-FO language could easily fill an entire book. Because our goal here is to knock out XSL-FO in an hour, I'll focus instead on the core language components that allow you to perform basic XML document formatting.

An "experimental" DTD for XSL-FO does exist, although it wasn't created by the W3C. It was created by RenderX, the makers of the XEP XSL-FO processor. Later in the lesson you learn how to use the RenderX XSL-FO DTD to validate XSL-FO documents.

So without further ado, Listing 14.1 contains the code for a skeletal XSL-FO document.

Listing 14.1. A Skeletal XSL-FO Document

 1: <?xml version="1.0" encoding="utf-8"?>
 2:
 3: <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
 4:   <fo:layout-master-set>
 5:     <fo:simple-page-master master-name="skeleton">
 6:       <fo:region-body margin="1in"/>
 7:     </fo:simple-page-master>
 8:   </fo:layout-master-set>
 9:
10:   <fo:page-sequence master-reference="skeleton">
11:     <fo:flow flow-name="xsl-region-body">
12:       <fo:block>Howdy, world!</fo:block>
13:     </fo:flow>
14:   </fo:page-sequence>
15: </fo:root>

Perhaps the first thing to notice in this code is the XSL-FO namespace, which is assigned the prefix fo (line 3). The namespace is identified by the URI http://www.w3.org/1999/XSL/Format. Throughout the remainder of the document, the fo prefix is used in front of every XSL-FO tag; this is typically how XSL-FO documents are coded. You may also notice that the root element of the skeletal document is fo:root; this is the standard root element for all XSL-FO documents.

The next element is where things get interesting. I'm referring to fo:layout-master-set, which encloses one or more page masters (line 4). Although it may sound imposing, a page master is just a description of a formatted page. For example, a page master describes the size and orientation of a page, along with its margins and other pertinent layout details. Every XSL-FO document must contain at least one page master that is coded as a child of the fo:layout-master-set element. Each page master is coded using the fo:simple-page-master element or the fo:page-sequence-master element, the latter of which is used to code a sequence of page masters as opposed to a single page master. An example of where you might want to have multiple page masters is a report where there is a cover page that is formatted differently from the internal pages; the cover page would have its own page master, while the internal pages would use a different page master.

An XSL-FO document can only have one fo:layout-master-set element, which houses all of the page masters and page master sequences for the document.
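
To make the cover-page scenario a bit more concrete, here is a minimal sketch of a document with two page masters. The master names (cover, inside, and report), the page dimensions, the margins, and the sample text are all assumptions made up for illustration. The sketch relies on the fo:page-sequence-master element, along with its fo:single-page-master-reference and fo:repeatable-page-master-reference children, to use the cover master for the first page and the inside master for every page after it.

```xml
<?xml version="1.0" encoding="utf-8"?>

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Page master for the cover page (hypothetical name and margins) -->
    <fo:simple-page-master master-name="cover"
      page-width="8.5in" page-height="11in">
      <fo:region-body margin="2in"/>
    </fo:simple-page-master>

    <!-- Page master for the internal pages -->
    <fo:simple-page-master master-name="inside"
      page-width="8.5in" page-height="11in">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>

    <!-- Sequence: one cover page, then as many inside pages as needed -->
    <fo:page-sequence-master master-name="report">
      <fo:single-page-master-reference master-reference="cover"/>
      <fo:repeatable-page-master-reference master-reference="inside"/>
    </fo:page-sequence-master>
  </fo:layout-master-set>

  <!-- The page sequence refers to the sequence master by name -->
  <fo:page-sequence master-reference="report">
    <fo:flow flow-name="xsl-region-body">
      <!-- break-after forces the next block onto a new page -->
      <fo:block break-after="page">Annual Report</fo:block>
      <fo:block>The report text starts on the second page.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Notice that all three masters still live inside the one fo:layout-master-set element, and that the page sequence names the sequence master rather than either simple page master directly.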

In the skeleton sample document, a single page master is created that simply sets the margin of the page to one inch (lines 5 to 7). The fo:simple-page-master element is used to declare the page master (line 5), and within it the fo:region-body element is used to define the primary content region on the page. You'll notice that the fo:simple-page-master element is given an attribute named master-name, which in this case is assigned the value skeleton (line 5). This attribute serves as a unique ID for the master page, which is used later in the document to associate content with the page.

Getting back to the fo:region-body element, its purpose is to define the main content region of the master page. This element has other attributes that you can use to carefully control how it is laid out on the page, but in this example its margins are collectively set to one inch (line 6).

At this point you now have an XSL-FO document with a master page arranged as shown in Figure 14.1.

Figure 14.1. The skeletal XSL-FO document's master page consists of a content region with one-inch margins.

Although the page in the figure is accurate, it doesn't tell the entire story in terms of margins. The region body margins specified in the skeleton document only apply to the region body content area, which is typically inset on the page. To control the margins at the edges of the page, you will typically set margins for the master page itself. Because no master page margins were set in the example, the region body margins effectively serve as general page margins.
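
As a rough sketch of the distinction, the following variation on the skeleton's layout master set places margins on both the master page and the region body. The half-inch values and the letter-size page dimensions are assumptions chosen purely for illustration. Because the region body margins are measured inside the page margins, the content in this sketch should end up about one inch from each edge of the page.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- margin here insets the whole page area from the paper edges -->
  <fo:simple-page-master master-name="skeleton"
    page-width="8.5in" page-height="11in" margin="0.5in">
    <!-- margin here insets the body region within that page area -->
    <fo:region-body margin="0.5in"/>
  </fo:simple-page-master>
</fo:layout-master-set>
```

The namespace declaration is repeated on the fragment only so that it is well-formed on its own; inside a complete document the declaration on fo:root already covers it.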

But you're not finished with the code in the skeleton document just yet. The last block of code in the document describes a page sequence, which is where formatted content actually enters the picture. The fo:page-sequence element is used to house content in an XSL-FO document, and can be thought of as roughly similar to the body element in an HTML document. The key thing to note about the fo:page-sequence element is how it is associated with a master page via its master-reference attribute (line 10). This is very important because a page sequence must be associated with a master page in order to be laid out. In this way, you can think of a master page as somewhat of a layout template, while the content that gets placed in the template is contained within a page sequence.

Continuing along, the fo:flow element within the fo:page-sequence element is used to flow content onto the page. Just as HTML content flows onto the page in the order it is specified in an HTML document, so does content in an XSL-FO document. The term "flow" is used with this element because content is allowed to flow across multiple pages. Contrast this with static content, which is repeated in the same region of each page rather than flowing from one page to the next. The specific elements within a flow determine exactly how the flow is laid out.

When content in a flow is too large to fit on one page, it automatically flows onto another page.

In the skeleton example, the fo:block element is used to flow a block of text onto the page. The fo:block element is conceptually similar to div in HTML in that it represents a rectangular region of text. So if you string along a sequence of fo:block elements within a flow, they will be laid out one below the next. The opposite of the fo:block element is the fo:inline element, which is comparable to span in HTML.
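
Here is a small sketch of how blocks and inlines might be combined within a flow. The text and the font-style attribute on the inline are assumptions added for illustration; they aren't part of the skeleton document.

```xml
<fo:flow flow-name="xsl-region-body"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Each block starts on its own line, stacked below the previous one -->
  <fo:block>First block of text.</fo:block>
  <fo:block>
    Second block, containing
    <fo:inline font-style="italic">an inline run of text</fo:inline>
    that stays within the line.
  </fo:block>
</fo:flow>
```

The two fo:block elements are laid out one below the other, while the fo:inline element styles part of the second block without starting a new line.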

The only piece of code in the skeleton XSL-FO document that you haven't learned about is the flow-name attribute of the fo:flow element (line 11). This attribute determines where the content of the flow will go on the page. Each page is actually divided into several standard regions, one of which is the region body (xsl-region-body), which is the main content area. Other possible values for the flow-name attribute include xsl-region-before, xsl-region-after, xsl-region-start, and xsl-region-end, to name a few. The region-before area is typically used to set header information for a page, whereas region-after similarly applies to footer information. Figure 14.2 shows how these different regions factor into the content area of a page.

Figure 14.2. The content area of a page in XSL-FO is divided into multiple regions that can be targeted with individual flows.

In the skeleton sample document, only the region-body area of the page is used to place content, which means the other areas just collapse to nothing.
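
To see how the other regions might be put to work, here is a hedged sketch that extends the skeleton document with a header and footer. The master name with-header, the half-inch sizes, and the header and footer text are hypothetical. The sketch also uses pieces that the skeleton doesn't: the fo:region-before and fo:region-after elements with their extent attribute, and the fo:static-content element, which is the usual way to target those regions.

```xml
<?xml version="1.0" encoding="utf-8"?>

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="with-header"
      page-width="8.5in" page-height="11in" margin="0.5in">
      <!-- Body margins leave room for the header and footer regions -->
      <fo:region-body margin-top="0.5in" margin-bottom="0.5in"/>
      <fo:region-before extent="0.5in"/>
      <fo:region-after extent="0.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="with-header">
    <!-- Static content comes before the flow and repeats on every page -->
    <fo:static-content flow-name="xsl-region-before">
      <fo:block>Header text</fo:block>
    </fo:static-content>
    <fo:static-content flow-name="xsl-region-after">
      <fo:block>Footer text</fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Howdy, world!</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

The top and bottom margins on the region body matter here: without them, the body would overlap the space claimed by the header and footer regions.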

I realize I've thrown a lot of XSL-FO information at you quickly, so allow me to summarize the skeleton document in terms of the tags that it uses:

<fo:root> The root of the document, responsible for declaring the XSL-FO namespace
<fo:layout-master-set> Stores one or more page master layouts
<fo:simple-page-master> Represents a simple page master, which serves as a template for a specific type of page
<fo:region-body> The main content area within a master page layout
<fo:page-sequence> A container for content that gets laid out on a page
<fo:flow> A more specific container for content that is allowed to flow from one page to another as necessary
<fo:block> A rectangular content region that resides on its own line on the page; similar to <div> in HTML
<fo:inline> A rectangular content region that appears inline with other content; similar to <span> in HTML

Although XSL-FO is admittedly a little tricky to get a grasp of initially, you now understand the basics of a minimal XSL-FO document. Let's push forward and learn a few more specifics about how to use the XSL-FO language.

Styling Text in XSL-FO

Finally, it's time to see where XSL-FO has some similarity with other technologies that you may be more familiar with. I'm talking about CSS: XSL-FO's text styling properties are very similar to those used in CSS. In XSL-FO, you set the font specifics for text using attributes on the <fo:block> and <fo:inline> tags. More specifically, the font-size, font-family, and font-weight attributes can all be used to set the font for a block or inline content. These attributes are set just like their CSS counterparts.

Following is an example of setting the font size and font family for a block in XSL-FO:

<fo:block text-align="end" font-size="10pt" font-family="serif"
background-color="black" color="white">
  Great Sporting Events
</fo:block>

In this example, the text content Great Sporting Events is styled using a 10-point, serif font. Furthermore, the alignment of the text is set to end via the text-align attribute, which is equivalent to right-alignment in CSS for left-to-right text. XSL-FO's alignment model is built around the writing direction rather than left and right; instead, you use start and end when referring to the alignment of content that you might otherwise think of as being left-aligned or right-aligned. Of course, center is still perfectly legit in XSL-FO when it comes to alignment.

The background-color and color attributes in this code are direct carry-overs from CSS. You can use them just as you would the similarly named CSS styles.
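
As a further sketch, the font-weight attribute can be applied to an fo:inline element to emphasize part of a block. The text, the font sizes, and the color choice here are assumptions made up for illustration.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
  text-align="start" font-size="12pt" font-family="sans-serif">
  Coming up next:
  <!-- Only the inline text is bold and red; the rest uses the block's font -->
  <fo:inline font-weight="bold" color="red">Great Sporting Events</fo:inline>
</fo:block>
```

The inline content picks up the font size and font family from its enclosing block, and adds its own weight and color on top of them.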

Controlling Spacing and Borders

There are a few spacing and border properties that you can set when it comes to XSL-FO content. In fact, there are many more than I have the space to cover, so I'm only going to focus on a couple of them. The space-before and space-after attributes are used to control the spacing before and after a block. Because we're talking about blocks, the spacing applies vertically to the top (space-before) and bottom (space-after) of the block. In this way, the space-before and space-after attributes work sort of like top and bottom margins, except they apply outside of the margins.

Following is an example of setting the space after a block so that the next content is spaced a little further down the page:

<fo:block font-size="18pt" font-family="sans-serif" space-after="5pt"
background-color="black" color="white" text-align="center" padding-top="0pt">
  Great Sporting Events
</fo:block>

Also notice in this code that the padding-top attribute is set, which controls the padding along the top of the block. All of the standard CSS margin and padding styles are available for you in XSL-FO as attributes of the <fo:block> tag. These attributes include margin, margin-left, margin-right, margin-top, margin-bottom, padding, padding-left, padding-right, padding-top, and padding-bottom. There are also several familiar border attributes that you can use with blocks: border, border-left, border-right, border-top, and border-bottom.

Just to make sure you understand how all these spacing and border properties affect XSL-FO block content, take a look at Figure 14.3.

Figure 14.3. The various spacing and border attributes allow you to carefully control the area around XSL-FO block content.

Keep in mind that you will rarely, if ever, use all of these attributes at once; any that you leave unset simply collapse and don't affect the spacing of the content.
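
To tie these attributes together, here is a minimal sketch of a block that combines spacing, margins, a border, and padding. The specific measurements and the border style are assumptions picked for illustration.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
  space-before="10pt" space-after="10pt"
  margin-left="0.25in" margin-right="0.25in"
  border="1pt solid black" padding="4pt">
  <!-- padding separates the text from the border;
       space-before/space-after separate the block from its neighbors -->
  Great Sporting Events
</fo:block>
```

Working from the inside out, the padding surrounds the text, the border surrounds the padding, and the margins and spacing push the bordered block away from the content around it.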