
Creating and Validating XHTML Documents

Because XHTML is an XML-based markup language, creating XHTML documents is very much like creating any other kind of XML document. You must first learn the XHTML language, after which you use a text editor or other XML development tool to construct a document using XHTML elements and attributes. If you've always created web pages using visual authoring tools, such as FrontPage or Dreamweaver, the concept of assembling a web page in a text editor might be new. On the other hand, if you aren't a seasoned web developer and your only experience with markup languages is XML, you'll feel right at home. The next few sections explore the basics of creating and validating XHTML documents.

Many visual web development tools support XHTML, which means that you can certainly create XHTML web pages without having to rely on a simple text editor. However, if you really want to learn how XHTML works as a language, you have to get your hands dirty and explore XHTML code. Fortunately, many web development tools include a code view that allows you to view the underlying code for a page. If your development tool offers such a view, you can forgo using a simple text editor. Such code views also offer editing features such as context-sensitive color highlighting and automatic tag matching, which can assist you in writing valid XHTML code.

Preparing XHTML Documents for Validation

Just as it is beneficial to validate other kinds of XML documents, it is also important to validate XHTML documents to ensure that they adhere to the XHTML language. As you know, validation is carried out through a schema, which can be either a DTD or an XSD. Both kinds of schemas are available for use with XHTML. I'll focus on the usage of an XHTML DTD to validate XHTML documents because DTDs are still more widely supported than XSDs. Before getting into the specifics of using a DTD to validate XHTML documents, it's necessary to clarify the different versions of XHTML and how they impact XHTML document validation.

The first version of XHTML was version 1.0, which focused on a direct interpretation of HTML 4.0 as an XML-based markup language. Because HTML 4.0 is a fairly large and complex markup language, the W3C decided to offer XHTML 1.0 in three different flavors, which vary in their support of HTML 4.0 features:

  • Strict: No HTML presentation elements are available (font, center, and so on)

  • Transitional: HTML presentation elements are available for formatting documents

  • Frameset: Frames are available, as well as HTML presentation elements

These different strains of XHTML are listed in order of increasing functionality, which means that the Frameset feature set is richer and therefore more complex than the Strict feature set. These three different strains of XHTML 1.0 are realized by three different DTDs that describe the elements and attributes for each feature set. The idea is that you can use a more minimal XHTML DTD if you don't need to use certain XHTML language features, or you can use a more thorough DTD if you need additional features, such as frames.

The Strict DTD is a minimal DTD that is used to create very clean XHTML documents without any presentation markup. Documents created from this DTD require style sheets in order to be formatted for display because they don't contain any presentation markup. The Transitional DTD builds on the Strict DTD by adding support for presentation markup elements. This DTD is useful in performing a quick conversion of HTML documents to XHTML when you don't want to take the time to develop style sheets. The Frameset DTD is the broadest of the three DTDs and adds support for creating documents with frames.

The three DTDs associated with XHTML 1.0 can certainly be used to validate XHTML documents, but there is a newer version of XHTML known as XHTML 1.1 that includes a DTD of its own. The XHTML 1.1 DTD is a reformulation of the XHTML 1.0 Strict DTD that is designed for modularity. The idea behind the XHTML 1.1 DTD is to provide a means of expanding XHTML to support other XML-based languages, such as MathML for mathematical content. Because the XHTML 1.1 DTD is based upon the Strict XHTML 1.0 DTD, it doesn't include support for presentation elements or framesets. The XHTML 1.1 DTD is therefore designed for pure XHTML documents that adhere to the XML adage of separating content from how it is formatted and displayed.

Regardless of which XHTML DTD you decide to use to validate XHTML documents, there are a few other validity requirements to which all XHTML documents must adhere:

  • There must be a document type declaration (DOCTYPE) in the document that appears prior to the root element

  • The document must validate against the DTD declared in the document type declaration; this DTD must be one of the three XHTML 1.0 DTDs or the XHTML 1.1 DTD

  • The root element of the document must be html

  • The root element of the document must designate an XHTML namespace using the xmlns attribute
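
The structural requirements in this list can be checked mechanically before a document ever reaches a validator. The following is a minimal sketch in Python, assuming the standard library's xml.dom.minidom parser. The function name check_xhtml_requirements and the constant names are hypothetical, chosen only for illustration. The sketch inspects the DOCTYPE, the root element, and the namespace; it does not validate the document against the DTD, which still requires a validating parser or an online service.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Public identifiers of the four DTDs named in the text.
XHTML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}


def check_xhtml_requirements(source):
    """Return a list of problems; an empty list means every check passed.

    Only the DOCTYPE, root element, and namespace are inspected.
    No DTD validation is performed here.
    """
    # parseString raises xml.parsers.expat.ExpatError if the input
    # is not well-formed XML.
    doc = minidom.parseString(source)
    problems = []

    # XML syntax already forces the DOCTYPE to precede the root element,
    # so it is enough to check that one is present.
    doctype = doc.doctype
    if doctype is None:
        problems.append("missing document type declaration")
    elif doctype.publicId not in XHTML_PUBLIC_IDS:
        problems.append("DOCTYPE does not name an XHTML 1.0 or 1.1 DTD")

    root = doc.documentElement
    if root.tagName != "html":
        problems.append("root element is not html")
    if root.getAttribute("xmlns") != XHTML_NS:
        problems.append("root element does not declare the XHTML namespace")

    return problems
```

Calling check_xhtml_requirements on the text of a document returns an empty list when all four structural checks pass, and a list of short messages otherwise.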

You must declare the DTD for all XHTML documents in a document type declaration at the top of the document. A Formal Public Identifier (FPI) is used in the document type declaration to reference one of the standard XHTML DTDs. Following is an example of how to declare the Strict XHTML 1.0 DTD in a document type declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

It isn't terribly important that you understand the details of the FPI in this code. The main point is that it identifies the Strict XHTML 1.0 DTD and therefore is suitable for XHTML documents that don't require formatting or frameset features. The XHTML 1.0 Transitional DTD is specified using similar code, as the following example reveals:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The XHTML 1.0 Frameset DTD is also specified with similar code, as in the following example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Finally, the XHTML 1.1 DTD is specified with a document type declaration that names only the version, because XHTML 1.1 comes in a single flavor:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

The decision regarding which XHTML DTD to use really comes down to what features your documents require. If you can get by without the presentation or frameset features, then the XHTML 1.0 Strict DTD or the XHTML 1.1 DTD is your best bet. Between the two, it's better to go with the newer XHTML 1.1 DTD because it represents the future direction of XHTML. If your documents require some presentation features, the XHTML 1.0 Transitional DTD is the one for you. And finally, if you need the whole gamut of XHTML features, including framesets, the XHTML 1.0 Frameset DTD is the way to go.
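
The decision just described amounts to a small lookup. The following Python sketch is one way to express it, under stated assumptions: the names DOCTYPES, choose_dtd, and doctype_declaration are hypothetical, and the system identifiers are the absolute URIs at which the W3C publishes the DTDs.

```python
# (public identifier, system identifier) for each DTD discussed in the text.
DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
    "1.1": ("-//W3C//DTD XHTML 1.1//EN",
            "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"),
}


def choose_dtd(needs_presentation=False, needs_frames=False):
    """Pick a DTD key by following the guidance in the text."""
    if needs_frames:
        return "frameset"       # frames (and presentation elements)
    if needs_presentation:
        return "transitional"   # presentation elements, no frames
    return "1.1"                # neither: prefer XHTML 1.1 over 1.0 Strict


def doctype_declaration(key):
    """Build the document type declaration for the given DTD key."""
    public_id, system_id = DOCTYPES[key]
    return f'<!DOCTYPE html PUBLIC "{public_id}"\n  "{system_id}">'
```

For example, doctype_declaration(choose_dtd(needs_presentation=True)) produces the Transitional declaration, ready to paste at the top of a document.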

If you truly want to create XHTML documents that are geared toward the future of the Web, you should target the XHTML 1.1 DTD.

In addition to declaring an appropriate DTD in the document type declaration, a valid XHTML document must also declare the XHTML namespace in the root html element, and it is good practice to declare the document's language there as well. Following is an example of declaring the standard XHTML namespace and the English language in the html element for an XHTML document:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

Putting Together an XHTML Document

XHTML documents are created in much the same way as any other XML document, or any HTML document for that matter. As long as you keep in mind the differences between XHTML and HTML, you can develop XHTML web pages just as you would create HTML web pages, assuming you don't mind creating web pages by hand. To give you a better idea as to how an XHTML document comes together, check out the code for a skeletal XHTML document in Listing 21.1.

Listing 21.1. A Skeletal XHTML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Skeletal XHTML Document</title>
  </head>

  <body>
    <p>
      This is a skeletal XHTML document.
    </p>
  </body>
</html>

The skeleton.xhtml document admittedly doesn't do much in terms of being a useful web page, but it does demonstrate how to create a legal XHTML document. In other words, the skeletal document declares an XHTML DTD and namespace and adheres to all of the structural and syntax rules of XML. It can also be viewed directly in a web browser. The main significance of the skeletal XHTML document is that it serves as a great template for creating other XHTML documents.
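
If you generate pages from a program rather than by hand, the same skeleton can serve as a literal template. The following Python sketch shows one way to do that, assuming only the standard library. The names TEMPLATE and make_page are hypothetical. The important detail is that any text dropped into the template is escaped first, so that characters such as & and < cannot break the well-formedness of the result.

```python
from xml.sax.saxutils import escape

# The skeletal document with two placeholders: the title and one paragraph.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>{title}</title>
  </head>

  <body>
    <p>
      {paragraph}
    </p>
  </body>
</html>
"""


def make_page(title, paragraph):
    """Fill the skeleton with plain text, escaping &, <, and >."""
    return TEMPLATE.format(title=escape(title), paragraph=escape(paragraph))
```

A call such as make_page("Notes", "Fish & chips") returns the complete document as a string, with the ampersand written as an entity reference.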

Validating an XHTML Document

As with any XML document, it's very important to be able to validate XHTML documents. You've already learned about the DTDs that factor into XHTML validation, but you haven't learned exactly when XHTML documents are validated. Keep in mind that it takes processing time for any XML document to be validated, and in the case of XHTML, this could hinder the speed at which web pages are served and displayed. The ideal scenario in terms of performance is for developers to validate XHTML documents before making them publicly available, which alleviates the need for browsers to perform any validation. On the other hand, there currently is a lot of HTML code generated on the fly by scripting languages and other interactive technologies, in which case it might be necessary for a browser to sometimes validate XHTML documents.

Although there are no rules governing the appropriate time for XHTML documents to be validated, it's generally a good idea for you to take the initiative to validate your own documents before taking them live. Fortunately, the W3C provides a free online validation service known as the W3C Validator that can be used to validate XHTML documents. This validation service is available online at http://validator.w3.org/ and is shown in Figure 21.1.

Figure 21.1. The W3C Validator service is capable of validating XHTML documents as well as HTML documents.

You can see in the figure that the W3C Validator is used by entering the URI of an XHTML document. Of course, web pages are typically developed offline, which means you may not have published them to an accessible URI online. In this case, you can simply choose the File Upload option on the W3C Validator page, which allows you to browse your computer for an XHTML document file. If you want to exercise more control over the validation of XHTML documents that you upload, you may want to consider using the Extended File Upload Interface, which is available via a text link just below the Validate by File Upload option (see Figure 21.1).
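
Before uploading a file, it can save a round trip to confirm locally that the document is at least well formed, because a document that is not well formed cannot be valid. The following Python sketch does this with the standard library's expat-based parser. The function name well_formedness_error is hypothetical, and the check is only a preliminary step; it is not a substitute for the W3C Validator, which checks the document against its DTD.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def well_formedness_error(source):
    """Return None if source parses as XML, otherwise the parser's message.

    The message produced by expat includes the line and column at which
    parsing failed.
    """
    try:
        minidom.parseString(source)
    except ExpatError as err:
        return str(err)
    return None
```

To check a file on disk, read its contents first (for instance with open(path, "rb").read()) and pass the result to the function.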

Figure 21.2 shows the results of validating the skeleton.xhtml document using the W3C Validator.

Figure 21.2. The results of passing the skeletal XHTML document through the W3C Validator.

As the figure reveals, the skeletal document passed the W3C Validator with flying colors, which isn't too much of a surprise. This is a handy service to have around when creating XHTML documents, especially when you consider that it is always up to date with the latest standards set forth by the W3C.