XML

Associating the DTD With an XML Document

Now that the DTD has been created, it can be used to validate the Help.htm document we created in Chapter 3. There are two ways to associate a DTD with an XML document: the first is to place the DTD code within the XML document, and the second is to create a separate DTD document that is referenced by the XML document. Creating a separate DTD document allows multiple XML documents to reference the same DTD. We will take a look at how to declare a DTD first, and then examine how to place a DTD within the XML document.

The !DOCTYPE statement is used to declare a DTD. For a DTD placed inside the XML document, called an internal subset, use the following syntax, where DocName must match the name of the document's root element:

  <!DOCTYPE DocName [ DTD ]>
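
As a minimal sketch of this syntax, the following complete document declares a one-element DTD in its internal subset. The note element is a hypothetical name chosen only for illustration; it is not part of the Help.htm example.

  <!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  ]>
  <note>This text is validated against the internal subset.</note>

The name after !DOCTYPE (note) matches the root element, and the declarations sit between the square brackets.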

The new XML document that combines Help.htm and the DTD would look like this:

  <!DOCTYPE html
  [
  <!ELEMENT html  (head, body)>
  <!ELEMENT head  (title, base?)>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT base EMPTY>
  <!ATTLIST base  target CDATA  #REQUIRED>
  <!ELEMENT body  (basefont?, a?, table)>
  <!ATTLIST body  alink   CDATA  #IMPLIED
                  text    CDATA  #IMPLIED
                  bgcolor CDATA  #IMPLIED
                  link    CDATA  #IMPLIED
                  vlink   CDATA  #IMPLIED>
  <!ELEMENT basefont EMPTY>
  <!ATTLIST basefont  size CDATA  #REQUIRED>
  <!ELEMENT a  (#PCDATA)>
  <!ATTLIST a  linkid ID     #IMPLIED
               href   CDATA  #IMPLIED
               name   CDATA  #IMPLIED
               target CDATA  #IMPLIED>
  <!ELEMENT table  (tr+)>
  <!ATTLIST table  width       CDATA  #IMPLIED
                   rules       CDATA  #IMPLIED
                   frame       CDATA  #IMPLIED
                   align       CDATA  'Center'
                   cellpadding CDATA  '0'
                   border      CDATA  '0'
                   cellspacing CDATA  '0'>
  <!ELEMENT tr  (td+)>
  <!ATTLIST tr  bgcolor  (Cyan | Lime | Black | White | Maroon) 'White'
                valign   (Top | Middle | Bottom)  'Middle'
                align    (Left | Right | Center)  'Center'>
  <!ELEMENT td  (CellContent)>
  <!ATTLIST td  bgcolor  (Cyan | Lime | Black | White | Maroon) 'White'
                valign   (Top | Middle | Bottom)  'Middle'
                align    (Left | Right | Center)  'Center'
                rowspan CDATA  #IMPLIED
                colspan CDATA  #IMPLIED>
  <!ELEMENT CellContent  (h1? | p?)+>
  <!ATTLIST CellContent  cellname CDATA  #REQUIRED>
  <!ELEMENT h1  (#PCDATA)>
  <!ATTLIST h1  align CDATA  #IMPLIED>
  <!ELEMENT ImageLink  (img, br?)>
  <!ELEMENT p  (#PCDATA | font | ImageLink | a | ul | ol)*>
  <!ATTLIST p  align CDATA  #IMPLIED>
  <!ELEMENT font  (#PCDATA | b)*>
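  <!-- face is CDATA: enumerated values must be name tokens, so a value
       that contains spaces, such as Times New Roman, cannot be listed -->
  <!ATTLIST font  color  (Cyan | Lime | Black | White | Maroon) 'Black'
                  face   CDATA  #REQUIRED
                  size   (2 | 3 | 4 | 5 | 6)  '3'>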
  <!ELEMENT b  (#PCDATA)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img  width  CDATA  #IMPLIED
                 height CDATA  #IMPLIED
                 hspace CDATA  #IMPLIED
                 vspace CDATA  #IMPLIED
                 src    CDATA  #IMPLIED
                 alt    CDATA  #IMPLIED
                 align  CDATA  #IMPLIED
                 border CDATA  #IMPLIED
                 lowsrc CDATA  #IMPLIED>
  <!ELEMENT br EMPTY>
  <!ATTLIST br  clear CDATA  #REQUIRED>
  <!ELEMENT ul  (font?, li+)>
  <!ATTLIST ul  type CDATA  #IMPLIED>
  <!ELEMENT li  (font? | a?)+>
  <!ELEMENT ol  (font?, li+)>
  <!ATTLIST ol  type  CDATA  #REQUIRED
                start CDATA  #REQUIRED>
  ]>
  <html>
      <head>
          <title>Northwind Traders Help Desk</title>
          <base target=""/><!--Default link for page-->
      </head>
      <body text="#000000" bgcolor="#FFFFFF" link="#003399"
            alink="#FF9933" vlink="#996633">
          <!--Default display colors for entire body-->
          <a name="Top"><!--Anchor for top of page--></a>
          <table border="0" frame="" rules="" width="100%" align=""
                 cellspacing="0" cellpadding="0">
              <!--Rules/frame is used with border-->
              <tr valign="Middle">
                  <td rowspan="" colspan="2" align="Center">
                      <!--Either rowspan or colspan can be used, but
                          not both-->
                      <!--Valign: top, bottom, middle-->
                      <CellContent cellname="Table Header">
                          <h1 align="Center">Help Desk</h1>
                      </CellContent>
                  </td>
              </tr>
              <tr valign="Top">
                  <td rowspan="" colspan="" align="Left">
                      <CellContent cellname="Help Topic List">
                          <p align="">
                          <ul type="">
                          <font face="" size="3">
                              <b>For First-Time Visitors</b>
                          </font>
                          <li>
                          <a href="FirstTimeVisitorInfo.htm" target="">
                              First-Time Visitor Information
                          </a>
                          </li>
                          <li>
                          <a href="SecureShopping.htm" target="">
                              Secure Shopping at Northwind Traders
                          </a>
                          </li>
                          <li>
                          <a href="FreqAskedQ.htm" target="">
                              Frequently Asked Questions
                          </a>
                          </li>
                          <li>
                          <a href="NavWeb.htm" target="">
                              Navigating the Web
                          </a>
                          </li>
                          </ul>
                          </p>
                      </CellContent>
                  </td>
                  <td rowspan="" colspan="" align="Left">
                      <CellContent cellname="Shipping Links">
                          <p align="">
                          <ul type="">
                          <font face="">
                              <b>Shipping</b>
                          </font>
                          <li>
                          <a href="Rates.htm" target="">
                              Rates
                          </a>
                          </li>
                          <li>
                          <a href="OrderCheck.htm" target="">
                              Checking on Your Order
                          </a>
                          </li>
                          <li>
                          <a href="Returns.htm" target="">
                              Returns
                          </a>
                          </li>
                          </ul>
                          </p>
                      </CellContent>
                  </td>
              </tr>
          </table>
      </body>
  </html>

The marked-up text has remained the same with one exception. An attribute declared with an enumerated data type cannot be set to an empty string (""), because the empty string is not one of the listed values. For example, if a tr element does not use the align attribute, the attribute must be removed from the element. Because the DTD assigns a default value (Center) to the align attribute of the tr element, the parser applies that default whenever the attribute is omitted.
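
As a brief illustration of this rule, the fragment below contrasts the two cases for a tr element under the declarations shown earlier. It is only a sketch: the cell content is elided, so the fragment is not a complete valid document on its own.

  <!-- Invalid: "" is not one of the values (Left | Right | Center) -->
  <tr valign="Top" align="">
      <td>...</td>
  </tr>

  <!-- Valid: align is omitted, so the parser supplies the default 'Center' -->
  <tr valign="Top">
      <td>...</td>
  </tr>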

If you open this document in the browser, you will find that it almost works. The closing brackets (]>) belonging to the !DOCTYPE statement will appear in the browser, however, which is not acceptable. To solve this problem, save the DTD declarations in a file called StandardHTM.dtd, remove from the document any empty attributes that have an enumerated data type, and reference the external file StandardHTM.dtd from a new document named HelpHTM.htm. The format for a reference to an external DTD is as follows:

  <!DOCTYPE RootElementName SYSTEM|PUBLIC ["Name"] "DTD-URI">

RootElementName is the name of the root element (in this example, html). The SYSTEM keyword is used when the DTD is unpublished and is located only by its URI. If a DTD has been published and given a name, the PUBLIC keyword can be used, followed by that name. If the parser cannot resolve the name, it falls back to the DTD-URI. The DTD-URI gives the location of the DTD as a Uniform Resource Identifier (URI). A URI is a general way of identifying a resource; the Uniform Resource Locator (URL) you're familiar with from the Internet is one type of URI.
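
To make the two keywords concrete, here is a sketch of both forms. The public name and the www.example.com address are hypothetical placeholders invented for illustration; only the file name StandardHTM.dtd comes from this chapter. Note that the name and the URI are both written as quoted strings, and that the PUBLIC form carries a URI as well as the name.

  <!-- SYSTEM: the DTD is located only by its URI -->
  <!DOCTYPE html SYSTEM "http://www.example.com/dtd/StandardHTM.dtd">

  <!-- PUBLIC: a published name, followed by a URI used as a fallback -->
  <!DOCTYPE html PUBLIC "-//Northwind Traders//DTD StandardHTM//EN"
                        "http://www.example.com/dtd/StandardHTM.dtd">

A document contains only one !DOCTYPE statement, so these are alternatives rather than lines to be used together.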

For our example, we would need to add the following line of code to the beginning of the document HelpHTM.htm:

  <!DOCTYPE html SYSTEM "StandardHTM.dtd">

A browser that does not understand XML will ignore this statement. Thus, by using an external DTD, you not only have an XML document that can be validated, but also one that can be displayed in any browser.
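
The split between the two files can be sketched as follows. The external file holds only the declarations, without the !DOCTYPE wrapper or the square brackets. Most of the content is elided here and stands for the declarations and markup shown earlier, so neither listing is complete as written.

  StandardHTM.dtd:

  <!ELEMENT html  (head, body)>
  <!ELEMENT head  (title, base?)>
  <!-- remaining declarations, exactly as in the internal subset -->

  HelpHTM.htm:

  <!DOCTYPE html SYSTEM "StandardHTM.dtd">
  <html>
      <!-- head and body as before, minus the empty enumerated attributes -->
  </html>

Because the reference uses a relative URI, this sketch assumes that StandardHTM.dtd is stored in the same location as HelpHTM.htm.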

Summary

You now know how to build a DTD that defines a set of rules for validating an XML document. With a DTD, a group can agree on one set of rules and produce XML documents that all follow the same structure. These documents can be exchanged between corporations or internally within a corporation and validated using the DTD. The DTD can also be used to create standard documents within a group, such as a group that is building an e-commerce site.

In Chapter 5, we'll look at entities. Entities enable you to create reusable strings within a DTD.