XML

Rewriting the sample DTD using parameter entities

Ideally, you want your XML Web documents to be compatible with the new XHTML standard. Using parameter entities, along with a few other changes, the DTD example from Chapter 4 can be rewritten as follows:

  <!-- Entities that can occur at block or inline level. ====-->
  <!ENTITY % misc " script
                   | noscript">
  <!-- Entities for inline elements ================-->
  <!ENTITY % special "br
                      | span
                      | img">
  <!ENTITY % fontstyle "tt
                        | i
                        | b
                        | big
                        | small">
  <!ENTITY % phrase "em
                     | strong
                     | q
                     | sub
                     | sup">
  <!ENTITY % inline.forms "input
                           | select
                           | textarea
                           | label
                           | button">
  <!ENTITY % inline "a
                     | %special;
                     | %fontstyle;
                     | %phrase;
                     | %inline.forms;">
  <!ENTITY % Inline  "(#PCDATA
                     | %inline;
                     | %misc;)*">
  <!-- Entities used for block elements ============-->
  <!ENTITY % heading "h1
                      | h2
                      | h3
                      | h4
                      | h5
                      | h6">
  <!ENTITY % lists "ul
                    | ol">
  <!ENTITY % blocktext "hr
                        | blockquote">
  <!ENTITY % block "p
                    | %heading;
                    | div
                    | %lists;
                    | %blocktext;
                    | fieldset
                    | table">
  <!ENTITY % Block " (%block;
                    | form
                    | %misc; )*">
  <!-- Mixed block and inline ========================-->
  <!-- %Flow; mixes block and inline and is used for list
       items and so on. -->
  <!ENTITY % Flow " (#PCDATA
                   | %block;
                   | form
                   | %inline;
                   | %misc; )*">
  <!ENTITY % form.content " #PCDATA
                           | p
                           | %lists;
                           | %blocktext;
                           | a
                           | %special;
                           | %fontstyle;
                           | %phrase;
                           | %inline.forms;
                           | table
                           | %heading;
                           | div
                           | fieldset
                           | %misc; ">
  <!ENTITY % events " onclick     CDATA  #IMPLIED
                       ondblclick  CDATA  #IMPLIED
                       onmousedown CDATA  #IMPLIED
                       onmouseup   CDATA  #IMPLIED
                       onmouseover CDATA  #IMPLIED
                       onmousemove CDATA  #IMPLIED
                       onmouseout  CDATA  #IMPLIED
                       onkeypress  CDATA  #IMPLIED
                       onkeydown   CDATA  #IMPLIED
                       onkeyup     CDATA  #IMPLIED">
  <!ENTITY % i18n " lang     NMTOKEN  #IMPLIED
                       xml:lang NMTOKEN  #IMPLIED
                       dir      (ltr | rtl )  #IMPLIED">
  <!-- Core attributes common to most elements
   id       Document-wide unique ID
   class    Space-separated list of classes
   style    Associated style info
   title    Advisory title/amplification
  -->
  <!-- Style sheet data -->
  <!ENTITY % StyleSheet "CDATA">
  <!ENTITY % coreattrs " id    ID   #IMPLIED
                       class CDATA  #IMPLIED
                       style CDATA  #IMPLIED">
  <!ENTITY % attrs " %coreattrs;
                        %i18n;
                        %events;">
  <!-- End Entity Declarations  ====================-->
  <!ENTITY % URI "CDATA">
  <!--a Uniform Resource Identifier, see [RFC2396]-->
  <!ELEMENT html  (head, body)>
  <!ATTLIST html  %i18n;
                  xmlns CDATA  #FIXED 'http://www.w3.org/1999/xhtml'>
  <!ELEMENT head  (title, base?)>
  <!ATTLIST head  %i18n;
                  profile CDATA  #IMPLIED>
  <!ELEMENT title  (#PCDATA )>
  <!ATTLIST title  %i18n; >
  <!ELEMENT base EMPTY>
  <!ATTLIST base  target CDATA  #REQUIRED >
  <!ELEMENT body  (basefont? ,  (p )? , table )>
  <!ATTLIST body  alink   CDATA  #IMPLIED
                  text    CDATA  #IMPLIED
                  bgcolor CDATA  #IMPLIED
                  link    CDATA  #IMPLIED
                  vlink   CDATA  #IMPLIED >
  <!ELEMENT basefont EMPTY>
  <!ATTLIST basefont  size CDATA  #REQUIRED >
  <!-- generic language/style container ==============-->
  <!ELEMENT a  (#PCDATA )>
  <!ATTLIST a  %attrs;
               href   CDATA  #IMPLIED
               name   CDATA  #IMPLIED
               target CDATA  #IMPLIED >
  <!ELEMENT table  (tr )+>
  <!ATTLIST table  %attrs;
                   width       CDATA  #IMPLIED
                   rules       CDATA  #IMPLIED
                   frame       CDATA  #IMPLIED
                   align       CDATA  'Center'
                   cellpadding CDATA  '0'
                   border      CDATA  '0'
                   cellspacing CDATA  '0' >
  <!ELEMENT tr  (td+ )>
  <!ATTLIST tr  %attrs; >
  <!ELEMENT td  (cellcontent )>
  <!ATTLIST td  %attrs;
                bgcolor  (Cyan|Lime|Black|White|Maroon ) 'White'
                align   CDATA  'Center'
                rowspan CDATA  #IMPLIED
                colspan CDATA  #IMPLIED >
  <!ELEMENT cellcontent  (%Block; | p?)+>
  <!ATTLIST cellcontent  cellname CDATA  #REQUIRED >
  <!ELEMENT h1 %Inline;>
  <!ATTLIST h1  align CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT h2 %Inline;>
  <!ATTLIST h2  align CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT h3 %Inline;>
  <!ATTLIST h3  align CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT h4 %Inline;>
  <!ATTLIST h4  align CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT h5 %Inline;>
  <!ATTLIST h5  align CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT h6 %Inline;>
  <!ATTLIST h6  align CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT p %Inline;>
  <!ATTLIST p  %attrs; >
  <!-- Inline Element Declarations =================-->
  <!-- Forced line break -->
  <!ELEMENT br EMPTY>
  <!ATTLIST br  %coreattrs;
                clear     CDATA  #REQUIRED >
  <!-- Emphasis -->
  <!ELEMENT em %Inline;>
  <!ATTLIST em  %attrs; >
  <!-- Strong emphasis -->
  <!ELEMENT strong %Inline;>
  <!ATTLIST strong  %attrs; >
  <!-- Inlined quote -->
  <!ELEMENT q %Inline;>
  <!ATTLIST q  %attrs;
               cite  CDATA  #IMPLIED >
  <!-- Subscript -->
  <!ELEMENT sub %Inline;>
  <!ATTLIST sub  %attrs; >
  <!-- Superscript -->
  <!ELEMENT sup %Inline;>
  <!ATTLIST sup  %attrs; >
  <!-- Fixed-pitch font -->
  <!ELEMENT tt %Inline;>
  <!ATTLIST tt  %attrs; >
  <!-- Italic font -->
  <!ELEMENT i %Inline;>
  <!ATTLIST i  %attrs; >
  <!-- Bold font -->
  <!ELEMENT b %Inline;>
  <!ATTLIST b  %attrs; >
  <!-- Bigger font -->
  <!ELEMENT big %Inline;>
  <!ATTLIST big  %attrs; >
  <!-- Smaller font -->
  <!ELEMENT small %Inline;>
  <!ATTLIST small  %attrs; >
  <!-- hspace, border, align, and vspace are not in the strict
      XHTML standard for img. -->
  <!ELEMENT img EMPTY>
  <!ATTLIST img  %attrs;
                align  CDATA  #IMPLIED
                border CDATA  #IMPLIED
                width  CDATA  #IMPLIED
                height CDATA  #IMPLIED
                hspace CDATA  #IMPLIED
                vspace CDATA  #IMPLIED
                src    CDATA  #REQUIRED >
  <!ELEMENT ul  (font? , li+ )>
  <!ATTLIST ul  %attrs;
                type  CDATA  'text' >
  <!ELEMENT ol  (font? , li+ )>
  <!ATTLIST ol  type  CDATA  'text'
                start CDATA  #IMPLIED
                %attrs; >
  <!ELEMENT li  %Flow; >
  <!ATTLIST li  %attrs; >
  <!--================= Form Elements===============-->
  <!--Each label must not contain more than one field.
      Label elements shouldn't be nested.
  -->
  <!ELEMENT label %Inline;>
  <!ATTLIST label  %attrs;
                   for   IDREF  #IMPLIED >
  <!ENTITY % InputType "(text | password | checkbox |
      radio | submit | reset |
      file | hidden | image | button)">
  <!-- The name attribute is required for all elements but
       the submit and reset elements. -->
  <!ELEMENT input EMPTY>
  <!ATTLIST input  %attrs; >
  <!ELEMENT select  (optgroup | option )+>
  <!ATTLIST select %attrs;>
  <!-- Option selector -->
  <!ATTLIST select name     CDATA  #IMPLIED>
  <!ATTLIST select size     CDATA  #IMPLIED>
  <!ATTLIST select multiple  (multiple)  #IMPLIED>
  <!ATTLIST select disabled  (disabled)  #IMPLIED>
  <!ATTLIST select tabindex CDATA  #IMPLIED>
  <!ATTLIST select onfocus  CDATA  #IMPLIED>
  <!ATTLIST select onblur   CDATA  #IMPLIED>
  <!ATTLIST select onchange CDATA  #IMPLIED>
  <!ELEMENT optgroup  (option )+>
  <!ATTLIST optgroup  %attrs;
                      disabled  (disabled )  #IMPLIED
                      label    CDATA  #REQUIRED>
  <!ELEMENT option  (#PCDATA )>
  <!ATTLIST option  %attrs;
                    selected  (selected )  #IMPLIED
                    disabled  (disabled )  #IMPLIED
                    label    CDATA  #IMPLIED
                    value    CDATA  #IMPLIED >
  <!-- Multiple-line text field -->
  <!ELEMENT textarea  (#PCDATA )>
  <!ATTLIST textarea  %attrs; >
  <!ELEMENT legend %Inline;>
  <!ATTLIST legend  %attrs; >
  <!--=================== Horizontal Rule ============-->
  <!ELEMENT hr EMPTY>
  <!ATTLIST hr  %attrs; >
  <!--=================== Block-like Quotes ==========-->
  <!ELEMENT blockquote %Block;>
  <!ATTLIST blockquote  %attrs;
                        cite  CDATA  #IMPLIED >
  <!-- The fieldset element is used to group form fields.
    Only one legend element should occur in the content,
    and if present it should be preceded only by white space.
  -->
  <!ELEMENT fieldset
     (#PCDATA | legend | %block; | form | %inline; | %misc; )*>
  <!ATTLIST fieldset  %attrs; >
  <!ELEMENT script  (#PCDATA )>
  <!ATTLIST script  charset   CDATA  #IMPLIED
                    type      CDATA  #REQUIRED
                    src       CDATA  #IMPLIED
                    defer     CDATA  #IMPLIED
                    xml:space CDATA  #FIXED 'preserve' >
  <!-- Alternative content container for non-script-based
       rendering -->
  <!ELEMENT noscript %Block;>
  <!ATTLIST noscript %attrs; >
  <!ELEMENT button  (#PCDATA | p | %heading; | div | %lists; |
     %blocktext; | table | %special; | %fontstyle; |
     %phrase; | %misc; )*>
  <!ATTLIST button  %attrs;
                    name      CDATA  #IMPLIED
                    value     CDATA  #IMPLIED
                    type      (button | submit | reset )  'submit'
                    disabled  (disabled )  #IMPLIED
                    tabindex  CDATA  #IMPLIED
                    accesskey CDATA  #IMPLIED
                    onfocus   CDATA  #IMPLIED
                    onblur    CDATA  #IMPLIED >
  <!ELEMENT span %Inline;>
  <!ATTLIST span  %attrs; >
  <!--The font element is not included in the XHTML standard. -->
  <!ELEMENT font  (b )>
  <!ATTLIST font  color CDATA  #REQUIRED
                  face  CDATA  #REQUIRED
                  size  CDATA  #REQUIRED >
  <!ELEMENT form (%form.content;)*>
  <!ELEMENT div %Flow;>
  <!ATTLIST div %attrs; >

This might look like a completely different DTD, but it is essentially the same as the DTD we created in Chapter 4. Only one structural change has occurred: the block elements, such as the h1 element, have been moved out of the p element and now are child elements of the body element. Several elements have been added, including the form element itself and its child elements (button, label, select, and so on) and the font formatting elements, including i and b. Numerous additions have been made to the attributes, including language, id, and the scripting events.

XML documents built using this new DTD will still use a table to format and contain all of the elements that will be displayed in the browser. However, in the new DTD, the declaration for the body element is different from that in our original DTD. In our original DTD, the a (anchor) element at the top of the page is a child element of the body element. However, this element is not a child element of the body element in the XHTML standard. As we have seen, the declaration for the body element in the XHTML standard is as follows:

  <!ELEMENT body %Block;>

As we have discussed, the Block internal parameter entity is declared as follows:

  <!ENTITY % Block " (%block; | form | %misc;)*">

Replacing %block; and %misc; results in the following code:

  <!ENTITY % Block " (p | %heading; | div | %lists; |
      %blocktext; | fieldset | table | form | script |
      noscript)*">

Replacing %heading;, %lists;, and %blocktext; gives you the fully expanded Block entity, which is the actual content model of the body element, as shown here:

  <!ENTITY % Block " (p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul |
      ol |  hr | blockquote  | fieldset | table |
      form | script | noscript)*">

It would be worth your time to go through the DTD and replace the entities with their actual values. You may also find it interesting to download the latest version of the XHTML standard and do all of the replacements in that document, too.
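
If you would rather not do the replacements by hand, the substitution can be sketched in a few lines of Python. This is only an illustration, not how any particular DTD tool works: the ENTITIES table is copied from the sample DTD with whitespace collapsed, and the names PE_REF and expand are hypothetical helpers introduced for this sketch. It does no cycle detection and does not add the padding spaces a real XML processor inserts around an included parameter entity.

```python
import re

# Parameter entities copied from the sample DTD, whitespace collapsed.
# Only the entities that %Block; depends on are listed here.
ENTITIES = {
    "misc": "script | noscript",
    "heading": "h1 | h2 | h3 | h4 | h5 | h6",
    "lists": "ul | ol",
    "blocktext": "hr | blockquote",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "Block": "(%block; | form | %misc;)*",
}

# A parameter entity reference looks like %name; (names may contain dots).
PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")


def expand(text, entities):
    """Replace every %name; in text, recursing into each replacement.

    An undeclared name raises KeyError; circular definitions are not detected.
    """
    def substitute(match):
        return expand(entities[match.group(1)], entities)

    return PE_REF.sub(substitute, text)


expanded_block = expand("%Block;", ENTITIES)
print(expanded_block)
```

The printed content model contains the same names, in the same order, as the fully expanded Block declaration shown above.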

Creating this expanded declaration manually took some time, but any of the DTD tools could have done this work for you in just a few moments. For example, Figure 5-2 shows our sample XHTML DTD as it appears in XML Authority.

Figure 5-2. The Body element of the XHTML DTD displayed in XML Authority.

The child elements of the Body element are readily visible. (You can scroll down to see the complete list.)

You do not have to include all of these child elements in your DTD to be compatible with the XHTML standard; instead, you can include only those elements that you need for your projects. If you want to be compliant with the standard, however, you cannot add elements to the body element that are not included in the standard.

Notice that the a element is not a child element of the XHTML body element; it is actually a child element of the p element. Therefore, you cannot use the declaration included in the original DTD we discussed in Chapter 4, shown here:

  <!ELEMENT body (basefont? , a? , table)>

In this declaration, the a element is a child element of the body element, which does not comply with the standard. To solve this problem, you will need to use the p element, as shown here:

  <!ELEMENT body (basefont? , (p)? , table)>

While this declaration makes the DTD conform to the XHTML standard, it also means that any of the inline elements, not just the a element, can be used in the body element as long as they are contained within a p element.

Many child elements that are included in the body element of the XHTML standard are not included in the example DTD. This is because you are using the table to hold most of the content and do not need most of these child elements. You can think of the XML documents defined by the example DTD as a subset of the XML documents defined by the more general XHTML DTD. The example DTD includes only the structure you need for your documents.

The XHTML standard declaration for the table cell element (td) is shown here:

  <!ELEMENT td %Flow;>

If you replace the Flow parameter entity and all of the parameter entities contained within %Flow; as you did earlier for the body element, your final td declaration will look like this:

  <!ELEMENT td (#PCDATA | p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul |
      ol | hr | blockquote | fieldset | table | form | a | br | span |
      img | tt | i | b | big | small | em | strong | q | sub | sup |
      input | select | textarea | label | button | script |
      noscript)*>

As you can see, the Flow entity includes virtually everything. You can use a td element as a container for all of the block and inline elements, which is exactly what you want to do.
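
To see how much broader Flow is than Block, you can compare the two expanded content models. The following Python sketch assumes the entity definitions from the sample DTD (copied here with whitespace collapsed); the helper names expand and names are hypothetical and exist only for this illustration. It treats each content model as a flat list of alternatives, which is true for Block and Flow but not for content models in general.

```python
import re

# Parameter entities copied from the sample DTD, whitespace collapsed.
ENTITIES = {
    "misc": "script | noscript",
    "special": "br | span | img",
    "fontstyle": "tt | i | b | big | small",
    "phrase": "em | strong | q | sub | sup",
    "inline.forms": "input | select | textarea | label | button",
    "inline": "a | %special; | %fontstyle; | %phrase; | %inline.forms;",
    "heading": "h1 | h2 | h3 | h4 | h5 | h6",
    "lists": "ul | ol",
    "blocktext": "hr | blockquote",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "Block": "(%block; | form | %misc;)*",
    "Flow": "(#PCDATA | %block; | form | %inline; | %misc;)*",
}

PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")


def expand(text, entities):
    """Recursively replace every %name; reference in text."""
    return PE_REF.sub(lambda m: expand(entities[m.group(1)], entities), text)


def names(content_model):
    """Return the set of names (and #PCDATA) in a flat choice group."""
    return set(re.findall(r"#?[A-Za-z][\w.\-]*", expand(content_model, ENTITIES)))


block_names = names("%Block;")
flow_names = names("%Flow;")

# Everything Block allows is also allowed by Flow.
print(block_names <= flow_names)

# What Flow adds on top of Block: character data plus the inline elements.
flow_only = flow_names - block_names
print(sorted(flow_only))
```

The difference consists of #PCDATA and the inline elements, which is why a td declared with %Flow; can hold text and links directly, while an element declared with %Block; cannot.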

In the example DTD, the following declaration is created for the td element and the cellcontent element:

  <!ELEMENT td (cellcontent)>
  <!ELEMENT cellcontent (%Block; | p?)+>

This declaration doesn't comply with the XHTML standard, because the cellcontent element does not belong to the standard; it was created for marking up the text. When you use custom elements such as cellcontent, you will need to remove them before the document can be treated as XHTML, for example by using Extensible Stylesheet Language (XSL). An XSL transformation can strip the cellcontent wrappers from a document so that the result matches the following declaration:

  <!ELEMENT td (%Block;)+>
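
The effect of such a transformation can be illustrated without XSL. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for a stylesheet; it is not XSL, and the SAMPLE markup and the unwrap helper are hypothetical examples invented for this illustration. It lifts the children of every cellcontent element into the enclosing td and discards the wrapper, including its cellname attribute and any whitespace text directly inside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the custom cellcontent wrapper.
SAMPLE = (
    '<table><tr><td>'
    '<cellcontent cellname="Help"><h1>Help Desk</h1><p>Welcome</p></cellcontent>'
    '</td></tr></table>'
)


def unwrap(root, tag):
    """Replace each child element named tag with that element's children.

    Text directly inside the wrapper and its attributes are dropped, and a
    wrapper nested directly inside another wrapper is not handled.
    """
    for parent in list(root.iter()):
        # Walk the children backwards so earlier indexes stay valid
        # while grandchildren are inserted.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == tag:
                parent.remove(child)
                for offset, grandchild in enumerate(list(child)):
                    parent.insert(index + offset, grandchild)
    return root


root = unwrap(ET.fromstring(SAMPLE), "cellcontent")
result = ET.tostring(root, encoding="unicode")
print(result)
```

After the call, the h1 and p elements are direct children of td, which is the structure the standard-compliant declaration expects.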

The New HelpHTM.htm Document

Because of the changes in the DTD, you will have to make some minor changes to the sample HelpHTM.htm document we created in Chapter 4. You will now have to delete all the p elements because the block elements are no longer child elements of the p elements. You will also have to add several p elements to wrap the a elements. Change the a element at the beginning of the document as shown here:

  <p><a name="Top"><!--Top tag--></a></p>

Then wrap all the links in the lists using the p element. For example, you can wrap the first link in the HelpHTM.htm document as follows:

  <p>
      <a href="FirstTimeVisitorInfo.html" target="">
          First-Time Visitor Information</a>
  </p>

If you do this and then reference the new DTD, the document is valid.
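
If the document contains many links, wrapping them by hand is tedious. The following Python sketch shows one way the edit could be automated with the standard xml.etree.ElementTree module. It assumes that each link is a direct child of an li element, which may not match your copy of HelpHTM.htm exactly; the SAMPLE fragment and the wrap_links helper are hypothetical and exist only for this illustration. Note that ElementTree checks only well-formedness, so you would still need a validating parser to confirm the result against the new DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical list fragment with a bare link inside a list item.
SAMPLE = (
    '<ul><li>'
    '<a href="FirstTimeVisitorInfo.html" target="">'
    'First-Time Visitor Information</a>'
    '</li></ul>'
)


def wrap_links(root):
    """Wrap every a element that is a direct child of an li in a p element."""
    for item in list(root.iter("li")):
        for index, child in enumerate(list(item)):
            if child.tag == "a":
                wrapper = ET.Element("p")
                # Text that followed the link now follows the paragraph.
                wrapper.tail = child.tail
                child.tail = None
                item.remove(child)
                wrapper.append(child)
                item.insert(index, wrapper)
    return root


root = wrap_links(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```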