The XHTML Standard and Internal Parameter Entities
Now would be a good time to introduce a new standard that is being created for HTML. This new standard, called XHTML, reformulates HTML version 4.01 as an XML application. The World Wide Web Consortium (W3C) standards committee is currently working out the last details of the standard, which is all about doing what we've done in the last few chapters: XMLizing HTML. You can find information about this standard by visiting http://www.w3.org.
Basically, the XHTML standard introduces two content models: inline and block. Inline elements affect runs of text within a line, whereas block elements affect entire blocks of text. These two groups of elements are then used as the permitted child elements of other elements.
Inline entities and elements
The XHTML standard uses the following declarations to define a series of internal parameter entities for the inline elements:
<!ENTITY % special "br
| span
| img">
<!ENTITY % fontstyle "tt
| i
| b
| big
| small">
<!ENTITY % phrase "em
| strong
| q
| sub
| sup">
<!ENTITY % inline.forms "input
| select
| textarea
| label
| button">
<!ENTITY % inline "a
| %special;
| %fontstyle;
| %phrase;
| %inline.forms;">
<!-- Entities that can occur at block or inline level. -->
<!ENTITY % misc "script
| noscript">
<!ENTITY % Inline " (#PCDATA
| %inline;
| %misc; )*">
This declaration fragment builds the final Inline parameter entity in small pieces. Notice that the Inline entity definition contains the inline and misc entities and uses the technique described in Chapter 4 for including an unlimited number of child elements in any order; in this example, (#PCDATA | %inline; | %misc; )*. Note that XML names are case-sensitive, so Inline and inline are two different entities.
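To see what the finished Inline entity actually allows, you can perform the substitutions mechanically. The following minimal Python sketch stores each entity's replacement text in a dictionary (whitespace simplified) and replaces %name; references until none remain. The names expand and member_names are hypothetical helpers for illustration; a real XML processor expands references in an entity value when it reads that declaration, rather than on demand as done here.

```python
import re

# Replacement text of each parameter entity in the fragment above,
# with line breaks collapsed to single spaces.
ENTITIES = {
    "special": "br | span | img",
    "fontstyle": "tt | i | b | big | small",
    "phrase": "em | strong | q | sub | sup",
    "inline.forms": "input | select | textarea | label | button",
    "inline": "a | %special; | %fontstyle; | %phrase; | %inline.forms;",
    "misc": "script | noscript",
    "Inline": "(#PCDATA | %inline; | %misc; )*",
}

PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")  # matches %name;


def expand(text, entities):
    """Replace %name; references repeatedly until none remain.

    XML forbids recursive entities, so the loop terminates for a valid DTD;
    an undeclared name raises KeyError.
    """
    while True:
        new = PE_REF.sub(lambda m: entities[m.group(1)], text)
        if new == text:
            return text
        text = new


def member_names(model):
    """Return #PCDATA and every element name mentioned in a content model."""
    return re.findall(r"#PCDATA|[A-Za-z_][\w.\-]*", model)


inline_model = expand("%Inline;", ENTITIES)
print(inline_model)
print(len(member_names(inline_model)), "members")
```

The result lists #PCDATA plus 21 element names: everything that may appear, in any order and any number, inside an element declared with %Inline;.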
In the example DTD created in Chapters 3 and 4, the p element was used to organize the content within a cell. Although that usage makes sense, the purpose of the p element is to make text that is not already inside a block element (such as an h1 element) word-wrap properly. Putting a heading or any other block element inside a p element is therefore unnecessary, because text within a block element is already word-wrapped. On the other hand, inline elements used outside a block element should be placed inside a p element so that their text wraps properly. You could therefore rewrite the definition for the p element as follows:
<!ELEMENT p %Inline;>
This is exactly how the p element is declared in the XHTML specification.
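As a rough illustration of what this declaration permits, the following Python sketch uses the standard library's xml.etree.ElementTree to flag any direct child of a p element that is not one of the inline element names produced by expanding %Inline;. The helper disallowed_children is hypothetical, and it checks only element names; it is not a validating parser and ignores attributes and deeper nesting.

```python
import xml.etree.ElementTree as ET

# Element names permitted inside p by %Inline;, expanded by hand from the
# special, fontstyle, phrase, inline.forms, inline, and misc entities.
INLINE = {
    "a", "br", "span", "img",
    "tt", "i", "b", "big", "small",
    "em", "strong", "q", "sub", "sup",
    "input", "select", "textarea", "label", "button",
    "script", "noscript",
}


def disallowed_children(xml_text, parent="p", allowed=INLINE):
    """Return tags of direct children of each <parent> that are not allowed."""
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter(parent):
        bad.extend(child.tag for child in elem if child.tag not in allowed)
    return bad


ok = "<div><p>See <a href='x.xml'>this <b>page</b></a>.<br/></p></div>"
wrong = "<div><p><h1>Title</h1> text</p></div>"
print(disallowed_children(ok))     # []
print(disallowed_children(wrong))  # ['h1']
```

The b element in the first sample is not reported because it is a child of a, not of p; a and br, the direct children, are both inline elements.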
Block entities and elements
The XHTML standard also declares a set of internal parameter entities that can be used in the declarations of the block elements. These internal parameter entities appear as follows:
<!ENTITY % heading "h1
| h2
| h3
| h4
| h5
| h6">
<!ENTITY % lists "ul
| ol">
<!ENTITY % blocktext "hr
| blockquote">
<!ENTITY % block "p
| %heading;
| div
| %lists;
| %blocktext;
| fieldset
| table">
<!ENTITY % Block " (%block;
| form
| %misc; )*">
Notice that the Block entity contains the block entity, the misc entity, and the form element, and that it allows an unlimited number of these child elements in any order. Unlike Inline, it does not include #PCDATA, so text cannot appear directly inside an element declared with it. Using the Block parameter entity, the declaration for the body element becomes the following:
<!ELEMENT body %Block;>
As you can see, parameter entities give the DTD a clear structure.
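Because %Block; contains no #PCDATA, a body element declared this way may hold only block-level elements (plus form, script, and noscript), never bare text or inline elements. The following Python sketch, again using xml.etree.ElementTree, reports both kinds of violation. The helper body_problems is hypothetical; it assumes the document does not declare the XHTML namespace (so tags carry no namespace prefix), and it is no substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

# Members of %Block; after expanding %block;, %heading;, %lists;,
# %blocktext;, and %misc; from the declarations above.
BLOCK = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "ul", "ol",
    "hr", "blockquote", "fieldset", "table", "form", "script", "noscript",
}


def body_problems(xml_text):
    """List violations of <!ELEMENT body %Block;> in a document's body.

    Any non-whitespace text directly inside body (before the first child
    or between children) is reported, as is any child not in %Block;.
    """
    body = ET.fromstring(xml_text).find("body")
    problems = []
    if body.text and body.text.strip():
        problems.append("text: " + body.text.strip())
    for child in body:
        if child.tag not in BLOCK:
            problems.append("element: " + child.tag)
        if child.tail and child.tail.strip():
            problems.append("text: " + child.tail.strip())
    return problems


good = "<html><body><h1>News</h1><p>Read <a href='n.xml'>more</a>.</p></body></html>"
bad = "<html><body>Loose text <a href='n.xml'>link</a></body></html>"
print(body_problems(good))  # []
print(body_problems(bad))   # ['text: Loose text', 'element: a']
```

Wrapping the text and the link in a p element, as the second document fails to do, is exactly the fix the XHTML content models call for.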
Using parameter entities in attributes
The XHTML standard also uses parameter entities in attributes, as we saw earlier with the events entity. You could use this events entity and two additional entities to create an internal parameter entity for attributes shared among many elements, as shown here:
<!-- Internationalization attributes
lang Language code (backward-compatible)
xml:lang Language code (per XML 1.0 spec)
dir Direction for weak/neutral text
-->
<!ENTITY % i18n " lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr | rtl ) #IMPLIED">
<!ENTITY % coreattrs
" id ID #IMPLIED
class CDATA #IMPLIED
style CDATA #IMPLIED
title CDATA #IMPLIED">
<!ENTITY % attrs " %coreattrs;
%i18n;
%events;">
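The i18n entity (short for "internationalization": the letter i, 18 more letters, then n) is used to mark elements as belonging to a particular language. It declares both lang, which non-XML HTML browsers understand, and xml:lang, which XML processors understand; dir sets the direction of weakly directional or neutral text.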
NOTE
For more information about language codes, visit the Web site http://www.oasis-open.org/cover/iso639a.html.
The attrs parameter entity can be used for the most common attributes associated with the HTML elements in the DTD. For example, the body element's attribute list can now be written as follows:
<!ATTLIST body %attrs;
onload CDATA #IMPLIED
onunload CDATA #IMPLIED>
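To check which attributes an element declared with %attrs; ends up with, you can expand the entity and pick out each name, type, and default triple. In this minimal Python sketch, the entity values are copied from the declarations above, with %events; filled in from the list declared in the full DTD later in this chapter. The regular expression handles only the CDATA, ID, and NMTOKEN types, enumerations, and #IMPLIED or #REQUIRED defaults that these entities use; the helper names are hypothetical.

```python
import re

# Attribute-definition entities from the declarations above.
EVENT_NAMES = (
    "onclick", "ondblclick", "onmousedown", "onmouseup", "onmouseover",
    "onmousemove", "onmouseout", "onkeypress", "onkeydown", "onkeyup",
)
ENTITIES = {
    "i18n": "lang NMTOKEN #IMPLIED xml:lang NMTOKEN #IMPLIED "
            "dir (ltr | rtl ) #IMPLIED",
    "coreattrs": "id ID #IMPLIED class CDATA #IMPLIED "
                 "style CDATA #IMPLIED title CDATA #IMPLIED",
    "events": " ".join(f"{name} CDATA #IMPLIED" for name in EVENT_NAMES),
    "attrs": "%coreattrs; %i18n; %events;",
}

PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")
# An attribute definition: name, type (keyword or enumeration), default.
ATTDEF = re.compile(
    r"([\w:.\-]+)\s+(CDATA|ID|NMTOKEN|\([^)]*\))\s+(#IMPLIED|#REQUIRED)"
)


def expand(text):
    """Replace %name; references until none remain."""
    while (new := PE_REF.sub(lambda m: ENTITIES[m.group(1)], text)) != text:
        text = new
    return text


def attribute_names(attlist_body):
    """Names declared by the body of an ATTLIST, after entity expansion."""
    return [m.group(1) for m in ATTDEF.finditer(expand(attlist_body))]


names = attribute_names("%attrs;")
print(len(names), names)
print(attribute_names("%attrs; onload CDATA #IMPLIED onunload CDATA #IMPLIED")[-2:])
```

The body element gets these 17 shared attributes plus onload and onunload, which is why its ATTLIST declaration stays so short.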
Rewriting the sample DTD using parameter entities
Ideally, you want your XML Web documents to be compatible with the new XHTML standard. Using these entities, along with a few other changes, you can rewrite the DTD example from Chapter 4 as follows:
<!-- Entities that can occur at block or inline level. ====-->
<!ENTITY % misc " script
| noscript">
<!-- Entities for inline elements ================-->
<!ENTITY % special "br
| span
| img">
<!ENTITY % fontstyle "tt
| i
| b
| big
| small">
<!ENTITY % phrase "em
| strong
| q
| sub
| sup">
<!ENTITY % inline.forms "input
| select
| textarea
| label
| button">
<!ENTITY % inline "a
| %special;
| %fontstyle;
| %phrase;
| %inline.forms;">
<!ENTITY % Inline "(#PCDATA
| %inline;
| %misc;)*">
<!-- Entities used for block elements ============-->
<!ENTITY % heading "h1
| h2
| h3
| h4
| h5
| h6">
<!ENTITY % lists "ul
| ol">
<!ENTITY % blocktext "hr
| blockquote">
<!ENTITY % block "p
| %heading;
| div
| %lists;
| %blocktext;
| fieldset
| table">
<!ENTITY % Block " (%block;
| form
| %misc; )*">
<!-- Mixed block and inline ========================-->
<!-- %Flow; mixes block and inline and is used for list
items and so on. -->
<!ENTITY % Flow " (#PCDATA
| %block;
| form
| %inline;
| %misc; )*">
<!ENTITY % form.content " #PCDATA
| p
| %lists;
| %blocktext;
| a
| %special;
| %fontstyle;
| %phrase;
| %inline.forms;
| table
| %heading;
| div
| fieldset
| %misc; ">
<!ENTITY % events " onclick CDATA #IMPLIED
ondblclick CDATA #IMPLIED
onmousedown CDATA #IMPLIED
onmouseup CDATA #IMPLIED
onmouseover CDATA #IMPLIED
onmousemove CDATA #IMPLIED
onmouseout CDATA #IMPLIED
onkeypress CDATA #IMPLIED
onkeydown CDATA #IMPLIED
onkeyup CDATA #IMPLIED">
<!ENTITY % i18n " lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr | rtl ) #IMPLIED">
<!-- Core attributes common to most elements
id Document-wide unique ID
class Space-separated list of classes
style Associated style info
title Advisory title/amplification
-->
<!-- Style sheet data -->
<!ENTITY % StyleSheet "CDATA">
<!ENTITY % coreattrs " id ID #IMPLIED
class CDATA #IMPLIED
style CDATA #IMPLIED
title CDATA #IMPLIED">
<!ENTITY % attrs " %coreattrs;
%i18n;
%events;">
<!-- End Entity Declarations ====================-->
<!ENTITY % URI "CDATA">
<!--a Uniform Resource Identifier, see [RFC2396]-->
<!ELEMENT html (head, body)>
<!ATTLIST html %i18n;
xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
<!ELEMENT head (title, base?)>
<!ATTLIST head %i18n;
profile CDATA #IMPLIED>
<!ELEMENT title (#PCDATA )>
<!ATTLIST title %i18n; >
<!ELEMENT base EMPTY>
<!ATTLIST base target CDATA #REQUIRED >
<!ELEMENT body (basefont? , (p )? , table )>
<!ATTLIST body alink CDATA #IMPLIED
text CDATA #IMPLIED
bgcolor CDATA #IMPLIED
link CDATA #IMPLIED
vlink CDATA #IMPLIED >
<!ELEMENT basefont EMPTY>
<!ATTLIST basefont size CDATA #REQUIRED >
<!-- generic language/style container ==============-->
<!ELEMENT a (#PCDATA )>
<!ATTLIST a %attrs;
href CDATA #IMPLIED
name CDATA #IMPLIED
target CDATA #IMPLIED >
<!ELEMENT table (tr )+>
<!ATTLIST table %attrs;
width CDATA #IMPLIED
rules CDATA #IMPLIED
frame CDATA #IMPLIED
align CDATA 'Center'
cellpadding CDATA '0'
border CDATA '0'
cellspacing CDATA '0' >
<!ELEMENT tr (td+ )>
<!ATTLIST tr %attrs; >
<!ELEMENT td (cellcontent )>
<!ATTLIST td %attrs;
bgcolor (Cyan|Lime|Black|White|Maroon ) 'White'
align CDATA 'Center'
rowspan CDATA #IMPLIED
colspan CDATA #IMPLIED >
<!ELEMENT cellcontent (%Block;)+>
<!ATTLIST cellcontent cellname CDATA #REQUIRED >
<!ELEMENT h1 %Inline;>
<!ATTLIST h1 align CDATA #IMPLIED
%attrs; >
<!ELEMENT h2 %Inline;>
<!ATTLIST h2 align CDATA #IMPLIED
%attrs; >
<!ELEMENT h3 %Inline;>
<!ATTLIST h3 align CDATA #IMPLIED
%attrs; >
<!ELEMENT h4 %Inline;>
<!ATTLIST h4 align CDATA #IMPLIED
%attrs; >
<!ELEMENT h5 %Inline;>
<!ATTLIST h5 align CDATA #IMPLIED
%attrs; >
<!ELEMENT h6 %Inline;>
<!ATTLIST h6 align CDATA #IMPLIED
%attrs; >
<!ELEMENT p %Inline;>
<!ATTLIST p %attrs; >
<!-- Inline Element Declarations =================-->
<!-- Forced line break -->
<!ELEMENT br EMPTY>
<!ATTLIST br %coreattrs;
clear CDATA #REQUIRED >
<!-- Emphasis -->
<!ELEMENT em %Inline;>
<!ATTLIST em %attrs; >
<!-- Strong emphasis -->
<!ELEMENT strong %Inline;>
<!ATTLIST strong %attrs; >
<!-- Inlined quote -->
<!ELEMENT q %Inline;>
<!ATTLIST q %attrs;
cite CDATA #IMPLIED >
<!-- Subscript -->
<!ELEMENT sub %Inline;>
<!ATTLIST sub %attrs; >
<!-- Superscript -->
<!ELEMENT sup %Inline;>
<!ATTLIST sup %attrs; >
<!-- Fixed-pitch font -->
<!ELEMENT tt %Inline;>
<!ATTLIST tt %attrs; >
<!-- Italic font -->
<!ELEMENT i %Inline;>
<!ATTLIST i %attrs; >
<!-- Bold font -->
<!ELEMENT b %Inline;>
<!ATTLIST b %attrs; >
<!-- Bigger font -->
<!ELEMENT big %Inline;>
<!ATTLIST big %attrs; >
<!-- Smaller font -->
<!ELEMENT small %Inline;>
<!ATTLIST small %attrs; >
<!-- hspace, border, align, and vspace are not in the strict
XHTML standard for img. -->
<!ELEMENT img EMPTY>
<!ATTLIST img %attrs;
align CDATA #IMPLIED
border CDATA #IMPLIED
width CDATA #IMPLIED
height CDATA #IMPLIED
hspace CDATA #IMPLIED
vspace CDATA #IMPLIED
src CDATA #REQUIRED >
<!ELEMENT ul (font? , li+ )>
<!ATTLIST ul %attrs;
type CDATA 'text' >
<!ELEMENT ol (font? , li+ )>
<!ATTLIST ol type CDATA 'text'
start CDATA #IMPLIED
%attrs; >
<!ELEMENT li %Flow; >
<!ATTLIST li %attrs; >
<!--================= Form Elements===============-->
<!--Each label must not contain more than one field.
Label elements shouldn't be nested.
-->
<!ELEMENT label %Inline;>
<!ATTLIST label %attrs;
for IDREF #IMPLIED >
<!ENTITY % InputType "(text | password | checkbox |
radio | submit | reset |
file | hidden | image | button)">
<!-- The name attribute is required for all elements but
the submit and reset elements. -->
<!ELEMENT input EMPTY>
<!ATTLIST input %attrs; >
<!ELEMENT select (optgroup | option )+>
<!ATTLIST select %attrs;>
<!-- Option selector -->
<!ATTLIST select name CDATA #IMPLIED>
<!ATTLIST select size CDATA #IMPLIED>
<!ATTLIST select multiple (multiple) #IMPLIED>
<!ATTLIST select disabled (disabled) #IMPLIED>
<!ATTLIST select tabindex CDATA #IMPLIED>
<!ATTLIST select onfocus CDATA #IMPLIED>
<!ATTLIST select onblur CDATA #IMPLIED>
<!ATTLIST select onchange CDATA #IMPLIED>
<!ELEMENT optgroup (option )+>
<!ATTLIST optgroup %attrs;
disabled (disabled ) #IMPLIED
label CDATA #REQUIRED>
<!ELEMENT option (#PCDATA )>
<!ATTLIST option %attrs;
selected (selected ) #IMPLIED
disabled (disabled ) #IMPLIED
label CDATA #IMPLIED
value CDATA #IMPLIED >
<!-- Multiple-line text field -->
<!ELEMENT textarea (#PCDATA )>
<!ATTLIST textarea %attrs; >
<!ELEMENT legend %Inline;>
<!ATTLIST legend %attrs; >
<!--=================== Horizontal Rule ============-->
<!ELEMENT hr EMPTY>
<!ATTLIST hr %attrs; >
<!--=================== Block-like Quotes ==========-->
<!ELEMENT blockquote %Block;>
<!ATTLIST blockquote %attrs;
cite CDATA #IMPLIED >
<!-- The fieldset element is used to group form fields.
Only one legend element should occur in the content,
and if present it should be preceded only by white space.
-->
<!ELEMENT fieldset
(#PCDATA | legend | %block; | form | %inline; | %misc; )*>
<!ATTLIST fieldset %attrs; >
<!ELEMENT script (#PCDATA )>
<!ATTLIST script charset CDATA #IMPLIED
type CDATA #REQUIRED
src CDATA #IMPLIED
defer CDATA #IMPLIED
xml:space CDATA #FIXED 'preserve' >
<!-- Alternative content container for non-script-based
rendering -->
<!ELEMENT noscript %Block;>
<!ATTLIST noscript %attrs; >
<!ELEMENT button (#PCDATA | p | %heading; | div | %lists; |
%blocktext; | table | %special; | %fontstyle; |
%phrase; | %misc; )*>
<!ATTLIST button %attrs;
name CDATA #IMPLIED
value CDATA #IMPLIED
type (button | submit | reset ) 'submit'
disabled (disabled ) #IMPLIED
tabindex CDATA #IMPLIED
accesskey CDATA #IMPLIED
onfocus CDATA #IMPLIED
onblur CDATA #IMPLIED >
<!ELEMENT span %Inline;>
<!ATTLIST span %attrs; >
<!--The font element is not included in the XHTML standard. -->
<!ELEMENT font (b )>
<!ATTLIST font color CDATA #REQUIRED
face CDATA #REQUIRED
size CDATA #REQUIRED >
<!ELEMENT form (%form.content;)*>
<!ELEMENT div %Flow;>
<!ATTLIST div %attrs; >
This might look like a completely different DTD, but it is essentially the same as the DTD we created in Chapter 4. Only one structural change has occurred: the block elements, such as the h1 element, have been moved out of the p element and now sit alongside it in the Block content model. Several elements have been added, including the form element and its child elements (button, label, select, and so on) and the font-style elements, such as i and b. Numerous attributes have also been added, including the language, id, and scripting-event attributes.
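In a DTD this long, it is easy to declare a parameter entity twice or to refer to one before it has been declared. Both matter: in XML 1.0 the first declaration of an entity is the binding one, and a parameter entity must be declared before any reference to it. The following Python sketch scans ENTITY declarations in order and reports either problem. It is a rough check written for this example (check_parameter_entities is a hypothetical name): it handles only double-quoted values and does not skip declarations inside comments.

```python
import re

ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.\-]+)\s+"([^"]*)"\s*>')
PE_REF = re.compile(r"%([\w.\-]+);")


def check_parameter_entities(dtd_text):
    """Report parameter entities that are redeclared or used too early."""
    declared = set()
    problems = []
    for match in ENTITY_DECL.finditer(dtd_text):
        name, value = match.groups()
        # References in an entity value are expanded when it is declared,
        # so every referenced entity must already exist.
        for ref in PE_REF.findall(value):
            if ref not in declared:
                problems.append(f"{name} uses %{ref}; before it is declared")
        # Later declarations of the same name are ignored by XML processors.
        if name in declared:
            problems.append(f"{name} is redeclared; the first declaration wins")
        declared.add(name)
    return problems


sample = '''
<!ENTITY % misc "script | noscript">
<!ENTITY % Inline "(#PCDATA | %inline; | %misc;)*">
<!ENTITY % inline "a | br | span">
<!ENTITY % Inline "(#PCDATA | %inline; | %misc;)*">
'''
for problem in check_parameter_entities(sample):
    print(problem)
# Inline uses %inline; before it is declared
# Inline is redeclared; the first declaration wins
```

In the sample, the second, correct Inline declaration would be ignored, so the fix is to delete the early one rather than the later one.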
XML documents built using this new DTD will still use a table to format and contain all of the elements that will be displayed in the browser. However, in the new DTD, the declaration for the body element is different from that in our original DTD. In our original DTD, the a (anchor) element at the top of the page is a child element of the body element. However, this element is not a child element of the body element in the XHTML standard. As we have seen, the declaration for the body element in the XHTML standard is as follows:
<!ELEMENT body %Block;>
As we have discussed, the Block internal parameter entity is declared as follows:
<!ENTITY % Block " (%block; | form | %misc;)*">
Replacing %block; and %misc; results in the following code:
<!ENTITY % Block " (p | %heading; | div | %lists; |
%blocktext; | fieldset | table | form | script |
noscript)*">
Replacing %heading;, %lists;, and %blocktext; gives you the fully expanded content model for the body element, as shown here:
<!ENTITY % Block " (p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul |
ol | hr | blockquote | fieldset | table |
form | script | noscript)*">
NOTE
It would be worth your time to go through the DTD and replace the entities with their actual values. You may also find it interesting to download the latest version of the XHTML standard and do all of the replacements in that document, too.
Creating this expanded declaration manually took some time, but a DTD tool could have done this work for you in moments. For example, Figure 5-2 shows our sample XHTML DTD as it appears in XML Authority.

Figure 5-2. The Body element of the XHTML DTD displayed in XML Authority.
The child elements of the body element are readily visible. (You can scroll down to see the complete list.)
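The same expansion can also be scripted. This minimal Python sketch reads parameter-entity and element declarations from raw DTD text (here, the block-level declarations shown earlier in this chapter), keeps the first declaration of each entity as XML requires, and expands every element's content model. It uses regular expressions rather than a real DTD parser, so it assumes double-quoted entity values and no comments or conditional sections; all helper names are hypothetical.

```python
import re

# The block-level declarations from earlier in this chapter, as raw DTD text.
DTD = '''
<!ENTITY % heading "h1 | h2 | h3 | h4 | h5 | h6">
<!ENTITY % lists "ul | ol">
<!ENTITY % blocktext "hr | blockquote">
<!ENTITY % misc "script | noscript">
<!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table">
<!ENTITY % Block " (%block; | form | %misc; )*">
<!ELEMENT body %Block;>
'''

ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.\-]+)\s+"([^"]*)"\s*>')
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.\-]+)\s+([^>]*)>")
PE_REF = re.compile(r"%([\w.\-]+);")


def read_entities(dtd):
    """Collect parameter entities; the first declaration of a name wins."""
    entities = {}
    for name, value in ENTITY_DECL.findall(dtd):
        entities.setdefault(name, value)
    return entities


def expand(text, entities):
    """Replace %name; references until none remain, then tidy whitespace."""
    while (new := PE_REF.sub(lambda m: entities[m.group(1)], text)) != text:
        text = new
    return " ".join(text.split())


def expanded_models(dtd):
    """Map each declared element to its fully expanded content model."""
    entities = read_entities(dtd)
    return {name: expand(model, entities)
            for name, model in ELEMENT_DECL.findall(dtd)}


print(expanded_models(DTD)["body"])
```

Pointing the same functions at the full DTD text would expand every element declaration at once, which is essentially what the NOTE above suggests doing by hand.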
NOTE
You do not have to include all of these child elements in your DTD to be compatible with the XHTML standard; instead, you can include only those elements that you need for your projects. If you want to be compliant with the standard, however, you cannot add elements to the body element that are not included in the standard.
Notice that the a element is not a child element of the XHTML body element; it is actually a child element of the p element. Therefore, you cannot use the declaration included in the original DTD we discussed in Chapter 4, shown here:
<!ELEMENT body (basefont? , a? , table)>
In this declaration, the a element is a child element of the body element, which does not comply with the standard. To solve this problem, you will need to use the p element, as shown here:
<!ELEMENT body (basefont? , (p)? , table)>
While this declaration makes the DTD conform to the XHTML standard, it also means that any of the inline elements, not just the a element, can be used in the body element as long as they are contained within a p element.
Many child elements that are included in the body element of the XHTML standard are not included in the example DTD. This is because you are using the table to hold most of the content and do not need most of these child elements. You can think of the XML documents defined by the example DTD as a subset of the XML documents defined by the more general XHTML DTD. The example DTD includes only the structure you need for your documents.
The XHTML standard declaration for the table cell element (td) is shown here:
<!ELEMENT td %Flow;>
If you replace the Flow parameter entity and all of the parameter entities contained within %Flow; as you did earlier for the body element, your final td declaration will look like this:
<!ELEMENT td (#PCDATA | p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul | ol |
hr | blockquote | fieldset | table | form | a | br | span |
img | tt | i | b | big | small | em | strong | q | sub |
sup | input | select | textarea | label | button | script |
noscript)*>
As you can see, the Flow entity includes virtually everything. You can use a td element as a container for all of the block and inline elements, which is exactly what you want to do.
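Reducing each content model to the set of names it allows makes the relationship between Block, Inline, and Flow easy to check. In this Python sketch the sets are copied from the entity declarations in this chapter, and the variable names deliberately mirror the DTD's case-sensitive entity names (block versus Block, and so on).

```python
# Member names of the small entities, as declared earlier in the chapter.
block = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "ul", "ol",
         "hr", "blockquote", "fieldset", "table"}             # %block;
inline = {"a", "br", "span", "img", "tt", "i", "b", "big", "small",
          "em", "strong", "q", "sub", "sup",
          "input", "select", "textarea", "label", "button"}  # %inline;
misc = {"script", "noscript"}                                 # %misc;

# The three content models, reduced to the sets of names they allow.
Block = block | {"form"} | misc                          # body
Inline = {"#PCDATA"} | inline | misc                     # p
Flow = {"#PCDATA"} | block | {"form"} | inline | misc    # td in XHTML

print(sorted(Flow - Block))   # what a td accepts that body does not
print(sorted(Flow - Inline))  # what a td accepts that p does not
```

The first difference is #PCDATA plus the inline elements; the second is the block elements plus form. In other words, a td declared with %Flow; accepts everything body accepts and everything p accepts.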
In the example DTD, the following declaration is created for the td element and the cellcontent element:
<!ELEMENT td (cellcontent)>
<!ELEMENT cellcontent (%Block;)+>
This declaration doesn't comply with the XHTML standard. The cellcontent element does not belong to the standard; it was created for marking up the text. When you use custom elements such as cellcontent, you will need to remove them from your documents using Extensible Stylesheet Language (XSL). Using XSL, you can transform documents built with the preceding definitions into documents that match this declaration:
<!ELEMENT td (%Block;)+>
This declaration will be compliant with the XHTML standard. We'll have a detailed discussion about XSL in Chapter 12.
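The book performs this cleanup with XSL. As a stand-in that needs only the Python standard library, the following sketch does the same job with xml.etree.ElementTree: the hypothetical unwrap helper replaces every cellcontent element with its children and text, so each td ends up holding the block elements directly. This is not XSL, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Attach text just before position index among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def unwrap(root, tag):
    """Replace every <tag> element below root with its content, in place."""
    for parent in list(root.iter()):
        index = 0
        while index < len(parent):
            child = parent[index]
            if child.tag != tag:
                index += 1
                continue
            lead, trail, kids = child.text, child.tail, list(child)
            child.clear()      # detach the kids so each has only one parent
            del parent[index]  # drop the wrapper itself
            _append_text(parent, index, lead)
            if kids:
                if trail:
                    kids[-1].tail = (kids[-1].tail or "") + trail
                for offset, kid in enumerate(kids):
                    parent.insert(index + offset, kid)
            else:
                _append_text(parent, index, trail)
            # index is not advanced, so a nested wrapper that was just
            # moved up is examined on the next pass.


doc = ET.fromstring(
    "<table><tr><td><cellcontent cellname='intro'>"
    "<h1>Welcome</h1><p>Hello</p>"
    "</cellcontent></td></tr></table>"
)
unwrap(doc, "cellcontent")
print(ET.tostring(doc, encoding="unicode"))
# <table><tr><td><h1>Welcome</h1><p>Hello</p></td></tr></table>
```

The cellname attribute disappears along with the wrapper, which is acceptable here because XHTML has no equivalent for it.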