
Putting Attributes to Work

Attributes go hand in hand with elements and are incredibly important to the construction of DTDs. Attributes are used to specify additional information about an element. More specifically, attributes are used to form a name/value pair that somehow describes a particular property of an element. Attributes are declared in a DTD using attribute list declarations, which take the following form:

<!ATTLIST ElementName AttrName AttrType Default>

This form reveals that an attribute has a name (AttrName) and a type (AttrType), as well as a default value (Default). The default value for an attribute refers to either a value or a symbol that indicates the use of the attribute. There are four different types of default values you can specify for an attribute in Default:

  • #REQUIRED The attribute is required.

  • #IMPLIED The attribute is optional.

  • #FIXED value The attribute has a fixed value.

  • default The default value of the attribute.

The #REQUIRED symbol identifies a required attribute, which means the attribute must be set whenever you use the element. The #IMPLIED symbol identifies an optional attribute, which means you can leave the attribute out when using the element. The #FIXED symbol assigns a fixed value to an attribute, effectively making the attribute a constant piece of information; you must provide the fixed attribute value after the #FIXED symbol when declaring the attribute. The last option for declaring attribute defaults is to simply list the default value for an attribute; an attribute will assume its default value if it isn't explicitly set in the element. Following is an example of an attribute list for an element that specifies the units for a distance:

<!ELEMENT distance (#PCDATA)>
<!ATTLIST distance units (miles | kilometers | laps) "miles">

In this example, the element is named distance and its only attribute is named units. The units attribute can only be set to one of three possible values: miles, kilometers, or laps. The default value of the units attribute is miles, which means that if you don't explicitly set the attribute it will automatically take on a value of miles.
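
To see the default value at work, here is a minimal sketch of a complete document that embeds the declarations in an internal DTD. The race root element and the sample distances are assumptions made up for illustration; only the distance element and its units attribute come from the example above.

```xml
<?xml version="1.0"?>
<!DOCTYPE race [
  <!-- race is a hypothetical container element added for this sketch -->
  <!ELEMENT race (distance+)>
  <!ELEMENT distance (#PCDATA)>
  <!ATTLIST distance units (miles | kilometers | laps) "miles">
]>
<race>
  <!-- units is omitted, so a parser that reads the DTD supplies units="miles" -->
  <distance>26.2</distance>
  <!-- units is set explicitly to one of the three allowed values -->
  <distance units="kilometers">42.2</distance>
</race>
```

A value outside the enumerated list, such as units="feet", would make the document invalid against this DTD.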

By the Way

Although attribute lists don't have to be declared in a particular place within a DTD, it is common practice to place them immediately below the declaration for the element to which they belong.


In addition to the default value of an attribute, you must also specify the type of the attribute in the attribute list declaration. There are 10 different attribute types, which follow:

  • CDATA Unparsed character data

  • Enumerated A series of string values

  • NOTATION A notation declared somewhere else in the DTD

  • ENTITY An external binary entity

  • ENTITIES Multiple external binary entities separated by whitespace

  • ID A unique identifier

  • IDREF A reference to an ID declared somewhere else in the document

  • IDREFS Multiple references to IDs declared somewhere else in the document

  • NMTOKEN A name consisting of XML token characters (letters, numbers, periods, dashes, colons, and underscores)

  • NMTOKENS Multiple names consisting of XML token characters

To help in understanding these attribute types, it's possible to classify them into three groups: string, enumerated, and tokenized.

String attributes are the most commonly used attributes and are represented by the CDATA type. The CDATA type indicates that an attribute contains a simple string of text. Following is an example of declaring a simple CDATA attribute that must be defined in the education element:

<!ATTLIST education school CDATA #REQUIRED>

In this example, the school a person attended is a required character data attribute of the education element. If you wanted to make the school attribute optional, you could use the #IMPLIED symbol:

<!ATTLIST education school CDATA #IMPLIED>

Enumerated attributes are constrained to a list of predefined strings of text. The enumerated type is similar to the CDATA type except the acceptable attribute values must come from a list that is provided in the attribute list declaration. Following is an example of how you might provide an enumerated attribute for specifying the type of degree earned as part of the education element:

<!ATTLIST education degree (associate | bachelors | masters | doctorate)
  "bachelors">

When using the degree attribute in a document, you are required to select a value from the enumerated list. If you don't use the attribute at all, it will assume the default value of bachelors.
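
As a rough illustration, the school and degree declarations can be combined and exercised in one small document. The resume root element, the EMPTY content model for education, and the school names are hypothetical choices for this sketch, not part of the original examples.

```xml
<?xml version="1.0"?>
<!DOCTYPE resume [
  <!-- resume and the EMPTY content model are assumptions for this sketch -->
  <!ELEMENT resume (education+)>
  <!ELEMENT education EMPTY>
  <!ATTLIST education
    school CDATA #REQUIRED
    degree (associate | bachelors | masters | doctorate) "bachelors">
]>
<resume>
  <!-- degree is omitted, so it takes the default value of bachelors -->
  <education school="Example State University"/>
  <!-- degree is set explicitly to a value from the enumerated list -->
  <education school="Example Technical College" degree="associate"/>
</resume>
```

Leaving out school on either element would make the document invalid, because school is declared as #REQUIRED.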

Tokenized attributes are processed as tokens by an XML application, which means the application converts all contiguous whitespace to a single space character and eliminates all leading and trailing whitespace. In addition to eliminating the majority of whitespace in a tokenized attribute value, the XML application also validates the value of a tokenized attribute based upon the declared attribute type: ENTITY, ENTITIES, ID, IDREF, IDREFS, NMTOKEN, or NMTOKENS.

By the Way

A token is the smallest piece of information capable of being processed by an XML application. A tokenized attribute is simply an attribute that is processed into tokens by an XML application, which has the effect of eliminating extraneous whitespace (space characters, newline characters, and so on). Contrast this with a string attribute, whose value is not trimmed or collapsed and therefore retains its leading, trailing, and repeated spaces (although any tabs and newlines in the value are still replaced with space characters).
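
The difference between string and tokenized attributes can be sketched with one element that carries both kinds. The note element and its title and tags attributes are hypothetical names chosen for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- note, title, and tags are made-up names for this sketch -->
  <!ELEMENT note EMPTY>
  <!ATTLIST note
    title CDATA    #IMPLIED
    tags  NMTOKENS #IMPLIED>
]>
<!-- title is a string (CDATA) attribute: its extra spaces are kept as written -->
<!-- tags is a tokenized attribute: it is reported as "hiking trail maps" -->
<note title="  Trail   notes  " tags="  hiking   trail   maps  "/>
```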


The ENTITY and ENTITIES types are used to reference entities, which you learn about in the next tutorial. As an example, images are typically referenced as binary entities, in which case you use an ENTITY attribute value to associate an image with an element type:

<!ATTLIST photo image ENTITY #IMPLIED>
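
For an ENTITY attribute to be valid, its value must name an unparsed entity declared in the DTD. The following is a minimal sketch under assumed names: the skate entity, the skate.gif file, and the gif notation identifier are all hypothetical, and the EMPTY content model is chosen only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE photo [
  <!-- The notation and entity below are illustrative assumptions -->
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY skate SYSTEM "skate.gif" NDATA gif>
  <!ELEMENT photo EMPTY>
  <!ATTLIST photo image ENTITY #IMPLIED>
]>
<!-- The attribute value is the entity name, not the file name -->
<photo image="skate"/>
```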

The ENTITIES type is similar to ENTITY but it allows you to specify multiple entities. The ID, IDREF, and IDREFS attribute types all relate to unique identifiers. The ID type is a unique identifier that can be used to uniquely identify an element within a document:

<!ATTLIST part id ID #REQUIRED>

Only one attribute of type ID may be assigned to a given element type. The NMTOKEN and NMTOKENS attribute types are used to specify attributes containing name token values. A name token value consists of a single name, which means that it can't contain whitespace. More specifically, a name token value can consist of alphanumeric characters in addition to the following characters: ., -, _, and :.
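
The ID, IDREF, and NMTOKEN types can be seen together in a short sketch. The parts root element, the replaces and code attributes, and all of the attribute values are assumptions made up for illustration; only the part element and its id attribute come from the declaration above.

```xml
<?xml version="1.0"?>
<!DOCTYPE parts [
  <!-- parts, replaces, and code are hypothetical names for this sketch -->
  <!ELEMENT parts (part+)>
  <!ELEMENT part (#PCDATA)>
  <!ATTLIST part
    id       ID      #REQUIRED
    replaces IDREF   #IMPLIED
    code     NMTOKEN #IMPLIED>
]>
<parts>
  <!-- Each id value must be unique within the document -->
  <part id="p100" code="AX-100.b">Original bracket</part>
  <!-- replaces must match the id of some element in this document -->
  <part id="p200" replaces="p100" code="AX-200">Revised bracket</part>
</parts>
```

Note that ID values must be valid XML names, so they can't begin with a digit; that is why the sketch uses p100 rather than 100. A name token such as the code value has no such restriction on its first character.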

Working with Multiple Attributes

I've only shown you examples of individual attributes thus far, but you'll likely create elements that rely on several attributes. You can list all of the attributes for an element in a single attribute list by listing the attributes one after the next within the attribute list declaration. Following is an example of declaring multiple attributes within a single attribute list:

<!ELEMENT photo (image, format)>
<!ATTLIST photo
  image ENTITY #IMPLIED
  format NOTATION (gif | jpeg) #REQUIRED
>

This example shows how the two attributes of the photo element, image and format, are declared in a single attribute list declaration.
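
To show what a document using both attributes might look like, here is a hedged sketch that assumes the second attribute is named format. The notation identifiers, the skate entity, the skate.jpg file, and the caption text are hypothetical. The sketch also gives photo a simple #PCDATA content model rather than the child elements declared above, purely so it is self-contained; an element with a NOTATION attribute should not be declared EMPTY.

```xml
<?xml version="1.0"?>
<!DOCTYPE photo [
  <!-- Every notation named in the NOTATION attribute must be declared -->
  <!NOTATION gif  SYSTEM "image/gif">
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- skate and skate.jpg are made-up names for this sketch -->
  <!ENTITY skate SYSTEM "skate.jpg" NDATA jpeg>
  <!ELEMENT photo (#PCDATA)>
  <!ATTLIST photo
    image  ENTITY #IMPLIED
    format NOTATION (gif | jpeg) #REQUIRED
  >
]>
<photo image="skate" format="jpeg">Skate park at dusk</photo>
```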