XML

The Significance of Notations

XML applications cannot process unparsed entities on their own. An application has no way of knowing what to do with one unless you supply helper information that lets it rely on a helper application to process the entity. The helper application could be a browser plug-in or a standalone application that has been installed on a user's computer. Either way, the idea is that a notation directs an XML application to a helper application so that unparsed entities can be handled in a meaningful manner. The most obvious example of this type of handling is an external binary image entity, which could be processed and displayed by an image viewer (the helper application).

Notations are used to specify helper information for an unparsed entity and are required of all unparsed entities. Following is an example of a notation that describes the JPEG image type:

<!NOTATION JPEG SYSTEM "image/jpeg">

In this example, the name of the notation is JPEG, and the helper information is image/jpeg, the standard MIME type that identifies the JPEG image format. The expectation is that an XML application can use this helper information to ask the system how JPEG images should be viewed. This information comes into play when an XML application encounters the following image entity:

<!ENTITY pond SYSTEM "pond.jpg" NDATA JPEG>
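As a rough illustration of how an application might use the image/jpeg helper information, the following Python sketch compares the notation's type against the type the system associates with the entity's file name. The function name matches_notation is hypothetical, and guessing the type from the file extension with Python's mimetypes module is an assumption made for this sketch, not something XML prescribes.

```python
import mimetypes


def matches_notation(system_id, notation_type):
    """Return True if the file's guessed MIME type equals the notation's type.

    system_id is the entity's file name (hypothetical usage); notation_type
    is the helper information taken from the NOTATION declaration.
    """
    guessed_type, _encoding = mimetypes.guess_type(system_id)
    return guessed_type == notation_type


# The pond entity points at pond.jpg and is declared with the JPEG notation.
print(matches_notation("pond.jpg", "image/jpeg"))
```

An application could run a check like this before handing the file to whatever viewer the system has registered for that type.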

If you don't want to trust the XML application to figure out how to view the image on its own, you can get more specific with notations and specify an application, as follows:

<!NOTATION JPEG SYSTEM "Picasa2.exe">

This code associates Google's popular Picasa image editing application (Picasa2.exe) with JPEG images so that an XML application can use it to view JPEG images. Following is a complete XML document that contains a single image as an unparsed entity:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE image [
<!NOTATION JPEG SYSTEM "Picasa2.exe">
<!ENTITY pond SYSTEM "pond.jpg" NDATA JPEG>
<!ELEMENT image EMPTY>
<!ATTLIST image source ENTITY #REQUIRED>
]>
<image source="pond" />

Although this code does all the right things in terms of providing the information necessary to process and display a JPEG image, it still doesn't work in major web browsers because none of them support unparsed entities. In truth, web browsers know that the entities are unparsed; they just don't know what to do with them. Hopefully this will be remedied at some point in the future. Keep in mind, however, that although web browsers may not support unparsed entities, plenty of other XML applications and tools do support them.
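As one illustration of a tool that does report unparsed entities, the sketch below uses the expat parser from Python's standard library to collect the notation and entity declarations from a document and then resolve an image's source attribute back to its helper information. The function name resolve_images is hypothetical. Because expat is a non-validating parser, the sketch simply assumes that the source attribute of an image element names an unparsed entity, and it uses the image/jpeg form of the notation.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE image [
<!NOTATION JPEG SYSTEM "image/jpeg">
<!ENTITY pond SYSTEM "pond.jpg" NDATA JPEG>
<!ELEMENT image EMPTY>
<!ATTLIST image source ENTITY #REQUIRED>
]>
<image source="pond" />
"""


def resolve_images(xml_text):
    """Return (file, notation name, helper information) for each image element."""
    notations = {}  # notation name -> system identifier (helper information)
    entities = {}   # unparsed entity name -> (system identifier, notation name)
    resolved = []

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_entity(name, is_parameter, value, base, system_id, public_id,
                  notation_name):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation_name is not None:
            entities[name] = (system_id, notation_name)

    def on_start(tag, attrs):
        # Assumption: image/source is the ENTITY attribute of interest.
        if tag == "image" and attrs.get("source") in entities:
            system_id, notation_name = entities[attrs["source"]]
            helper = notations.get(notation_name)
            resolved.append((system_id, notation_name, helper))

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return resolved


print(resolve_images(DOCUMENT))
```

The parser only reports the declarations; deciding what to do with pond.jpg and the image/jpeg helper information is still left to the application.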

Working with CDATA

Just as an XML processor doesn't process unparsed entities, you can deliberately mark content within an XML document so that it isn't processed as markup. This type of content is known as unparsed character data, or CDATA. CDATA must be specially marked so that it is treated differently from the rest of the document, and the part of a document that contains it is known as a CDATA section. You define a CDATA section by enclosing the content within the symbols <![CDATA[ and ]]>. Following is an example of a CDATA section, which should make the usage of these symbols a little clearer:

This is my self-portrait:
<![CDATA[
   *****
  * @ @ *
  *  )  *
  * ~~~ *
   *****
]]>

In this example, the crude drawing of a face is kept intact because it isn't processed as XML data: nothing between <![CDATA[ and ]]> is interpreted as markup, so even characters such as < and & are taken literally. If the drawing weren't enclosed in a CDATA section, an application that normalizes white space could reduce the white space within it to a few spaces, and the drawing would be ruined. CDATA sections are very useful any time you want to preserve the exact appearance of text. You can also place legitimate XML code in a CDATA section to temporarily disable it and keep it from being processed.
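The effect of a CDATA section can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module. The element name note, the sample text, and the helper function describe are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, <b>, < and & are plain characters, not markup.
WITH_CDATA = "<note><![CDATA[<b>5 < 6 & 7 > 2</b>]]></note>"
# Without the CDATA section, <b> is parsed as a child element.
WITHOUT_CDATA = "<note><b>bold</b></note>"


def describe(xml_text):
    """Return the root element's text and its number of child elements."""
    root = ET.fromstring(xml_text)
    return root.text, len(root)


print(describe(WITH_CDATA))
print(describe(WITHOUT_CDATA))
```

In the first document the whole string comes back as the text of note with no child elements, which is the "temporarily disabled" behavior described above. In the second, the b element is parsed as real markup.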