XML

Creating Your Own Markup Languages

Before you get too far into this tutorial, I have to make a little confession. When you create an XML document, you aren't really using XML to code the document. Instead, you are using a markup language that was created in XML. In other words, XML is used to create markup languages that are then used to create XML documents. The term "XML document" is even a little misleading because the type of the document is really determined by the specific markup language used. So, as an example, if I were to create my very own markup language called MML (Michael's Markup Language), then the documents I create would be considered MML documents, and I would use MML to code those documents. Generally speaking, the documents are still XML documents because MML is an XML-based markup language, but you would refer to the documents as MML documents.

The point of this discussion is not to split hairs regarding the terminology used to describe XML documents. It is intended to help clarify the point that XML is a technology that enables the creation of custom markup languages. If you're coming from the world of HTML, you probably think in terms of there being only one markup language: HTML. In the XML world, there are thousands of different markup languages, with each of them applicable to a different type of data. As an XML developer, you have the option of using an existing markup language that someone else created using XML, or you can create your own. An XML-based markup language can be as formal as XHTML, the version of HTML that adheres to the rules of XML, or as informal as my simple Tall Tales trivia language.

When you create your own markup language, you are basically establishing which elements (tags) and attributes are used to create documents in that language. Not only is it important to fully describe the different elements and attributes, but you must also describe how they relate to one another. For example, if you are creating a markup language to keep track of sports information so that you can chart your local softball league, you might use tags such as <schedule>, <game>, <team>, <player>, and so on. Examples of attributes for the player element might include name, hits, rbis, and so on.
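To make the softball example a little more concrete, here is a minimal sketch of what a document in such a language might look like. It is read with Python's standard `xml.etree.ElementTree` module. The text only names the tags and the player attributes. The nesting (games inside a schedule, teams inside a game, players inside a team), the `date` and team `name` attributes, and the `team_totals` helper are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document in the softball markup language.
# The nesting and the date/team-name attributes are illustrative assumptions.
SOFTBALL_DOC = """
<schedule>
  <game date="2024-05-04">
    <team name="Hornets">
      <player name="Ann" hits="2" rbis="1"/>
      <player name="Bo" hits="1" rbis="0"/>
    </team>
    <team name="Comets">
      <player name="Cy" hits="3" rbis="2"/>
    </team>
  </game>
</schedule>
"""

def team_totals(xml_text):
    """Sum the hits and rbis attributes of every player, grouped by team name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    totals = {}
    # iter() visits every <team> element at any depth below the root.
    for team in root.iter("team"):
        players = team.findall("player")  # direct <player> children only
        hits = sum(int(p.get("hits", "0")) for p in players)
        rbis = sum(int(p.get("rbis", "0")) for p in players)
        totals[team.get("name")] = (hits, rbis)
    return totals

print(team_totals(SOFTBALL_DOC))  # → {'Hornets': (3, 1), 'Comets': (3, 2)}
```

The code never needs to know the language is "about" softball. It relies only on the element names and attributes, which is exactly why agreeing on those names up front matters.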

Just in case you're thinking of creating your own sports markup language, I might be able to save you some time by directing you to SportsML (Sports Markup Language). This markup language has elements and attributes similar to the ones I described for your hypothetical softball markup language, except SportsML is much broader and covers many different sports. For more information regarding SportsML, please visit the SportsML web site at http://www.sportsml.org/.

You might now be asking yourself how exactly you create a markup language. In other words, how do you specify the set of elements and attributes for a markup language, along with how they relate to each other? Although you could certainly create sports XML documents using your own elements and attributes, there really needs to be a set of rules somewhere that establishes the format and structure of documents created in the language. This set of rules is known as the schema for a markup language. A schema describes the exact elements and attributes that are available within a given markup language, which attributes are associated with which elements, and how the elements relate to one another. You can think of a schema as a legal contract between the person who created the markup language and the person who will create documents using that language.

Although I describe a schema as a "legal contract," in reality there is nothing legal about schemas. The point is that schemas are very exacting and thorough, and leave nothing to chance in terms of describing the makeup of XML documents; this degree of exacting thoroughness is what we all look for in an ideal legal contract.
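In practice, a schema is written in a dedicated schema language such as a DTD or XML Schema. Purely to illustrate the kinds of rules a schema spells out, here is a hand-rolled Python sketch. It is *not* a DTD or XML Schema. For the hypothetical softball language, it records which element must be the root, which attributes each element must or may carry, and which child elements each element may contain. It then reports every place a document breaks those rules. The `RULES` table and the `validate` function are illustrative assumptions, not part of any real tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for the softball language: required attributes,
# optional attributes, and allowed child elements for each element.
# This is a toy stand-in for a real schema, not DTD or XSD syntax.
RULES = {
    "schedule": {"required": set(), "optional": set(), "children": {"game"}},
    "game": {"required": {"date"}, "optional": set(), "children": {"team"}},
    "team": {"required": {"name"}, "optional": set(), "children": {"player"}},
    "player": {"required": {"name"}, "optional": {"hits", "rbis"}, "children": set()},
}

def validate(xml_text, root_tag="schedule"):
    """Return a list of rule violations; an empty list means the document conforms."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != root_tag:
        errors.append(f"root must be <{root_tag}>, found <{root.tag}>")

    def check(elem):
        rule = RULES.get(elem.tag)
        if rule is None:
            errors.append(f"unknown element <{elem.tag}>")
            return
        attrs = set(elem.attrib)
        for missing in sorted(rule["required"] - attrs):
            errors.append(f"<{elem.tag}> is missing required attribute '{missing}'")
        for extra in sorted(attrs - rule["required"] - rule["optional"]):
            errors.append(f"<{elem.tag}> has undeclared attribute '{extra}'")
        for child in elem:
            if child.tag not in rule["children"]:
                errors.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
            check(child)  # keep checking nested elements either way

    check(root)
    return errors

good = '<schedule><game date="2024-05-04"><team name="Hornets"><player name="Ann" hits="2"/></team></game></schedule>'
bad = '<schedule><team name="Hornets"><player hits="2" era="3.1"/></team></schedule>'
print(validate(good))  # → []
print(validate(bad))
```

Both sample documents are well-formed XML, but only `good` follows the rules. That distinction between a document that is merely well-formed and one that conforms to its language's schema is the whole reason schemas exist. Real schema languages can also express element order, how many times an element may occur, and (in XML Schema) data types such as integers. This sketch checks none of those.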