XML Schema Construction Basics

In a DTD you lay out the elements and attributes that can be used to describe a particular type of data. Similar to a DTD, XML Schema allows you to create markup languages by carefully describing the elements and attributes that can be used to code information. Unlike DTDs, schemas created with XML Schema are coded in XML, which keeps everything in the XML domain; if you recall, DTDs use their own cryptic language. The language used to describe markup languages in XML Schema is XSD, short for XML Schema Definition. Schemas created in this language are often referred to simply as XSDs.

The XSD language is an XML-based language, which means you use XML elements and attributes to describe the structure of your own custom markup languages. In other words, XSD is itself an XML vocabulary. Although this might seem confusing at first, keep in mind that there must be a means of validating XSD documents, which means the XSD language itself has to be formally described. The W3C specification does this in two ways: it includes a normative "schema for schemas" that describes XSD in XSD itself, and a non-normative DTD that describes the same elements and attributes. Admittedly, this is a "chicken and egg" kind of problem because we're talking about creating a schema for a schema language that is in turn used to create schemas. Which one comes first? To be honest, it really doesn't matter. Rather than confuse you further, I'd rather push on and show you how an XSD document comes together. The main point here is that XSD is an XML-based markup language, similar in many ways to any other custom markup language you might create.

Because XSD schema documents are really just XML documents, you should begin each one with the familiar XML declaration:

<?xml version="1.0"?>

After entering the XML declaration, you're ready to start coding the XSD document. All of the elements in XSD are part of what is known as a namespace, which if you recall from Tutorial 5, "Putting Namespaces to Use," is essentially a grouping of elements and attributes that guarantees uniqueness in their names. You typically assign a namespace a prefix that is used throughout a document to reference names within the namespace. In order to reference XSD elements, you must first declare the XSD namespace in the root element of the XSD document. The prefix of the XSD namespace is typically set to xsd (xs is another common choice), which means that all XSD element names are preceded by the prefix xsd and a colon (:). The root element of XSD documents is named xsd:schema. Following is an example of how you declare the XSD namespace in the xsd:schema element:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

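In this code, the xmlns:xsd attribute is used to set the XSD namespace, which is a standard URI made available by the W3C. The URI must be entered exactly as shown, with no extra spaces inside the quotes. Declaring the namespace this way means that you must precede each XSD element name, and each reference to a built-in XSD data type such as xsd:string, with xsd:. The attributes of XSD elements, such as name and type, are written without a prefix. So, to recap, the general structure of an XSD schema document has the following form: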

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</xsd:schema>

Of course, this code has no content within the root element, so it isn't doing much. However, it lays the groundwork for all XSD schema documents.
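To make the skeleton a little more concrete, here is a minimal sketch with a single declaration placed inside the root element. The element name note is made up purely for illustration and isn't part of any schema in this lesson:

```xml
<?xml version="1.0"?>
<!-- Minimal sketch: the xsd:schema skeleton plus one hypothetical declaration -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- "note" is an invented element name; its content must be a string -->
  <xsd:element name="note" type="xsd:string"/>
  <!-- A document described by this schema could be as simple as:
       <note>Call the mayor</note> -->
</xsd:schema>
```

Notice that the element names (xsd:schema, xsd:element) and the type reference (xsd:string) carry the xsd prefix, whereas the name and type attributes themselves do not.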

XSD Data Types

The XSD language is defined by the elements and attributes that can be used within it, as well as their relationship to one another. At the heart of XSD are data types, which determine the type of data that can be represented by a particular piece of markup code. For example, numeric data in XSD is coded differently than text data and therefore has an associated data type that is used when creating a schema with XSD. There are two different general types of data used in XSDs: simple data and complex data. Simple data corresponds to basic pieces of information such as numbers, strings of text, dates, times, lists, and so on. Complex data, on the other hand, represents more involved information such as mixed elements and sequences of elements. Generally speaking, complex data types are built upon simple data types.

Simple data types can be used with both elements and attributes and provide a means of describing the exact nature of a piece of information. The xsd:element element is used to create elements of a simple type, whereas the xsd:attribute element is used to create attributes. Following are a few examples of each:

<xsd:element name="name" type="xsd:string"/>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="occupation" type="xsd:string"/>
<xsd:attribute name="birthdate" type="xsd:date"/>
<xsd:attribute name="weight" type="xsd:integer"/>
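The same pattern applies to the other built-in simple types. The following sketch wraps a few more declarations in a schema root so that it stands on its own; the names price, married, and wakeup are hypothetical and chosen only to show the xsd:decimal, xsd:boolean, and xsd:time types:

```xml
<?xml version="1.0"?>
<!-- Sketch only: the element and attribute names below are invented -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A number that may include a fractional part, such as 19.95 -->
  <xsd:element name="price" type="xsd:decimal"/>
  <!-- Accepts true, false, 1, or 0 -->
  <xsd:element name="married" type="xsd:boolean"/>
  <!-- A time of day in hh:mm:ss form, such as 06:30:00 -->
  <xsd:attribute name="wakeup" type="xsd:time"/>
</xsd:schema>
```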

Although these examples show how simple data types enter the picture with elements and attributes, they don't reveal the relationship between elements and attributes, which is critical in any XSD document. These relationships are established by complex data types, which are capable of detailing the content models of elements. Following is an example of how simple data types can be used within a complex type to describe the content model of an element named person:

<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="occupation" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="birthdate" type="xsd:date"/>
    <xsd:attribute name="weight" type="xsd:integer"/>
  </xsd:complexType>
</xsd:element>

Keep in mind that this XSD code describes a custom markup language that is used to create XML documents. In order to fully understand how the schema code works, it's a good idea to look at some XML code that adheres to the schema. Following is an example of some XML document data that follows the data structure laid out in the prior XSD schema code:

<person birthdate="1969-10-28" weight="160">
  <name>Milton James</name>
  <title>Mr.</title>
  <occupation>mayor</occupation>
</person>

This code should look much more familiar to you as it is basic XML code with custom elements and attributes. It doesn't take too much analysis to see that this code adheres to the XSD schema code you just saw. For example, the person element includes two attributes, birthdate and weight, as well as three child elements: name, title, and occupation. Unlike a DTD, the schema is able to carefully describe the data type of each element and attribute. For example, the birthdate attribute is a date (xsd:date), not just a string that happens to store a date, and the weight attribute is an integer number (xsd:integer).
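To see what that type checking buys you, consider a sketch of a person element that is perfectly well-formed XML yet should be rejected by a validating parser that uses the person schema shown earlier. The attribute values here are invented for illustration:

```xml
<!-- Well-formed, but not valid against the person schema -->
<!-- birthdate is not in the YYYY-MM-DD form required by xsd:date -->
<!-- weight is not a whole number, so it fails xsd:integer -->
<person birthdate="October 28, 1969" weight="heavy">
  <name>Milton James</name>
  <title>Mr.</title>
  <occupation>mayor</occupation>
</person>
```

Because the child elements are declared inside xsd:sequence, listing them in a different order (for example, title before name) would also cause validation to fail.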

XSD Schemas and XML Documents

You now have a basic knowledge of how a schema is used to establish a markup language that in turn is used to create XML documents. What you don't know is how a schema is actually associated with such documents. If you recall, a DTD is associated with a document by way of a document type declaration. XSDs don't rely on a document type declaration and instead use a special attribute called noNamespaceSchemaLocation. To associate a schema with an XML document for validation purposes, you set this attribute of the root element to the location of the schema document. However, in order to use this attribute you must first declare the namespace to which it belongs. Following is how this is accomplished in XML code:

<contacts xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="contacts.xsd">
  <person birthdate="1969-10-28" weight="160">
    <name>Milton James</name>
    <title>Mr.</title>
    <occupation>mayor</occupation>
  </person>
</contacts>

There is also a schemaLocation attribute for referencing a schema that has its own namespace. This is useful if the elements in your document belong to that namespace and you want to reference them explicitly, for example by using a prefix. You'll find out more about this attribute later in the lesson.

This code shows how to declare the appropriate namespace and then set the noNamespaceSchemaLocation attribute for the schema document. Assuming the schema for the contacts document is located in the file named contacts.xsd, this XML document is ready for validation. This brings up an important point regarding schema documents: they are coded in XML, but they are stored in files with a .xsd extension. This makes it possible to determine quickly if a file is an XSD schema.
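The contents of contacts.xsd aren't shown here, so the following is only a sketch of what such a file might contain, built from the person content model used earlier. It assumes that a contacts element can hold any number of person elements, which is expressed with the minOccurs and maxOccurs attributes:

```xml
<?xml version="1.0"?>
<!-- contacts.xsd: a hypothetical sketch, not the lesson's actual schema file -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="contacts">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Assumption: zero or more person elements are allowed -->
        <xsd:element name="person" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="name" type="xsd:string"/>
              <xsd:element name="title" type="xsd:string"/>
              <xsd:element name="occupation" type="xsd:string"/>
            </xsd:sequence>
            <xsd:attribute name="birthdate" type="xsd:date"/>
            <xsd:attribute name="weight" type="xsd:integer"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

The sketch declares no target namespace, which is what makes it a fit for the noNamespaceSchemaLocation attribute rather than schemaLocation.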

Many XML documents are stored in files with extensions other than .xml. Although .xml is certainly a suitable extension for any XML document, it is generally better to use the more specific extension dictated by the markup language, assuming that such an extension exists. As an example, in the previous tutorial you worked with SVG documents that were stored in files with a .svg extension.