Databases and XML : XML

Although XML is commonly thought of as a document format, it's also very popular as a format for data storage. Many applications use XML files to store their configuration, as well as relying on remote procedure calling services like XML-RPC and SOAP to format the messages that they exchange using XML.

The fact that XML is highly structured and can be tested to ensure that it's both well-formed and valid in a standardized, programatic fashion takes a lot of the burden of reading and modifying the data file off of the application developer when he or she is writing a program.

Let's look at a couple of real world examples where XML might need to be integrated with a relational database. The structured nature of XML makes it a good choice to use as a data interchange format. Let's say that a company periodically receives inventory information from a supplier. That information might be stored in an Oracle database on a server in the supplier's system but might need to be imported into an Access database when the company receives it. XML would make a good intermediate format for the data because it's easy to write programs that import and export the data and because, by using XML, the data can be used in future applications that require it as well.

Another example might be a service that syndicates news articles. The news articles could be distributed via XML files so that they could easily be transformed for presentation on the Web, or they could be imported into a relational database and published from there.

Incidentally, there already exists an XML language for storing news articles in XML documents. You learn a great deal more about XML and how it can be used to code news articles in Syndicating The Web With RSS News Feeds, "Syndicating the Web with RSS News Feeds."

Resolving XML Data into Database Tables

The question you face when you integrate applications that use XML for data storage with relational databases is the degree to which you want to take advantage of the features of the relational database. If you simply insert entire XML documents into the database, you can't use advanced SQL features to retrieve specific bits of information from the XML documents.

Here's an XML document that is used to store information related to automobiles:

<dealership>
  <automobile make="Buick" model="Century" color="blue">
    <options>
      <option>cruise control</option>
      <option>CD player</option>
    </options>
  </automobile>
  <automobile make="Ford" model="Thunderbird" color="red">
    <options>
      <option>convertible</option>
      <option>leather interior</option>
      <option>heated seats</option>
    </options>
  </automobile>
</dealership>

Now, let's look at how you might design a database to store this information. As I mentioned earlier, the path of least resistance is just to stick the whole XML document in a field. However, that probably isn't a good idea for this file because it contains more than one automobile "record."

Instead, let's look at what a database schema for the information in the XML file would look like. A diagram of the schema appears in Figure 19.2.

Figure 19.2. The schema that corresponds to the automobiles example XML document.

As you can see, I turned the XML document into two tables, automobiles and options. The automobiles table contains all the information stored in the attributes of the automobile tag in the XML document. Because automobiles have a one-to-many relationship to options, I created a separate table for them. In the options table, id_automobiles is a foreign key that relates back to a specific automobile in the automobiles table.

To make sure you understand why the automobile options were broken out into a separate database table, consider that the number of options for a single automobile can vary from one automobile to the next. This is a scenario where a single database field in the automobiles table can't account for a varying amount of data; hence the one-to-many relationship. Therefore, the solution is to break out the options into a separate table where each row is tied back to a specific automobile. Then you can add as many options as you want for one automobile as long as each option includes the appropriate automobile ID.

Storing XML Documents in a Database

If you're storing entire XML documents in a database, you don't need to worry about translating the XML document format into a tabular database structure. Instead, you just need to extract the information from the document that you need to use in the relational database world and create columns for that. As an example, if you store newspaper articles as XML documents, the section, headline, author, body, and perhaps more information will all be included in the XML document within their own tags. It is then possible to process the XML code to access each portion of the document.

If you store those documents in a database and plan on publishing them on the Web from that database, you may want to consider breaking up the XML data so that it can be retrieved more easily. For example, you might want separate columns for the section and writer so that you can write simple SQL statements that retrieve the documents based on those values. Either way, you would be retrieving XML code from the database, which is far different than the earlier automobile example where the database data has been translated from XML into pure database content.