Microsoft Excel

Load an XML Document into Excel

We'll start with a sample XML document, shown in following Example.

Example. A simple XML document for analysis in Excel
<?xml version="1.0" encoding="UTF-8"?>
<sales>
<sale>
<date>2003-10-05</date>
<ISBN>0596005385</ISBN>
<Title>Office 2003 XML Essentials</Title>
<PriceUS>34.95</PriceUS>
<quantity>200</quantity>
<customer ID="1025">Books</customer>
</sale>
<sale>
<date>2003-10-05</date>
<ISBN>0596002920</ISBN>
<Title>XML in a Nutshell, 2nd Edition</Title>
<PriceUS>39.95</PriceUS>
<quantity>90</quantity>
<customer ID="1025">Books</customer>
</sale>
<sale>
<date>2003-10-05</date>
<ISBN>0596002378</ISBN>
<Title>SAX2</Title>
<PriceUS>29.95</PriceUS>
<quantity>300</quantity>
<customer ID="1025">Books</customer>
</sale>
<sale>
<date>2003-10-05</date>
<ISBN>0596005385</ISBN>
<Title>Office 2003 XML Essentials</Title>
<PriceUS>34.95</PriceUS>
<quantity>10</quantity>
<customer ID="1029">Books of Glory</customer>
</sale>
<sale>
<date>2003-10-05</date>
<ISBN>0596002920</ISBN>
<Title>XML in a Nutshell, 2nd Edition</Title>
<PriceUS>39.95</PriceUS>
<quantity>25</quantity>
<customer ID="1029">Books of Glory</customer>
</sale>
<sale>
<date>2003-10-07</date>
<ISBN>0596002378</ISBN>
<Title>SAX2</Title>
<PriceUS>29.95</PriceUS>
<quantity>5</quantity>
<customer ID="1029">Books of Glory</customer>
</sale>
<sale>
<date>2003-10-18</date>
<ISBN>0596002378</ISBN>
<Title>SAX2</Title>
<PriceUS>29.95</PriceUS>
<quantity>15</quantity>
<customer ID="2561">Title Wave</customer>
</sale>
<sale>
<date>2003-10-21</date>
<ISBN>0596002920</ISBN>
<Title>XML in a Nutshell, 2nd Edition</Title>
<PriceUS>39.95</PriceUS>
<quantity>15</quantity>
<customer ID="9021">Books for You</customer>
</sale>
</sales>

You can open this directly from Excel 2003 by selecting File » Open.... The dialog box shown in the figure will appear.

Figure. Opening an XML file in Excel 2003
figs/exhk_0801.gif

If you select "As an XML list," you'll first be warned that Excel will be creating its own schema for this schema-free document, as shown in the figure. After clicking OK, you'll be rewarded with Excel's best guess as to how to present the information in the document as a spreadsheet, as shown in the figure next.

Figure. Excel 2003 warning about lack of schema references
figs/exhk_0802.gif
Figure. XML data presented as an XML list in Excel 2003
figs/exhk_0803.gif

Note that Excel expected the date format used by the date element, so it is now displaying dates imported as 2003-10-05 as 10/5/2003.

Once you have the document loaded into Excel, you can treat the data much like you would any other data in Excel-incorporating it into formulas, creating named ranges for it, making charts based on its contents, etc. To help you, Excel provides some built-in functionality for analyzing the data. The drop-down boxes in the column headers enable you to choose how to sort the data (the default is the order the document had originally). You also can turn on a total row, either from the List toolbar or by right-clicking anywhere on the list and selecting List » Total Row from the pop-up menu. Once the total row appears, you can choose what kind of total you prefer from the drop-down menu displayed in the figure.

Figure. Choosing totals for an XML list in Excel 2003
figs/exhk_0804.gif

You also can refresh the data, updating that area with information from an XML document with the same structure. If you had another document with the same structure, you could right-click the list, select XML » Import... from the pop-up menu, and choose a different document. With more data, it might look like the figure.

Figure. The same XML list with updated data
figs/exhk_0805.gif

In addition, you can export the data back to an XML file after editing it by right-clicking the list and selecting XML » Export... from the pop-up menu. This makes Excel a very convenient editing tool for simple XML documents with tabular structures.

For simple data, you can usually trust Excel to guess what a file's contents are and use the default view it provides. As data gets more complicated, however, especially if it contains dates or text that looks like numbers (note the missing initial zeros on the ISBNs in the preceding figures!), you might want to use XML schemas to constrain how Excel reads your data and what kinds of data will fit in a given map. For this document, an XML schema might look like exmaple.

Example. A schema for the book sales data
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="sales">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="sale"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="sale">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="date"/>
        <xs:element ref="ISBN"/>
        <xs:element ref="Title"/>
        <xs:element ref="PriceUS"/>
        <xs:element ref="quantity"/>
        <xs:element ref="customer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="date" type="xs:date"/>
  <xs:element name="ISBN" type="xs:string"/>
  <xs:element name="Title" type="xs:string"/>
  <xs:element name="PriceUS" type="xs:decimal"/>
  <xs:element name="quantity" type="xs:integer"/>
  <xs:element name="customer">
    <xs:complexType mixed="true">
      <xs:attribute name="ID" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Note that the date element is defined as a date, and the ISBN element is defined as a string here, not an integer. If you start by opening this schema rather than the document, you can have Excel load the document while preserving the initial zero on the ISBNs. This time, you'll create a list before loading the XML document, starting from a blank worksheet.

You'll need to open the XML Source task pane. If the task pane isn't open already, open it by selecting View » Task Pane or by pressing Control-F1. Then select XML Source from the drop-down box at the top of the task pane, and you'll see something such as that shown in the figure.

Figure. The XML Source task pane
figs/exhk_0806.gif

To load the schema, click the XML Maps... button. This will bring up the XML Maps dialog box, shown in the figure.

Figure. The XML Maps dialog box
figs/exhk_0807.gif

Click the Add... button to open the schema, and select the schema as shown in the figure.

Figure. Selecting an XML schema
figs/exhk_0808.gif

If your schema doesn't restrict documents to having only one possible starting element, Excel will ask you which element to use as the root element, as shown in the figure. Because the documents in this example start with the sales element, pick "sales."

Figure. Selecting a root element for the schema
figs/exhk_0809.gif

After you click OK, Excel warns about possible difficulties it might have in interpreting schemas in the dialog box shown in the figure. XML Schema is an enormous specification that supports a wide variety of structures that don't fit well with Excel's perspective on information, so Excel has some limitations.

Figure. Warning label for schema processing
figs/exhk_0810.gif

Excel will show that your schema has been added to the spreadsheet in the XML Maps dialog, which should look like that shown in the figure.

Figure. A schema loaded as an XML map
figs/exhk_0811.gif

If you click OK, you'll be returned to the main Excel interface, and the XML Source task pane will be populated with a diagram of the structure the schema described, such as that shown at the right of the figure. Now that you have the structure, you can lay out the list. The easiest way to do this-especially with a small document such as this one-is to drag the sales icon to cell A1, producing the result in figure.

Figure. An XML list created from the schema information in the XML Source task pane
figs/exhk_0812.gif

You also can drag items over individually, if you want to change the order or want to put different pieces in different places on the spreadsheet.

Now that you have a home for the data, it's time to populate it. You can either click the Import XML Data button on the list toolbar, or right-click the list and select XML » Import. If you choose the file you used earlier, you'll see a result such as that shown in the figure. Note the addition of the leading zeros to the ISBNs, which are now text, as they should be.

Figure. The XML list, populated with data
figs/exhk_0813.gif

Excel's support for XML maps and lists means you can create spreadsheets that work on data arriving in separate files with more flexibility than prior formats such as CSV or tab-delimited formats provided. Instead of having to be connected to a database to edit data interactively, a user can edit XML files while on an airplane and feed that XML to an appropriate consumer when he lands. Perhaps the best aspect of Excel's new XML features is their flexibility; so long as the data is in a structure that fits on a grid, Excel has very few rules about what kinds of XML it will accept. With a few mouse clicks and no programming, you can integrate XML data with your spreadsheets.