XHTML: A Logical Merger

XHTML is an XML-based markup language that carries out the functionality of HTML in the spirit of XML. As you hopefully know by now, HTML is not a descendant of XML; this would be tricky considering that XML was created after HTML. HTML is actually a descendant of an older markup language technology known as SGML (Standard Generalized Markup Language), which is considerably more complex than XML. XML in many ways represents a simplified reformulation of SGML, which makes XML more compact than SGML, as well as much easier to learn and process. So, XML is beneficial from the perspective of both application developers and document developers. But what exactly does this have to do with HTML? Or, to pose the question more generally, why exactly do we need XHTML?

To answer the "Why XHTML?" question, you have to first take stock of the Web and some of the problems surrounding it. Much of the Web is a jumbled mess of hacked HTML code that has very little structure. Poor coding, browser leniency, and proprietary browser extensions have all combined to create a web of HTML documents that are extremely unstructured, which is a bad thing. Don't get me wrong, things have improved since the early days of the Web, but we still have a long way to go. Web browser vendors have had to create HTML processors that are capable of reading and displaying even the most horrendous HTML code so that web users never have to witness the underlying bad code in many web pages. Although this is good in terms of the web experience, it makes it very difficult to glean information from web pages in an automated manner because their structure is so inconsistent.

You know that XML documents can't suffer from this kind of sloppy coding because XML simply won't allow it: an XML processor is required to reject any document that isn't well formed. Knowing this, a logical answer to the HTML problem is to convert web pages to XML documents and then use stylesheets to render them. It would also be nice to have peace on earth, tasty fat-free foods, and lower taxes, but life just doesn't work that way. What I'm getting at is that HTML will likely always have a place on the Web simply because it is too deeply ingrained to replace. Besides, even though XML paired with CSS/XSLT has huge structural benefits over a purely presentational HTML web page, it involves more work and a bit more planning. There are certainly situations where it doesn't matter too much if content is separated from how it is displayed, in which case HTML represents a simpler, more efficient solution.

The point I'm trying to make is that plain HTML, in one form or another, is likely here to stay. The solution to the problem then shifts to improving HTML in some way. The most logical improvement to solve the structural problems of HTML is to express HTML as an XML language (XHTML), which allows us to reap many of the benefits of XML without turning the Web on its ear. The primary benefit of XHTML is obviously structure, which would finally force browser vendors and web developers alike to play by the rules. Browsers could strictly enforce an XHTML schema to ensure that documents are both well formed and valid. Just requiring XHTML documents to be well formed would be a significant step in the right direction; checking them for validity would be the icing on the cake.
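To make the well-formedness idea a little more concrete, here is a minimal sketch of the kind of strict check an XML processor performs. It uses Python's standard xml.etree.ElementTree parser; the is_well_formed helper and the two sample strings are made up purely for illustration, and the check covers well-formedness only, not validity against an XHTML DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML.

    Hypothetical helper for illustration; it does not check validity
    against any DTD or schema.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical lenient HTML: unclosed <p> and <br>, and overlapping <b>/<i> tags.
sloppy_html = "<p>First paragraph<br><p>Second <b><i>paragraph</b></i>"

# The same content written the XHTML way: every element closed, tags
# properly nested, and a single root element.
tidy_xhtml = (
    "<div><p>First paragraph<br /></p>"
    "<p>Second <b><i>paragraph</i></b></p></div>"
)

print(is_well_formed(sloppy_html))
print(is_well_formed(tidy_xhtml))
```

A lenient HTML browser would happily display both strings, but the XML parser accepts only the second one. That all-or-nothing behavior is what makes XHTML documents so much easier to process in an automated manner.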

Although XHTML 2.0 is in the works, the latest supported version of XHTML is version 1.1. You can learn more about XHTML 1.1 by visiting the W3C web site at http://www.w3.org/MarkUp/#xhtml11.

Even as XHTML catches on and web developers migrate their HTML code to it, web browsers still have to support the old, unstructured versions of HTML for the foreseeable future. However, over time these legacy HTML documents could eventually be supplanted by valid, well-formed XHTML documents with plenty of structure. One thing that is already making the migration to XHTML smoother is the fact that a great deal of web page development is carried out with visual authoring tools that automatically generate XHTML code. This makes it virtually painless for developers to make the move to XHTML.

You might not realize it, but another compelling reason to move the Web toward XHTML is so that new types of compact browsers with limited processing capabilities can avoid the hassles of trying to process unstructured HTML code. These browsers are becoming prevalent on devices such as mobile phones and handheld computers, and would benefit from highly structured XHTML documents to minimize processing overhead. In "Going Wireless with WML and XHTML Mobile," you learn about WML and XHTML Mobile, which are used to develop web pages for mobile devices.