
Validating HTML and XML

How to validate XML and HTML files (or strings) with PHP’s DOMDocument class.

PHP can validate XML against three types of files: Document Type Definitions (DTDs), Schemas (.xsd), and relaxNG:

  1. Validating a Document Based on Document Type Definitions (DTDs)
    • validate() validates the document based on its DTD.
  2. Validating a Document Based on Schemas (.xsd)
    • schemaValidate('schema.xsd') validates a document based on the given schema file.
    • schemaValidateSource('schema_str') validates a document based on a schema defined in the given string.
  3. Validating a Document Based on relaxNG
    • relaxNGValidate('file.rng') validates the document based on the given RNG schema file.
    • relaxNGValidateSource('relax_ng_str') validates the document based on the given RNG source string. (See https://relaxng.org/)

Validating HTML (and XML) Based on its DTD

If an XML document declares a DTD in its DOCTYPE, call DOMDocument::validate() to validate the document against it. The validate() method reads the DTD location from the document itself, so it takes no arguments; it returns true if the document is valid and false otherwise.

Example: Validating an HTML Page

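<?php
 $dom = new DOMDocument;
 // Load the HTML page from a URL (a local file path also works)
 $dom->loadHTMLFile('https://brainbell.com/');

 // validate() returns false when the document does not conform to its DTD
 if ( ! $dom->validate() ) {
  die ('Invalid HTML document');
 }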

This example uses the DOMDocument::loadHTMLFile() method, which loads HTML from a file (or URL).
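
The same validate() call also works for XML. The following is a minimal sketch, assuming a hypothetical users document whose DTD is embedded as an internal subset, so no external DTD file is needed. The element names and sample data are illustrative only.

```php
<?php
// Hypothetical sample document: the DTD is embedded in the DOCTYPE (internal subset).
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE users [
  <!ELEMENT users (user*)>
  <!ELEMENT user (name, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<users>
  <user>
    <name>BrainBell.com</name>
    <email>admin@brainbell.com</email>
  </user>
</users>
XML;

// Collect libxml errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadXML($xml);

// validate() checks the loaded document against the DTD it declares
$isValid = $dom->validate();
echo $isValid ? 'Valid document' : 'Invalid document';
```

Removing the email element from the sample should make validate() return false, because the DTD requires both name and email inside each user.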

You can use libxml functions to show errors by column and line number, see Handling Errors While Parsing XML.

Example: Validating HTML & Parsing Errors

<?php
 // Collect libxml errors internally instead of emitting PHP warnings
 libxml_use_internal_errors(true);
 $dom = new DOMDocument;
 $dom->loadHTMLFile('https://brainbell.com/');

 if (!$dom->validate()) {
  // Fetch the errors after validate() so that validation errors are included
  foreach ( libxml_get_errors() as $error ) {
   switch ( $error->level ) {
   case LIBXML_ERR_FATAL:
    echo "Fatal Error: ";
    break;
   case LIBXML_ERR_ERROR:
    echo "Error: ";
    break;
   case LIBXML_ERR_WARNING:
    echo "Warning: ";
    break;
   }
   echo $error->code .'<br>'.
       'Message: ' . $error->message .'<br>'.
       'Line: ' . $error->line .'<br>'.
       'Column: ' . $error->column .'<br>'.
       'File/URL: ' . $error->file .'<hr>';
  }
 }
 // Empty the error buffer whether or not validation succeeded
 libxml_clear_errors();

Validating XML Against Schema

The schemaValidate() method takes the path to a schema file as its argument, while the schemaValidateSource() method takes the schema itself as a string. Both methods return false if the XML does not match the rules laid down in the schema.

Example: Validating against a nonmatching schema (.xsd):

<?php
 $dom = new DOMDocument;
 $dom->load('sample.xml');
 if ( $dom->schemaValidate('sample.xsd') )
  echo 'Validation succeeded';
 else
  echo 'Validation failed';

The preceding code prints the following output because the XML document does not match the schema file:

Warning: DOMDocument::schemaValidate(): Element 'users': No matching global declaration available for the validation root. in D:\xampp\htdocs\example.php on line 35
Validation failed
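
To handle a schema failure in code rather than rely on the PHP warning, the libxml error functions can be combined with schema validation. The following is a minimal sketch, assuming a hypothetical users document and a deliberately nonmatching schema that only declares a quotes element. The $messages array is an illustrative name, not part of the DOM API.

```php
<?php
// Hypothetical inputs: a <users> document and a schema that only declares <quotes>.
$xml = '<?xml version="1.0"?><users><user><name>BrainBell.com</name></user></users>';
$xsd = '<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="quotes" type="xsd:string"/>
</xsd:schema>';

// Collect libxml errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadXML($xml);

$isValid = $dom->schemaValidateSource($xsd);

// Turn each LibXMLError object into a readable one-line message
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('Line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

echo $isValid ? 'Validation succeeded' : 'Validation failed';
foreach ($messages as $message) {
    echo PHP_EOL, $message;
}
```

With internal error handling enabled, the mismatch is reported through libxml_get_errors() instead of a PHP warning, so the script can decide how to log or display it.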

Example: Validating against a matching schema string with schemaValidateSource:

<?php
 $xml = '<?xml version="1.0"?>
<quotes>
 <quote year="2023">
  <coding>Lorem ipsum dolor...</coding>
  <author>Author XYZ</author>
 </quote>
 <quote year="2022">
  <coding>Lorem ipsum dolor...</coding>
  <author>Author ABC</author>
 </quote>
</quotes>';

 $xsd = '<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="quotes">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="quote" type="quoteType" minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
 <xsd:complexType name="quoteType">
  <xsd:sequence>
   <xsd:element name="coding" type="xsd:string"/>
   <xsd:element name="author" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="year" type="xsd:gYear" use="required"/>
 </xsd:complexType>
</xsd:schema>';
 
 $dom = new DOMDocument;
 $dom->loadXML($xml);
 if ( $dom->schemaValidateSource($xsd) )
  echo 'Validation succeeded';
 else
  echo 'Validation failed';

 // Validation succeeded

Sample XSD File:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="quotes">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="quote" type="quoteType" minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
 <xsd:complexType name="quoteType">
  <xsd:sequence>
   <xsd:element name="coding" type="xsd:string"/>
   <xsd:element name="author" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="year" type="xsd:gYear" use="required"/>
 </xsd:complexType>
</xsd:schema>

See A Complete XML Schema Example.

Validating XML Against relaxNG

The following code uses relaxNGValidate() to validate a (well-formed) XML file against a nonmatching relaxNG file.

<?php
  $dom = new DOMDocument;
  $dom->load('sample.xml');
  if ( $dom->relaxNGValidate('sample.rng') )
   echo 'Validation succeeded';
  else
   echo 'Validation failed';

/* Warning: DOMDocument::relaxNGValidate(): Expecting element quotes, got users in D:\xampp\htdocs\example.php on line 4
Validation failed */

Creating a relaxNG file can be quite difficult; the Java tool Trang, available at https://relaxng.org/jclark/trang.html, can read an XML file and create a relaxNG, Schema, or DTD file out of it.

Example: Validating a nonmatching relaxNG string with relaxNGValidateSource:

<?php
 $xml = '<?xml version="1.0"?>
 <users>
  <user>
   <name>BrainBell.com</name>
   <email>admin@brainbell.com</email>
  </user>
  <user>
   <name>Fast-Tutorials.com</name>
   <email>admin-fast-tutrials@outlook.com</email>
  </user>
 </users>';

 $rng = '<?xml version="1.0" encoding="UTF-8"?>
 <element name="quotes" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
   <element name="quote">
    <optional>
     <attribute name="year"/>
    </optional>
    <element name="coding">
     <text/>
    </element>
    <element name="author">
     <text/>
    </element>
   </element>
  </zeroOrMore>
 </element>';

 $dom = new DOMDocument;
 $dom->loadXML($xml);
 if ( $dom->relaxNGValidateSource($rng) )
  echo 'Validation succeeded';
 else
  echo 'Validation failed';

The preceding code prints the “Validation failed” message and may also show a warning (depending on your error reporting settings):

Warning: DOMDocument::relaxNGValidateSource(): Expecting element quotes, got users in D:\xampp\htdocs\example.php on line 33
Validation failed

Example: Validating a matching relaxNG string with relaxNGValidateSource:

<?php
 $xml = '<?xml version="1.0"?>
 <quotes>
  <quote year="2023">
   <coding>Lorem ipsum dolor...</coding>
   <author>Author XYZ</author>
  </quote>
  <quote year="2022">
   <coding>Lorem ipsum dolor...</coding>
   <author>Author ABC</author>
  </quote>
 </quotes>';

 $rng = '<?xml version="1.0" encoding="UTF-8"?>
 <element name="quotes" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
   <element name="quote">
    <optional>
     <attribute name="year"/>
    </optional>
    <element name="coding">
     <text/>
    </element>
    <element name="author">
     <text/>
    </element>
   </element>
  </zeroOrMore>
 </element>';

 $dom = new DOMDocument;
 $dom->loadXML($xml);
 if ( $dom->relaxNGValidateSource($rng) )
  echo 'Validation succeeded';
 else
  echo 'Validation failed';

The preceding code prints “Validation succeeded”.

A sample.rng file:

<?xml version="1.0" encoding="UTF-8"?>
 <element name="quotes" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
   <element name="quote">
    <optional>
     <attribute name="year"/>
    </optional>
    <element name="coding">
     <text/>
    </element>
    <element name="author">
     <text/>
    </element>
   </element>
  </zeroOrMore>
 </element>
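
The relaxNG grammar above marks the year attribute as optional, whereas the XSD shown earlier declares it with use="required". The following sketch, assuming a hypothetical quote that omits the year attribute, runs the same document through both validators to illustrate the difference. The variable names are illustrative only.

```php
<?php
// Hypothetical test input: a quote without the "year" attribute.
$xml = '<?xml version="1.0"?><quotes><quote><coding>Lorem ipsum</coding><author>Author XYZ</author></quote></quotes>';

// relaxNG grammar from this article: "year" is optional
$rng = '<?xml version="1.0" encoding="UTF-8"?>
<element name="quotes" xmlns="http://relaxng.org/ns/structure/1.0">
 <zeroOrMore>
  <element name="quote">
   <optional><attribute name="year"/></optional>
   <element name="coding"><text/></element>
   <element name="author"><text/></element>
  </element>
 </zeroOrMore>
</element>';

// XSD from this article: "year" is required
$xsd = '<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="quotes">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="quote" type="quoteType" minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
 <xsd:complexType name="quoteType">
  <xsd:sequence>
   <xsd:element name="coding" type="xsd:string"/>
   <xsd:element name="author" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="year" type="xsd:gYear" use="required"/>
 </xsd:complexType>
</xsd:schema>';

// Collect libxml errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadXML($xml);

$rngValid = $dom->relaxNGValidateSource($rng);
$xsdValid = $dom->schemaValidateSource($xsd);
libxml_clear_errors();

echo 'relaxNG: ', $rngValid ? 'valid' : 'invalid', PHP_EOL;
echo 'XSD: ', $xsdValid ? 'valid' : 'invalid', PHP_EOL;
```

The document is expected to pass the relaxNG check and fail the XSD check, which shows that the two grammars in this article are close but not equivalent.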

For more details, see https://php.net/manual/domdocument.relaxngvalidate.php.

