The WHATWG is developing HTML5 and XHTML5 as successors to HTML 4.01 and XHTML 1.0. An (X)HTML5 conformance checker is expected to take the role that DTD-based validators have had with earlier versions of (X)HTML. Conformance checking goes beyond the capabilities of DTDs.
The WHATWG does not prescribe an implementation strategy for conformance checkers and does not endorse any schema language. Although no schema language is adequate for describing all the conformance requirements of (X)HTML5, a mainly RELAX NG-based implementation approach was chosen.
The bulk of the (X)HTML5 language is described as a RELAX NG schema that is supported by a custom datatype library written in Java. A Schematron schema is used alongside RELAX NG for enforcing constraints for which RELAX NG is not suitable. The remaining requirements are enforced by custom code written in Java. For checking HTML5, a special-purpose parser was developed so that the XML tools can work on XHTML5-like parse events.
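To make the "custom code" layer concrete, the following is a minimal sketch of a checker that listens to SAX parse events and enforces a document-wide rule. It is not the project's actual code: the class name `DuplicateIdChecker` and the choice of `id` uniqueness as the example rule are assumptions made purely for illustration, and the sketch uses only the XML parser bundled with the JDK rather than the special-purpose HTML5 parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example of a non-schema checker driven by SAX events.
public class DuplicateIdChecker extends DefaultHandler {
    private final Set<String> seen = new HashSet<>();
    private final List<String> errors = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Look up the unprefixed (no-namespace) "id" attribute.
        String id = atts.getValue("", "id");
        // Set.add returns false when the value was already present.
        if (id != null && !seen.add(id)) {
            errors.add("Duplicate id \"" + id + "\" on element " + localName);
        }
    }

    // Parses the given XHTML text and returns the collected error messages.
    public static List<String> check(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DuplicateIdChecker checker = new DuplicateIdChecker();
        reader.setContentHandler(checker);
        reader.parse(new InputSource(new StringReader(xhtml)));
        return checker.errors;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
                + "<p id='a'/><p id='a'/><p id='b'/></body></html>";
        for (String error : check(doc)) {
            System.out.println(error);
        }
    }
}
```

Because the checker depends only on the SAX `ContentHandler` interface, the same class could in principle be fed by any parser that emits XHTML5-like events, which is the reason the design routes HTML5 input through such a parser.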
The design of the system and the experience gained so far in the ongoing project are discussed. The ease of expressing and changing the grammar is identified as the main benefit of RELAX NG. The inability to easily fine-tune error messages is identified as a drawback.
Henri Sivonen is a software developer based in Helsinki, Finland. His most notable current professional activity is developing an HTML5 conformance checking service as an independent contractor consulting for the Mozilla Corporation, a project on which he wrote his master’s thesis. Henri participates in the HTML Working Group of the W3C as a representative of the Mozilla Foundation. He has previously contributed to the development of the Atom syndication format.