Formalising the Proximate Semantics of XML Languages with UML, OWL and GRDDL

  • Henry S. Thompson, Language Technology Group, HCRC, School of Informatics, University of Edinburgh, World Wide Web Consortium, Markup Technology, Ltd., ht@inf.ed.ac.uk

Abstract

Many XML languages are defined in two steps: the first defines a mapping from XML documents to an abstract data model, and the second defines the meaning of the constituents of that data model with respect to some domain. One obvious example is (X)HTML+CSS, where the first step is from document to nested boxes with properties, and the second is a set of claims that boxes+properties make on renderings. Another is W3C XML Schema, which explicitly separates the mapping from schema documents to schemas on the one hand from the schema-validation semantics of the schema components which make up schemas on the other.

This paper describes a novel approach to stating what it calls the proximate semantics of an XML language, that is, the mapping from XML information sets to language-specific (abstract) data models. The approach has three parts:

  • A set of conventions for constructing UML models, using the Violet open source graphical UML diagram editor;
  • A pipeline of XSLT stylesheets to convert the XML representation of those diagrams to OWL ontologies;
  • A set of guidelines for writing XSLT stylesheets or other transformations (e.g. pipelines) to implement GRDDL-triggered mapping from language documents to data model instances expressed in RDF.

Once this approach is implemented, an OWL ontology for a language data model and an RDF instance corresponding to an individual language document can be combined and checked for consistency. The combination, if consistent, can then also be compared to (RDF expressions of) concrete data model instances from an implementation. This would enable semi-automatic conformance testing, provided that the language specification actually included the three parts listed above.
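
As an informal illustration of the combine-and-check idea, the following Python sketch maps a toy document to triples and tests them against a small stand-in for an ontology. Everything in it is an assumption made for illustration: the `pipeline`/`step` vocabulary, the `hasStep` property, and the `to_triples` and `inconsistencies` helpers are hypothetical, the mapping is hand-written rather than GRDDL-triggered XSLT, and the closed-world domain/range check is far weaker than OWL consistency checking.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy language: a pipeline containing steps. This is NOT the
# actual vocabulary of the XML Processing Model language.
DOC = """<pipeline name="main">
  <step name="validate" type="xsd"/>
  <step name="transform" type="xslt"/>
</pipeline>"""

# Stand-in for the ontology: for each property, the class expected of its
# subject (domain) and of its object (range). A real model would be in OWL.
MODEL = {
    "hasStep": ("Pipeline", "Step"),
}


def to_triples(xml_text):
    """Map a document to (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    pipeline_id = "pipeline:" + root.get("name")
    triples = [(pipeline_id, "rdf:type", "Pipeline")]
    for step in root.findall("step"):
        step_id = "step:" + step.get("name")
        triples.append((step_id, "rdf:type", "Step"))
        triples.append((pipeline_id, "hasStep", step_id))
    return triples


def inconsistencies(triples, model):
    """Return triples whose subject or object lacks the expected class.

    This is a closed-world simplification; an OWL reasoner would instead
    infer types from domain and range axioms.
    """
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    problems = []
    for s, p, o in triples:
        if p in model:
            domain, range_ = model[p]
            if domain not in types.get(s, set()) or range_ not in types.get(o, set()):
                problems.append((s, p, o))
    return problems


triples = to_triples(DOC)
problems = inconsistencies(triples, MODEL)
```

For the sample document `problems` is empty; a `hasStep` triple whose subject or object has not been typed as expected would be reported instead.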

Throughout the paper, the points under discussion are illustrated with examples taken from the XML Processing Model language, currently under development by the W3C XML Processing Model Working Group. Carrying out the exercise reported here also revealed some things about that language itself, which are briefly discussed.

Connections are also made to earlier work on expressing data-binding information via schema annotations, which suggests the possibility of auto-generating, in some cases, the stylesheets required for the third part of the approach listed above.

Acknowledgements

Dan Connolly wrote the first Violet-to-OWL stylesheet, which really inspired this entire project.