From monolithic XML for print/web to lean XML for data

Matt Kohl; Sandro Cirulli; Phil Gooch

doi:10.14337/XMLLondon14.Kohl01

From monolithic XML for print/web to lean XML for data

realising linked data for dictionaries

Matt Kohl (Oxford University Press) and Sandro Cirulli (Oxford University Press)

Abstract

In order to reconcile the need for legacy data compatibility with changing business requirements, proprietary XML schemas inevitably become larger and looser over time. We discuss the transition at Oxford University Press from monolithic XML models designed to capture monolingual and bilingual print dictionaries derived from multiple sources, towards a single, leaner, semantic model. This new model reflects the lexical content units of a traditional dictionary, while maximising human readability and machine interpretability, thus facilitating transformation to Resource Description Framework (RDF) triples as linked data.

We describe a modular transformation process based on XProc, XSLT, XSpec and Schematron that maps complex structures and multilingual metadata in the legacy data to the structures and harmonised taxonomy of the new model, making explicit information that is often implicit in the original data. Using the new model in its prototype RDF form, we demonstrate how cross-lingual, cross-domain searches can be performed, and custom data-sets can be constructed, that would be impossible or very time-consuming to achieve with the original XML content stored at the individual dictionary level.
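
As a rough illustration of the idea only, and not of the XProc/XSLT pipeline described above, the following Python sketch flattens a small, invented "lean" XML fragment into subject-predicate-object triples and then runs a cross-domain, cross-lingual lookup over them. The element names, attribute names, the `http://example.org/` namespace and both helper functions are assumptions made for this example; they are not taken from the OUP model.

```python
import xml.etree.ElementTree as ET

# Hypothetical lean XML: element and attribute names are illustrative only.
LEAN_XML = """
<dictionary lang="en">
  <entry id="e1">
    <headword>cat</headword>
    <sense id="e1.s1" domain="zoology">
      <translation lang="es">gato</translation>
      <translation lang="it">gatto</translation>
    </sense>
  </entry>
  <entry id="e2">
    <headword>oak</headword>
    <sense id="e2.s1" domain="botany">
      <translation lang="es">roble</translation>
    </sense>
  </entry>
</dictionary>
"""

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def to_triples(xml_text):
    """Flatten the XML into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    source_lang = root.get("lang")
    triples = []
    for entry in root.iter("entry"):
        entry_uri = BASE + "entry/" + entry.get("id")
        triples.append((entry_uri, BASE + "headword", entry.findtext("headword")))
        # The source language is implicit per entry in the XML (it lives on
        # the root element); here it is stated explicitly for every entry.
        triples.append((entry_uri, BASE + "language", source_lang))
        for sense in entry.iter("sense"):
            sense_uri = BASE + "sense/" + sense.get("id")
            triples.append((entry_uri, BASE + "hasSense", sense_uri))
            triples.append((sense_uri, BASE + "domain", sense.get("domain")))
            for tr in sense.iter("translation"):
                predicate = BASE + "translation/" + tr.get("lang")
                triples.append((sense_uri, predicate, tr.text))
    return triples


def translations_in_domain(triples, domain, lang):
    """Return all translations into `lang` for senses tagged with `domain`."""
    senses = {s for s, p, o in triples if p == BASE + "domain" and o == domain}
    wanted = BASE + "translation/" + lang
    return sorted(o for s, p, o in triples if s in senses and p == wanted)


if __name__ == "__main__":
    triples = to_triples(LEAN_XML)
    print(translations_in_domain(triples, "zoology", "es"))
```

Because every fact becomes a uniform triple, the same query function works across any number of dictionaries once their triples are merged into one list, which is the kind of cross-dictionary search that is awkward when content is stored per dictionary in separate XML files.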

How to cite this

Matt Kohl, Sandro Cirulli and Phil Gooch. "From monolithic XML for print/web to lean XML for data" Presented at XML London 2014, June 7-8th, 2014. doi:10.14337/XMLLondon14.Kohl01.
