Nic Gibson

Publishing with XProc

Transforming documents through progressive refinement

Nic Gibson (Corbas Consulting and LexisNexis)

Abstract

Over the last few years, we, as a community, have spent a great deal of time writing code to convert Microsoft Word documents into XML. This is a common task with fairly predictable stages. We need to read the .docx or WordML file and transform the flat, formatting-rich XML into a well-structured XML document.

One approach to this problem is to create a pipeline that uses a progressive refinement technique to achieve a simple sequence of transformations from one format to another. Given that this approach requires the ability to chain multiple transformations together, we decided to build a framework to enable that.

This paper explores the implementation of this kind of pipelining through XProc and examines the pipeline processing used. We discuss the use of progressive enhancement to convert Microsoft Word files to an intermediate format, considering the challenges involved in converting Word in context. We look at the features of XProc which enable this sort of processing.

How to cite this

Nic Gibson. "Publishing with XProc". Presented at XML London 2015, June 6-7th, 2015. doi:10.14337/XMLLondon15.Gibson01.
