
Over the last few years, we, as a community, have spent a great deal of time writing code to convert Microsoft Word documents into XML. This is a common task with fairly predictable stages. We need to read the .docx or WordML file and transform the flat, formatting-rich XML into a well-structured XML document.
One approach to this problem is to create a pipeline that uses progressive refinement: a sequence of simple transformations, each of which moves the document one step closer to the target format. Because this approach depends on chaining multiple transformations together, we decided to build a framework to enable that.
This paper explores the implementation of this kind of pipelining through XProc and examines the pipeline processing used. We discuss the use of progressive enhancement to convert Microsoft Word files to an intermediate format, considering the challenges involved in converting Word in context. We also look at the features of XProc which enable this sort of processing.
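The progressive refinement idea described above can be illustrated with a minimal sketch. It is written in Python rather than XProc, and it is not the framework discussed in the paper: the input markup, the element names (`doc`, `p`, `title`, `section`), the `style` attribute and all function names are hypothetical, chosen only to show how several small passes can be chained so that each one receives the output of the previous one.

```python
import copy
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical flat input: every block is a <p>, told apart only by a style attribute.
FLAT = (
    "<doc>"
    "<p style='Heading1'>Intro</p><p style='Normal'>Some text.</p>"
    "<p style='Heading1'>Method</p><p style='Normal'>More text.</p>"
    "</doc>"
)

def name_headings(root):
    """Pass 1: rename heading-styled paragraphs to <title>."""
    out = copy.deepcopy(root)
    for p in list(out.iter("p")):
        if p.get("style", "").startswith("Heading"):
            p.tag = "title"
    return out

def strip_styles(root):
    """Pass 2: drop the presentational style attributes."""
    out = copy.deepcopy(root)
    for el in out.iter():
        el.attrib.pop("style", None)
    return out

def group_sections(root):
    """Pass 3: wrap each <title> and the blocks that follow it in a <section>."""
    out = ET.Element(root.tag)
    current = out  # blocks before the first title stay at the top level
    for child in root:
        if child.tag == "title":
            current = ET.SubElement(out, "section")
        current.append(copy.deepcopy(child))
    return out

def run_pipeline(root, steps):
    """Feed the output of each step into the next, in order."""
    return reduce(lambda doc, step: step(doc), steps, root)

STEPS = [name_headings, strip_styles, group_sections]
result = run_pipeline(ET.fromstring(FLAT), STEPS)
print(ET.tostring(result, encoding="unicode"))
```

Each pass is deliberately small and returns a new tree instead of modifying its input, so passes can be reordered, tested or replaced individually. In an XProc pipeline the same role would typically be played by a sequence of chained steps.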
Nic Gibson. "Publishing with XProc"
Presented at XML London 2015, June 6-7th, 2015.
doi:10.14337/XMLLondon15.Gibson01
## Example SPARQL Query (Thanks to William Holmes)
## -- Find me all People that XML London knows about
## -- who are a member of the XML Guild
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?xml_london_id ?person_name
WHERE {
  ?xml_london_id org:memberOf <http://xmlguild.org> .
  ?xml_london_id foaf:name ?person_name .
  ?xml_london_id a foaf:Person
}
All information about the XML London conference is open and available in Linked RDF format.
SPARQL Endpoint: http://xmllondon.com/sparql
Graph Store Protocol: http://xmllondon.com/data
Thanks go to Charles Foster and William Holmes for their contributions to the XML London dataset.
If you would like to contribute to the XML London dataset, please submit a Git Pull Request to https://github.com/cfoster/xmllondon-rdf
Please contact us if you find a bug or think something could be improved.