Conference Programme

Saturday, 7th June 2014

09:00 Registration Opens and Coffee
Sponsor Presentation: Joe Crean (MarkLogic)
Sponsor Presentation: George Bina (oXygen)
10:15 Michael Kay (Saxonica)
10:45 Abel Braaksma (Exselt)
11:15 Morning Coffee
11:45 Matt Kohl and Sandro Cirulli (both from Oxford University Press)
12:15 William Narmontas (Apt)
12:45 Lunch
14:00 George Bina (oXygen)
14:30 Elias Weingärtner (Haufe Group)
15:00 Afternoon tea
15:30 Celina Huang (Antenna House)
16:00 Steven Pemberton (CWI)
16:30 Closing of the first day
Social dinner (19:30) + Demojam

Sunday, 8th June 2014

Sponsor Presentation: Ilkwon Sim (3Ksoftware)
Sponsor Presentation: Matthias Kraus (XEditor)
10:15 Philip Fennell (MarkLogic)
10:45 Morning Coffee
11:15 Kal Ahmed (Networked Planet)
11:45 Robbert Broersma (Frameless)
12:15 Jorge Williams (Rackspace)
12:45 Lunch
14:00 Eduardo Gonçalves (Universidade Federal Fluminense, Brazil)
14:30 Lech Rzedzicki (Kode1100)
15:00 Afternoon tea
15:30 Eric Van der Vlist (Dyomedea)
16:00 Charles Foster (MarkLogician)

Session Details


Benchmarking XSLT Performance

Michael Kay (Saxonica)

This paper presents a new benchmarking framework for XSLT. The project, called XT-Speedo, is open source and we hope that it will attract a community of developers. The tangible deliverable consists of a set of test material, a set of test drivers for various XSLT processors, and tools for analyzing the test results. Underpinning these deliverables is a methodology and set of measurement objectives that influence the design and selection of material for the test suite, which are also described in this paper.


Streaming Design Patterns or:
how I learned to stop worrying and love the Stream

Abel Braaksma (Exselt)

XML and streaming, and more specifically XSLT and streaming, are often avoided by programmers because they think that streaming is hard. They worry that rewriting their stylesheets to allow streamed processing will make those stylesheets less maintainable and (much) harder to develop, and that following the rules on streamability, in the absence of a good tutorial or book on the subject, is excruciatingly arduous when the only reference they can refer to is section 19 of the latest XSLT Working Draft.

This paper continues where a previous paper by the same author left off. That paper explained ten rules of thumb for streaming, which are briefly revisited here. This paper expands on them by showing design patterns for typical XSLT programming scenarios, specifically geared towards programmers new to streaming who want to redesign a non-streaming piece of code into a streaming version so that it can process large documents or other intrinsically streamable data.


From monolithic XML for print/web to lean XML for data
realising linked data for dictionaries

Matt Kohl, Sandro Cirulli and Phil Gooch (all from Oxford University Press)

In order to reconcile the need for legacy data compatibility with changing business requirements, proprietary XML schemas inevitably become larger and looser over time. We discuss the transition at Oxford University Press from monolithic, legacy XML models designed to capture monolingual and bilingual print dictionaries derived from multiple sources, towards a single, leaner, semantic model. The model retains the structure of a traditional dictionary, while maximising human readability and machine interpretability, thus facilitating transformation to Resource Description Framework (RDF) triples as linked data.

We describe a modular transformation process based on XProc, XSLT and Schematron that maps complex structures and multilingual metadata in the legacy data to the structures and harmonised taxonomy of the new model, making explicit information that is often implicit in the original data. Using the new model in its RDF form, we demonstrate how cross-lingual, cross-domain searches can be performed, and custom data sets can be constructed, that would be impossible or very time consuming to achieve with the original XML content stored at the individual dictionary level.


XML processing in Scala

William Narmontas (Apt) and Dino Fancellu (Felstar)

Scala is an established statically and strongly typed, functional and object-oriented, scalable programming language for the Java Virtual Machine.

Scala and its ecosystem are leveraged at LinkedIn, Twitter and Morgan Stanley, among many other companies demanding remarkable time to market, robustness, high performance and scalability.

This paper demonstrates Scala's strong native XML support, its powerful XQuery-like constructs, hybrid processing via XQuery for Scala and MarkLogic XCC for Scala, vastly increased XML processing performance, and its practicality in a commercial setting, ultimately increasing productivity.
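The "strong native XML support" mentioned above can be illustrated with a minimal sketch (not taken from the paper) of Scala's XML literals and its XPath-like navigation operators, assuming the scala-xml module is on the classpath; the `order` document and its element names are invented for the example.

```scala
import scala.xml.NodeSeq

// XML is a literal expression type in Scala: this is ordinary code.
val order =
  <order id="42">
    <item sku="A1"><qty>2</qty></item>
    <item sku="B7"><qty>5</qty></item>
  </order>

// \ selects direct children, \\ selects all descendants,
// and a "@name" step reads an attribute — much like XPath axes.
val skus: Seq[String] = (order \ "item").map(i => (i \ "@sku").text)
val totalQty: Int = (order \\ "qty").map(_.text.toInt).sum

println(skus.mkString(", ")) // A1, B7
println(totalQty)            // 7
```

The XQuery-like feel comes from composing these projection operators with the ordinary collection methods (`map`, `filter`, `sum`) of the language itself.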


XML Authoring and Review on Mobile Devices

George Bina (oXygen)

Not too long ago, XML-based content was not available in a mobile-friendly form on mobile devices. Now, popular XML frameworks, including DocBook and DITA, allow publishing to output formats tuned to work nicely on mobile devices. Many people find XML authoring difficult on desktop computers, let alone mobile devices. However, with the constantly increasing number of mobile devices, there is clearly a need to also provide direct access for authoring and reviewing XML content on these devices.

The reduced screen sizes and the different ways of interacting with mobile devices, through touch and swipe gestures, handwriting and speech recognition, force you to rethink the way people should interact with XML content. Also, in some cases the user may not be a very technical person.

We will explore the options for providing XML authoring on mobile devices and describe our current work and the technology choices we made to create an authoring solution for mobile devices. See how we imagined XML authoring on an Android phone or on iPad!


Engineering an XML-based Content Hub for Enterprise Publishing

Elias Weingärtner and Christoph Ludwig (both from Haufe Group)

Delivering large amounts of XML-based content to our customers is a vital part of our business at Haufe Group, one of the leading publishing houses in the domains of tax, human resources and law in Germany. We currently make use of several legacy and proprietary systems for this purpose. However, recent business needs, such as the requirement for flexible transformation or for complex structural queries, push these systems to both conceptual and technical limits. Along with new business requirements derived from our company's business strategy, we are currently designing a new service that centrally manages our entire document corpus in XML. We term this service the "Content Hub". In this paper, we sketch the architecture of this system, discuss important software architecture challenges and illustrate how we are implementing it using standard XML technology.


A Visual Comparison Approach to Automated Regression Testing

Celina Huang (Antenna House)

Antenna House Regression Testing System (AHRTS) is an automated solution designed to perform visual regression testing of PDF output (PDF-to-PDF comparison) from the Antenna House Formatter software by converting a set of baseline PDFs and a set of new PDFs to bitmaps, and then comparing the bitmaps pixel by pixel. Several functions of the system make use of XML, and the final reports are generated using XML and XSL-FO. This paper addresses the importance of PDF-to-PDF comparison for regression testing and explains the visual comparison approach taken. We explain the issues with traditional methods such as manual regression testing and why an automated solution is needed. We also look at how AHRTS works and discuss the benefits we have seen since using it internally to test new releases of our own software. Given its visual-oriented capabilities, we then explore other possible uses beyond the original design intent.


Live XML Data

Steven Pemberton (CWI)

XML is often thought of in terms of documents, or of data being transferred between machines, but one aspect of XML is often overlooked: it can be a source of live data that is displayed in different ways in real time and used in interactive applications.

In this paper we talk about the use of live XML data, and give some examples of its use.


Schematron - More useful than you'd thought

Philip Fennell (MarkLogic)

The Schematron XML validation language has been around for about as long as XML itself and has been used extensively for validation tasks outside the gamut of what XML Schema 1.0 was designed for. The reference implementation is written, with great care, in XSLT and with extensibility in mind. There are a number of points in the Schematron compilation process that provide opportunities to extend its basic behaviour and allow other forms of report output to be generated. This paper looks at one example of extending Schematron to create an XML-to-RDF mapping language for flexible RDF triple construction with built-in source-data validation rules.


Linked Data in a .NET World

Kal Ahmed (Networked Planet)

This paper discusses two ways in which .NET/MONO applications can query and consume RDF linked data. The first half of the paper discusses LINQ (Language Integrated Query) and its translation to SPARQL as implemented in BrightstarDB. The second half discusses the OData protocol and describes a technique for implementing an OData endpoint over a SPARQL query endpoint.

In both cases the open world model of RDF must be converted to a closed-world "view". The paper discusses the relative merits of this and the architectural approaches that can be taken to retain flexibility while still providing useful developer and end-user functionality.

Although the specific functionality discussed is closely tied to .NET, the paper makes some general points about domain model mapping, data binding and querying RDF for those without a .NET background.


Frameless for XML - The Reactive Revolution

Robbert Broersma and Yolijn van der Kolk (both from Frameless)

What would the web look like with functional reactive templates driven by functional reactive query expressions?

Lots of recent innovative developments are significant steps towards faster and more manageable web development, but to really improve our lives by leaps and bounds we must take a step back and consider the requirements for unleashing all this power to front-end developers that aren't fluent in JavaScript.

What would happen if we threw Angular expressions, React's virtual DOM and Reactive Extensions (Rx) into a mix? What if we used declarative syntaxes like XSLT and XPath to compile an instruction set for this engine? What if we could reason about the instructions that make up your website and automatically build minimal and optimized modules?

It is uneconomical to obtain optimal performance for most projects you're working on; there are just too many sides to it: asynchronous tasks, web workers, parallel computations, lazily loading modules, reducing file size, splitting HTML/CSS/JS into modules, combining modules again to reduce HTTP requests, minification, atomic DOM updates, only rendering what's visible, only calculating what is being rendered, only re-calculating what has changed...

But we must do better, not least because performance is very much about economic inclusiveness. Smaller web pages are essential to those using the internet in remote areas over slow 2.5G mobile networks, where wireless data charges are high and every CPU cycle counts when you're using a $25 smartphone.

When we've got a reactive template solution in place we can start thinking about using some of the kilobytes we've saved and some of the CPU cycles to add ubiquitous support for unsexy inclusive technologies such as accessibility, Unicode, localization, and security.


Product Usage Schemas

Jorge Williams (Rackspace)

In this case study we describe the process of collecting, validating, and aggregating usage information in a large public cloud for the purpose of billing. We also describe the Product Usage Schema, a simple XML schema language used in-house to describe, version, and validate usage messages as they are emitted by various products across our public cloud.


An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems

Eduardo Gonçalves and Vanessa Braganholo (both from Universidade Federal Fluminense, Brazil)

Most data mining tools are only able to work with data structured either in relational tables or in text files with one record per line. However, neither kind of data representation is well suited to certain data mining tasks. One example of such a task is multi-label classification, where the goal is to predict the states of a multi-valued target attribute. This paper discusses the use of XML as an alternative way to represent datasets for multi-label classification processes, since this language offers flexible means of structuring complex information, thus potentially facilitating the major steps involved in data preprocessing. To ground the discussion in practice, we describe the steps of an experiment involving the preprocessing of a real text dataset.


Using an abstract content model and wikis to link the Semantic Web, XML, HTML, JSON and CSV
Using Semantic MediaWiki as a mechanism for storing a format-neutral content model

Lech Rzedzicki (Kode1100)

2013 was hyped as the year of Big Data; 2014 is still about projects dealing with a deluge of data, and this trend is going to continue as organisations produce and retain exponentially growing amounts of data, outpacing their capability to utilise that data and gain insight from it.

One method of dealing with the data flood is modeling the data - applying rules to ensure it is consistent and predictable where possible, and flexible everywhere else, providing definitions, examples, alternatives and connecting related structures.

At one end of the modelling spectrum is traditional relational data modelling, with conceptual, logical and physical models and levels of normalisation. Such a strict approach definitely works well in some environments, but not in publishing, where requirements are in constant flux and are rarely well defined.

At the other end of the spectrum is the 'NoSQL' movement, where data is literally dumped to storage as-is, and any data validation and modelling is kept in the application layer, which therefore needs software developers to maintain it. At the moment NoSQL developers are a scarce minority amongst established publishers and a rare and expensive resource in general.

To balance these needs and problems, at Kode1100 Ltd we have designed and developed a modelling system which, to a large extent, is resilient to changes in developer fashion and taste, and which can be maintained by technically savvy and otherwise intelligent folks who do not have to be full-time programmers.


JSON and XML: a new perspective

Eric van der Vlist (Dyomedea)

A lot has already been said about the tumultuous relationship between JSON and XML. A number of binding tools have been proposed. Extensions to the XPath Data Model (XDM) are being considered for XSLT 3.0 and XQuery 3.1 to define maps and arrays, two item types that would facilitate the import of JSON objects.

The author of this paper has already published and presented papers proposing an XML serialization for XSLT 3.0 maps and arrays, a detailed comparison between the XML and JSON data models, and a proposal to extend the XDM to better bridge the gap between these data models.

None of these efforts seems sufficient to eliminate the fundamental impedance mismatch between JSON and XML, suggesting that we may not have found the right angle from which to look at this problem.

This paper proposes a new perspective on the differences between JSON and XML, one which appears to be more constructive than those adopted so far.


Charles Foster (MarkLogician)