|09:00||Registration Opens and Coffee|
Sponsor Presentation: TBD (MarkLogic)
|10:10||John Lumley (jωL Research, Saxonica)|
|10:50||Abel Braaksma (Exselt)|
|11:50||Sandro Cirulli (Oxford University Press)|
|12:20||Johannes Wilm (Vivliostyle Inc.)|
|14:00||George Bina (oXygen)|
|14:30||Fabio Labella (The University of Edinburgh / NCR)|
|15:30||Nic Gibson (Corbas Consulting)|
|16:00||Eric van der Vlist (Dyomedea)|
|16:30||Chris de Vreeze (EBPI)|
|16:55||Closing of the first day|
Social dinner (19:30) + Demojam
Sponsor Presentation: George Bina (oXygen)
|10:00||Andy Seaborne (Epimorphics)|
|10:40||Philip Fearon (DeltaXML)|
|11:40||Adam Retter (Evolved Binary)|
|12:10||Norman Walsh (MarkLogic)|
|13:50||Tony Graham (Antenna House, Inc.)|
|14:20||Andrew Sales (Andrew Sales Digital Publishing)|
|15:10||Steven Pemberton (CWI)|
|15:40||James Fuller (MarkLogic)|
|16:05||Charles Foster (MarkLogician)|
John Lumley (jωL Research, Saxonica)
XSLT's push mode of processing, where templates are invoked by matching XPath-based patterns that describe conditions on the nodes to which they are applicable, is one of the really powerful features of that language. It allows very precise declarative description of the cases for which a template is considered relevant and, along with a well-defined mechanism of priority and precedence, permits specialisation and overriding of 'libraries' to encourage significant code reuse. Whilst other features of XSLT are valuable, push-mode pattern matching is almost certainly the most important.
Consequently, much effort has been expended on developing XSLT-based processing libraries for many types of XML processing, most notably in 'document engineering' (for example DocBook and DITA), which use pattern-matching templates extensively. Typically a processing step might involve hundreds of templates that have to be 'checked' for applicability against the XML nodes being processed in a push fashion. One of the challenges for the implementor of an XSLT engine is to ensure that, for most common cases, this matching process is efficient.
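As a rough illustration of the matching problem described above, the following Python sketch (not XSLT, and not how any particular engine works) tests every rule against each node and keeps the highest-priority match. The `Rule` class, the rule names and the sample document are all hypothetical, and predicates stand in for real XSLT match patterns.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical stand-in for an XSLT template rule."""
    name: str
    matches: Callable[[ET.Element], bool]  # plays the role of a match pattern
    priority: float
    action: Callable[[ET.Element], str]    # plays the role of the template body


def apply_templates(node: ET.Element, rules: List[Rule]) -> str:
    """Naive push processing: check every rule, use the highest priority."""
    candidates = [rule for rule in rules if rule.matches(node)]
    if not candidates:
        # Loose analogue of a built-in rule: recurse into child elements.
        return "".join(apply_templates(child, rules) for child in node)
    best = max(candidates, key=lambda rule: rule.priority)
    return best.action(node)


rules = [
    Rule("any-para",
         lambda n: n.tag == "para",
         0.0,
         lambda n: f"<p>{n.text}</p>"),
    Rule("warning-para",
         lambda n: n.tag == "para" and n.get("role") == "warning",
         0.5,
         lambda n: f"<p class='warning'>{n.text}</p>"),
]

doc = ET.fromstring(
    '<section><para>Hello</para><para role="warning">Careful</para></section>'
)
result = apply_templates(doc, rules)
```

With hundreds of rules, this linear scan is repeated for every node, which is the cost the abstract refers to. Grouping rules by element name before testing them is one obvious refinement that this sketch does not attempt.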
Abel Braaksma (Exselt)
Some larger features of XSLT 3.0 and, by extension, XPath 3.0, such as higher-order functions, packages and streaming, have been covered extensively in previous papers. This paper aims to fill the remaining knowledge gap and shows how several seemingly smaller changes improve your life as a programmer, how they make common idioms easier to implement, and how they create a cleaner programming experience. Features covered in this paper include try/catch and structured error handling, memoization of functions, iteration, merging, text value templates, assertions, modes and enforcing mode declarations, shadow attributes, forking and how it supports multi-threading, applying templates on atomic values, maps, 2.0 backwards compatibility and processing JSON input.
After reading this paper, you should have a firm grasp of what to expect from switching from XSLT 2.0 to XSLT 3.0, if packages and streaming are not your primary concerns.
Sandro Cirulli (Oxford University Press)
At OUP we build large amounts of XML and RDF data as if it were software. However, established software development techniques like continuous integration, unit testing, and automated deployment are not always applied when converting XML and RDF, since these formats are treated as data and not as software.
In this paper we describe how we set up a framework based on continuous integration and automated deployment in order to perform conversions between XML formats and from XML to RDF. We discuss the benefits of this approach as well as how this framework contributed to improving both data quality and development.
Johannes Wilm (Vivliostyle Inc.)
The Vivliostyle project is working on a new typesetting engine for the next phase of the digital publishing era in which web, ebook and print publishing are unified. In this article we argue that such a project is needed to bring the three publishing workflows together.
Ebooks are in most cases EPUB files. The textual content of EPUBs is provided by files containing a restricted version of Hyper Text Markup Language (HTML), the same format used for web pages. The styling of both web pages and EPUBs is defined through Cascading Style Sheets (CSS). Converting content between EPUBs and web pages is therefore not that difficult.
In contrast, most print typesetting systems use formats and standards quite different from those used for ebooks and the web. Publishing the same document for print, web and ebooks is therefore difficult, especially for documents that require updating after initial publication.
The simplest way to unify the publication processes is to introduce HTML and CSS to the print publishing process. Other projects that provide print processing functionality using HTML/CSS already exist.
However, none of these solutions have been able to establish themselves as the industry standard. In the following, we will argue that all the existing solutions have fundamental shortcomings and that the Vivliostyle project is needed to effectuate a change to web technologies in the print publishing industry.
George Bina (oXygen)
XML is an ideal format for structured content, but we need to accept that there are also other formats that can encode information in a structured way. We can try to get everyone to use XML, but that may not always be possible for various reasons. It is now possible to dynamically unify different formats by converting all content to XML simply by pointing to that content through a "magic" URL that performs the conversion on the fly.
We experimented with creating URLs to convert from various formats to XML, including Java classes and JavaDoc files, Excel and Google Sheet spreadsheets, Markdown and HTML, CSV files, etc. A major advantage of this approach is that it works immediately with any URL-aware application and allows single-source publishing to be extended across formats in a transparent way.
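The idea of a converting URL can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `convert:csv!` prefix, the `<table>/<row>/<cell>` vocabulary and the `fetch` callback are all invented here and are not the scheme used in the paper.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable


def csv_to_xml(csv_text: str) -> str:
    """Convert CSV text with a header row into a simple XML table.

    Assumes every data row has the same number of fields as the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("table")
    for record in reader:
        row = ET.SubElement(root, "row")
        for column, value in record.items():
            cell = ET.SubElement(row, "cell", name=column)
            cell.text = value
    return ET.tostring(root, encoding="unicode")


def resolve(url: str, fetch: Callable[[str], str]) -> str:
    """Resolve a URL, converting on the fly for the hypothetical prefix.

    `fetch` is whatever the host application uses to read a plain URL.
    """
    prefix = "convert:csv!"  # hypothetical scheme, for illustration only
    if url.startswith(prefix):
        return csv_to_xml(fetch(url[len(prefix):]))
    return fetch(url)


# An in-memory stand-in for real resources.
sample = {"people.csv": "name,city\nAda,London\n"}
xml_text = resolve("convert:csv!people.csv", sample.__getitem__)
```

Because the conversion hides behind URL resolution, a caller that only knows how to open a URL and parse XML needs no changes to read the CSV content.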
Fabio Labella (The University of Edinburgh / NCR)
Since RDF is primarily intended to be processed by machines, there is a need for a technology to render it into human-readable (HTML) content, similarly to what XSLT does for XML. This is made hard, however, by the high syntactic variability of RDF serialisations, which allows the same graph to be expressed in many different ways.
In this paper we propose an approach, called just-in-time reflection, that allows the application of normal XSLT stylesheets directly to RDF graphs. It follows that the transformation does not depend on any particular serialisation format: stylesheets can be written in terms of RDF's abstract syntax.
Nic Gibson (Corbas Consulting)
Over the last few years, we, as a community, have spent far too much time writing code to convert Microsoft Word documents into XML. This is a common task with fairly predictable stages to it. We need to read the .docx or WordML file (let's just ignore the binary format) and do some things to it that lead to a well-structured XML document.
One approach to this problem is to create a pipeline that uses a progressive refinement technique to achieve a simple sequence of transformations from one format to another. Given that this approach requires the ability to chain multiple transformations together, I chose to build a framework to enable that.
The initial framework used Perl and libxslt. Once XProc, and particularly Calabash, became available, I rewrote this framework using XProc. The current version has become a useful tool for marshalling XSLT-driven transformation pipelines in general.
This paper will explore the pros and cons of implementing this kind of pipelining through XProc and examine the pipeline processing used. We will look at what features of XProc make this kind of development surprisingly simple.
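The progressive refinement idea can be sketched outside XProc as a chain of small steps, each taking a document and returning a slightly better structured one. The Python below is a minimal illustration only; the step names, the `style` attribute and the element names are hypothetical and do not reflect the framework's actual stages.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[ET.Element], ET.Element]


def promote_headings(root: ET.Element) -> ET.Element:
    """Turn paragraphs styled 'Heading1' into <title> elements."""
    for p in root.iter("p"):
        if p.get("style") == "Heading1":
            p.tag = "title"
            del p.attrib["style"]
    return root


def strip_styles(root: ET.Element) -> ET.Element:
    """Drop any remaining presentational style attributes."""
    for element in root.iter():
        element.attrib.pop("style", None)
    return root


def rename_paragraphs(root: ET.Element) -> ET.Element:
    """Rename the remaining <p> elements to <para>."""
    for p in root.iter("p"):
        p.tag = "para"
    return root


def run_pipeline(root: ET.Element, steps: Iterable[Step]) -> ET.Element:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda doc, step: step(doc), steps, root)


source = '<doc><p style="Heading1">Intro</p><p style="Normal">Text</p></doc>'
refined = run_pipeline(
    ET.fromstring(source),
    [promote_headings, strip_styles, rename_paragraphs],
)
result = ET.tostring(refined, encoding="unicode")
```

Each step here modifies the tree in place for brevity. Keeping the steps small and order-dependent is the point: a new refinement is added by inserting one more function into the list.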
Eric van der Vlist (Dyomedea)
Data-driven development is a popular programming paradigm, often implemented using reflection in object-oriented programming languages. Although this is less common, data-driven development can also be implemented with functional programming languages, and this paper explores the possibilities opened up by higher-order functions in XQuery 3.0 for developing data-driven applications.
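To make the combination of data-driven design and higher-order functions concrete, here is a small Python analogue (not XQuery 3.0, and not taken from the paper). A declarative schema is turned into validator functions by functions that return functions; every name in it, including `SCHEMA` and `RULE_FACTORIES`, is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[str], bool]


def max_length(limit: int) -> Check:
    """Higher-order function: returns a check bound to the given limit."""
    return lambda value: len(value) <= limit


def one_of(*allowed: str) -> Check:
    """Higher-order function: returns a check for membership in `allowed`."""
    return lambda value: value in allowed


# Maps rule names found in the data to the functions that build the checks.
RULE_FACTORIES: Dict[str, Callable[..., Check]] = {
    "max-length": max_length,
    "one-of": one_of,
}

# The data that drives the application: behaviour is described, not coded.
SCHEMA: Dict[str, List[Tuple[str, tuple]]] = {
    "title": [("max-length", (20,))],
    "status": [("one-of", ("draft", "final"))],
}


def build_validators(schema) -> Dict[str, List[Check]]:
    """Turn the declarative schema into callable checks per field."""
    return {
        field: [RULE_FACTORIES[name](*args) for name, args in rules]
        for field, rules in schema.items()
    }


def validate(record: Dict[str, str], validators) -> List[str]:
    """Return the names of the fields that fail at least one check."""
    return [
        field
        for field, checks in validators.items()
        if not all(check(record.get(field, "")) for check in checks)
    ]


validators = build_validators(SCHEMA)
failures = validate({"title": "XML London", "status": "published"}, validators)
```

Changing the behaviour means editing `SCHEMA`, not the code, and no reflection is involved: the functions are ordinary values looked up in a table.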
Chris de Vreeze (EBPI)
XSLT, XQuery and XPath are standard XML transformation/query languages, yet in this article yaidom (with Scala) is introduced as an alternative approach to in-memory XML querying/transformation, leveraging the Scala programming language. Still, yaidom can also be used together with standard languages such as XQuery, for example when using an XML database.
The paper first introduces Scala and Scala Collections, followed by yaidom. Then XBRL is very briefly introduced, for understanding the examples. Next, using sample XBRL data, some simple yaidom queries are shown, followed by some XBRL validations in the form of non-trivial yaidom queries. It will become clear that the combination of Scala and the yaidom library can be an attractive "XML processing stack".
Andy Seaborne (Epimorphics Ltd)
Publishing data means delivering it to a wide and changing variety of data consumers. Instead of defined, agreed use by fixed applications, data is used in ways that the publisher will find hard to predict as users find ingenious ways to use and combine data. Data services don't do "9 to 5" and publishing of the data must aim for high levels of available service.
Yet the operation of data services will need to be resilient to operational needs as well as updates. By looking at some real data publishing services, we will see that while hardware failures happen, the main causes of service disruption are operational.
We will describe a new, open source RDF database that addresses operational needs using fault-tolerant systems techniques to provide a scalable, consistent, and resilient data publishing platform for RDF.
Philip Fearon (DeltaXML Ltd)
When preparing XML content for publication, even small-scale projects can involve many people at different stages in the process; the process itself will often repeat several times. It follows that an XML review and approval workflow should allow everyone to contribute to the process at various stages; this is often critical to the quality and timeliness of the end-product.
This paper explores ideas on how XML document merge features can allow contributors and reviewers to, when necessary, work concurrently on content within an XML authoring workflow. A 'proof of concept' application called XMLFlow is used as a vehicle to demonstrate some of these ideas; some detail on the design and implementation of this proof of concept is also covered here.
Adam Retter (Evolved Binary)
Various XPDLs (XPath Derived Languages) offer many high-level abstractions which should enable us to write portable code for standards compliant processors. Unfortunately the reality is that even moderately complex applications often need to call additional functions which are non-standard and typically implementation provided. These implementation provided extension functions reduce both the portability and applicability of code written using standard XPDLs. This paper examines the relevant existing body of work and proposes a novel approach to the implementation of portable extension functions for XPDLs.
Norman Walsh (MarkLogic)
XInclude 1.1 includes new features designed to allow it to more flexibly deal with practical, real world problems of XML inclusions, for example ID/IDREF fixup. It also includes broader support for new media types and the ability to transclude text documents. It's recently been republished as a second Last Call.
One could argue that the success of XProc 1.0 has been limited to some extent by complexities in the language that have become apparent now that it's being broadly deployed. The XProc Working Group is acutely aware of its shortcomings and plans to address the most conspicuous usability issues in XProc 2.0, due out "real soon now".
Norm will cover these standards developments, shamelessly troll for implementors, and answer any questions he can.
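For readers unfamiliar with XInclude, the following sketch shows basic XML and text inclusion using Python's standard `xml.etree.ElementInclude` module. That module offers only limited XInclude support, so this illustrates the general mechanism rather than the XInclude 1.1 features mentioned above. The file names and the in-memory `FILES` table are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# In-memory stand-ins for the documents being included.
FILES = {
    "chapter.xml": "<chapter><title>One</title></chapter>",
    "notice.txt": "Draft, do not distribute.",
}


def loader(href, parse, encoding=None):
    """Custom loader: return an element for XML, a string for text."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    '<note><xi:include href="notice.txt" parse="text"/></note>'
    "</book>"
)

# Replaces each xi:include element in place with the loaded content.
ElementInclude.include(book, loader)
```

After processing, the first child of `book` is the included `chapter` element, and the `note` element contains the text of `notice.txt`.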
Tony Graham (Antenna House, Inc.)
XSL-FO defies conventional validation, so much so that it hasn't been done successfully before now. This paper describes a combination of hand-written and auto-generated Relax NG plus hand-written and auto-generated Schematron that can validate XSL-FO markup.
Andrew Sales (Andrew Sales Digital Publishing)
As traditional print-based publishing has made the transition into the digital age, a convention has developed in some quarters of capturing or even typesetting content using word-processing applications.
These can present a convenient route to publication in the many instances where content derives (in the form of author manuscript) from the same word-processing package. It is also a relatively cheap and efficient one, demanding the now basic and widespread skills of styling a document to achieve the desired appearance.
As a result, typesetting workflows consuming these documents still exist, template-based workflows designed to capture structured data are still in place, and for some publishers large quantities of legacy data persist in word-processing formats only and require migration to XML to meet modern production demands.
During the long period (for some) of moving to a digital-first workflow, with publication of a single source of structured data in various renditions, it has become apparent to such publishers that the quality of their content no longer only resides in the appearance of the rendered product, but also in the quality of the data capture itself. The quality question has shifted from "Does my product look right?" to "Is my source markup sufficiently rich to service the outputs I wish to produce?" When generating XML markup from a word-processing source, the inevitable corollary is whether the document has been styled appropriately to drive good-quality data capture.
Steven Pemberton (CWI)
The Internet of Things is predicated on tiny, cheap, low-power computers being embedded in devices everywhere. However, such tiny devices by definition have very little memory and computing power available to support user interfaces or extended servers, and so the user interface needs to be distributed over the network.
This paper describes techniques using standard technologies based on XML for creating remote user-interfaces for the Internet of Things.
Jim Fuller (MarkLogic)
XProc is a powerful language providing a facade over an array of technologies which can make managing that 'surface area' difficult. XProc v1.0 also presents difficulties to the new user as it has a learning curve which forces the developer to learn many concepts before they can be productive in the language.
I will present depify, a package manager specifically designed for XProc, that helps new and experienced users leverage reuse of available step library packages. Depify provides a mechanism for step library authors to distribute their libraries with minimal 'ceremony' using the tools (github) they use today.
I will do a deep dive into the architecture of the components that make up depify, and provide several examples of installing, authoring and interacting with dependencies using depify.
All information about the XML London conference is open and available in Linked RDF format.
Thanks go to Charles Foster and William Holmes for their contributions to the XML London dataset.
If you would like to contribute to the XML London dataset, please submit a Git Pull Request to https://github.com/cfoster/xmllondon-rdf
Please contact us if you find a bug or think something could be improved.