XML London 2017

Saturday, June 10th

Opening of the Conference and a word from our Sponsors.

In this paper we describe an example of using client-side interactive XSLT 3.0 with Saxon-JS. We will present work on making use of this technology to improve an existing in-house License Tool application. The current tool is a web application built using the Servlex framework, using XForms in the front end. The tool generates licenses for the Saxon commercial products using server-side XSLT processing. We believe there are a number of benefits to moving parts of the tool's architecture client-side, and are interested in exploring how this can be done, and how it may initiate further developments for Saxon-JS itself.

Conventional use of XSD documents is mostly limited to validation, documentation and the generation of data bindings. The possibility of additional uses is little considered. This is probably due to the difficulty of processing XSD, caused by its arcane graph structure. An effective solution might be a generic transformation of XSD documents into a tree-structured representation, capturing the model contents in a transformation-friendly way. Such a tree-structured schema derivative is offered by location trees, a format defined in this paper and generated by an open-source tool. Location trees are intended as an intermediate format to be transformed into interesting artifacts. To borrow an image from chemistry, location trees can play the role of a catalyst, dramatically lowering the activation energy required to transform XSD into valuable substances. Apart from this capability, location trees are composed of a novel kind of model component that invites the attachment of metadata. The resulting metadata trees enable innovative tools, including source code generators. A few examples illustrate the new possibilities, tentatively summarized as XSD based tool development.

The Internet of Things is driven by many tiny low-powered processors that produce data in a variety of different formats, and produce the data in different ways, sometimes on demand (such as thermostats), sometimes by pushing (such as presence detectors). Traditionally, applications have to be a mash up of accesses to devices and formats. To use the data in a cohesive application, the data has to be collected and integrated; this allows very low demands to be put on the devices themselves.

The architecture described places a thin layer around a diverse collection of Internet of Things devices, hiding the data-format and data-access differences, unifying the actual data in a single XML repository, and updating the devices automatically as needed; this then allows a REST-style declarative interface to access and control the devices without having to worry about the variety of device-interfaces and formats.
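As a rough illustration of the thin-layer idea described above, the following Python sketch normalises readings from two devices that use different formats and merges them into one XML repository. It is only a sketch under stated assumptions: the device payload formats, the function names and the `<devices>`/`<device>` vocabulary are all hypothetical and are not taken from the paper.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical adapters: each one hides a device-specific format and
# returns the same small dictionary shape.
def from_json_thermostat(payload):
    data = json.loads(payload)
    return {"id": data["id"], "kind": "thermostat", "value": str(data["temp_c"])}

def from_text_presence(payload):
    device_id, state = payload.split("=")
    return {"id": device_id, "kind": "presence", "value": state}

def update_repository(repository, reading):
    """Insert or update the <device> element for one normalised reading."""
    for device in repository.findall("device"):
        if device.get("id") == reading["id"]:
            break
    else:
        # No existing element for this device, so create one.
        device = ET.SubElement(repository, "device", id=reading["id"])
    device.set("kind", reading["kind"])
    device.text = reading["value"]
    return device

# The single XML repository (an in-memory tree in this sketch).
repository = ET.Element("devices")
update_repository(repository, from_json_thermostat('{"id": "t1", "temp_c": 21.5}'))
update_repository(repository, from_text_presence("p1=present"))
# A later reading from the same thermostat replaces the earlier value.
update_repository(repository, from_json_thermostat('{"id": "t1", "temp_c": 22.0}'))

print(ET.tostring(repository, encoding="unicode"))
```

Once every device is visible as an element in one tree, a client can read or change device state by addressing that tree, without knowing which format each device speaks.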

This paper is a case study of a project to migrate several thousand articles published online in academic journals (captured as JATS XML) and associated assets (images, PDF versions, etc.) to a new platform. We present this in the spirit of inviting criticism of the approach (was there anything we could have done differently?), and also to demonstrate to providers of XML tools and services an example of the kind of challenges faced by publishers.

XML schema languages are a mature and well-understood tool for validating XML content. However, the main focus of schema languages is on validating document structure and values that adhere to a few relatively simple standard data types. Controlling the order and cardinality of elements and attributes is very easy in DTD, W3C XML Schema and RELAX NG alike. Checking that an element or attribute value is a number, a date or a string of a particular length is also very easy in both W3C XML Schema and RELAX NG with XSD datatypes.

TBC

This will be a session where invited Industry Leaders will share their thoughts around challenges and successes found in electronic publishing projects.

They will provide answers in an interactive session, whilst hinting at well proven best practices.

Among others, some of the following topics are likely to be discussed:

How should I organise the transformation of legacy documents for my XML project, in-house or better outsourced?

Would I be able to find a reliable partner or typesetter for manual capture of XML, what are the pitfalls?

How can I achieve quality assurance on the provided XML content, how do I know it is all there... and correct?

What are the options for my XML allergic authoring team to get the XML content right?

How should I manage the metadata of my documents?

DemoJam (bring out your demos)

Followed by Social Dinner

Sunday, June 11th

XSpec is an open source unit test and behaviour driven development framework for XSLT and XQuery. XSpec v0.5.0 was released in January 2017 and included new features such as XSLT 3 support and JUnit report for integration with continuous integration tools. The new release also fixed long standing bugs, provided feature parity between the Windows and MacOS/Linux scripts, integrated an automated test suite, and updated the documentation. XSpec v0.5.0 is currently included in the Oxygen 19.0 beta.

This paper highlights the new features available in XSpec v0.5.0 and reports the effort of the XML community to revive this open source project.

This paper explores whether it is feasible to create a knowledge model from structured content and to use that same knowledge model to aid subject-matter experts in the process of writing accurate structured content. This would effectively create a feedback loop which would result in continuous improvement of both.

The purpose of this paper is not to provide a comprehensive in-depth exploration. Rather, it tries to set out a frame of thought for newcomers to the subject of knowledge modelling by sketching a practical, working example that can easily be implemented using various technologies and then built upon.

DataDock (http://datadock.io/) is a new service that aims to make it easy for anyone to publish Linked Open Data. It consists of two main parts: a data conversion service that turns CSV into RDF and creates a GitHub Pages site from the data, and a gateway that performs the necessary redirects to make the published data work as Linked Data.

Although a number of other projects already use GitHub and GitHub Pages as a way to manage and publish (Linked) Open Data, DataDock has a unique way of managing the raw RDF data that makes it possible to use Git commands to determine the change history of a dataset.

This paper will describe the technical implementation of the DataDock service and our approach to storing RDF data in Git. It also proposes a method for making use of our storage approach to support distributed SPARQL querying of DataDock repositories.
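To make the CSV-to-RDF conversion step mentioned above more concrete, here is a minimal Python sketch that turns each CSV row into N-Triples lines. It is not DataDock's implementation: the function name, the `http://example.org/` base IRI and the `id/` and `def/` IRI patterns are assumptions chosen for illustration, and every value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

def csv_to_ntriples(csv_text, base_iri, id_column):
    """Return one N-Triples line per non-empty cell (hypothetical IRI scheme)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Subject IRI is built from the identifier column of the row.
        subject = f"<{base_iri}id/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            predicate = f"<{base_iri}def/{quote(column, safe='')}>"
            # Escape the characters that are special inside an N-Triples literal.
            literal = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines

sample = "id,name,city\n1,Ada,London\n2,Bob,\n"
triples = csv_to_ntriples(sample, "http://example.org/", "id")
for triple in triples:
    print(triple)
```

Because N-Triples is a line-oriented format with one statement per line, output like this lends itself to line-based tools such as Git diffs, although the storage layout DataDock actually uses may differ.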

Traditional approaches to teaching XSLT and other development technologies are undergoing rapid change. The rise of online training platforms and peer-to-peer environments such as stackoverflow.com has changed the way that developers learn technologies. In the XSLT world we are extremely lucky to have some amazing people answering questions on the Mulberry mailing list and Stack Overflow. However, when a developer asks a question on Stack Overflow or uses Google to find an existing answer, the why behind any particular answer is often lost.

A recent exchange on Stack Overflow led me to wonder how much of our best practice might be urban legend and to consider how XSLT and other technologies could be taught well in this online environment.

This paper will investigate one or two of these questions and answers and consider whether ten-year-old questions and answers are the wisdom of the ages or myths and legends. I will consider whether answering questions online should be part of the teaching or training experience or whether it is simply outsourced problem solving. Which of these approaches leads to higher quality XSLT development (and developers)?

TBC

XML and XSLT have been around for a very long time, particularly if you include the previous incarnations of SGML and DSSSL in the mix, as I do. XSLT is ready to publish the third version (3.0) any minute (if it has not already done so). In the lifetime of recommendations, XSLT has gone through many twists and turns since I was asked to form and chair the committee back in the 1990s. The history and stories behind it are important if you are to keep the proper perspective. We must remember the environment from which XML and XSLT arose and the different technical underpinnings at play as the recommendations evolved. The seeds of the future are often found in the soil of the past.

The Evolution of XML Vocabulary Design

Giving thanks and closing of the conference.

Speakers & Authors

Andrew Sales

Bert Willems

Charles Foster

Debbie Lockett

Deborah Lapeyre

Gerrit Imsieke

Hans-Juergen Rennau

Jirka Kosek

Kal Ahmed

Mark Dunn

Michael Kay

Nic Gibson

O'Neil Delpratt

Robin La Fontaine

Sandro Cirulli

Shani Chachamu

Sharon Adler

Steven Pemberton

Terry Blake

Tony Graham

10:10	Distributing XSLT Processing between Client and Server
	Debbie Lockett (Saxonica) and O'Neil Delpratt (Saxonica)

10:50	Location trees enable XSD based tool development
	Hans-Juergen Rennau

11:50	An Architecture for Unified Access to the Internet of Things
	Steven Pemberton (CWI)

12:20	Migrating journals content using Ant
	Mark Dunn (Oxford University Press) and Shani Chachamu (Oxford University Press)

15:00	Expert Panel Discussion (Moderated by Andrew Sales)
	Terry Blake (Williams Lea Tag), Gerrit Imsieke (le-tex publishing services GmbH), Robin La Fontaine (DeltaXML), Deborah Lapeyre (Mulberry Technologies) and Tony Graham (Antenna House)

16:10	Expert Panel Discussion (Moderated by Ari Nordström)
	Terry Blake (Williams Lea Tag), Gerrit Imsieke (le-tex publishing services GmbH), Robin La Fontaine (DeltaXML), Deborah Lapeyre (Mulberry Technologies) and Tony Graham (Antenna House)

10:50	Bridging the gap between knowledge modelling and technical documentation
	Bert Willems (FontoXML)

12:00	DataDock - Using GitHub to Publish Linked Open Data
	Kal Ahmed (Networked Planet)

12:30	Urban Legend or Best Practice - Teaching XSLT in the era of Stack Overflow
	Nic Gibson (Corbas Consulting)

14:30	Schematron
	Andrew Sales (Andrew Sales Digital Publishing Limited)

19:30	Social Dinner + Demo Jam

15:50	Closing Keynote - Bespoke, Bewildered, and Bebothered
	Deborah Lapeyre (Mulberry Technologies)

16:30	Closing Address (5 minutes)
	Charles Foster