Opening of the Conference and a word from our Sponsors.
In this paper we describe an example of using client-side interactive XSLT 3.0 with Saxon-JS. We will present work on making use of this technology to improve an existing in-house License Tool application. The current tool is a web application built using the Servlex framework, using XForms in the front end. The tool generates licenses for the Saxon commercial products using server-side XSLT processing. We believe there are a number of benefits to moving parts of the tool's architecture client-side, and are interested in exploring how this can be done, and how it may initiate further developments for Saxon-JS itself.
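To give a flavour of what client-side interactive XSLT looks like, here is a minimal sketch of an event-handling template of the kind Saxon-JS supports; the button id, output area and message are hypothetical, not taken from the License Tool itself:

    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT">
      <!-- Fires when the user clicks the (hypothetical) Generate button -->
      <xsl:template match="button[@id = 'generate']" mode="ixsl:onclick">
        <!-- Replace the contents of an output area in the live HTML page -->
        <xsl:result-document href="#license-output" method="ixsl:replace-content">
          <p>License generated at <xsl:value-of select="current-dateTime()"/></p>
        </xsl:result-document>
      </xsl:template>
    </xsl:stylesheet>

Templates like this run entirely in the browser, which is what makes it possible to move parts of the tool's logic off the server.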
Conventional use of XSD documents is mostly limited to validation, documentation and the generation of data bindings. The possibility of additional uses is little considered, probably because of the difficulty of processing XSD, caused by its arcane graph structure. An effective solution might be a generic transformation of XSD documents into a tree-structured representation, capturing the model contents in a transformation-friendly way. Such a tree-structured schema derivative is offered by location trees, a format defined in this paper and generated by an open-source tool. The intended use of location trees is as an intermediate to be transformed into interesting artifacts. Using a chemical metaphor, location trees can play the role of a catalyst, dramatically lowering the activation energy required to transform XSD into valuable substances. Apart from this capability, location trees are composed of a novel kind of model component, inviting the attachment of metadata. The resulting metadata trees enable innovative tools, including source code generators. A few examples illustrate the new possibilities, tentatively summarized as XSD-based tool development.
The Internet of Things is driven by many tiny low-powered processors that produce data in a variety of different formats, and produce it in different ways: sometimes on demand (such as thermostats), sometimes by pushing (such as presence detectors). Traditionally, applications have to be a mash-up of accesses to devices and formats. To use the data in a cohesive application, it has to be collected and integrated; this allows very low demands to be put on the devices themselves.
The architecture described places a thin layer around a diverse collection of Internet of Things devices, hiding the data-format and data-access differences, unifying the actual data in a single XML repository, and updating the devices automatically as needed; this then allows a REST-style declarative interface to access and control the devices without having to worry about the variety of device-interfaces and formats.
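As a concrete (and entirely hypothetical) illustration of the idea, the repository might hold an entry like the one below for a thermostat; a REST client simply reads or updates this XML, and the surrounding layer takes care of talking to the device in its native format:

    <!-- Hypothetical unified representation of one device;
         element names and the device id are illustrative only -->
    <device id="thermostat-42" type="thermostat">
      <location>hallway</location>
      <reading unit="celsius">19.5</reading>
      <setpoint unit="celsius">21.0</setpoint>
    </device>

Changing the setpoint is then just an update to the XML; the layer notices the change and pushes it to the device.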
This paper is a case study of a project to migrate several thousand articles published online in academic journals (captured as JATS XML) and associated assets (images, PDF versions, etc.) to a new platform. We present this in the spirit of inviting criticism of the approach (was there anything we could have done differently?), and also to demonstrate to providers of XML tools and services an example of the kind of challenges faced by publishers.
XML schema languages are mature and well-understood tools for validation of XML content. However, the main focus of schema languages is on validating document structure and values that adhere to a few relatively simple standard data types. Controlling the order and cardinality of elements and attributes is very easy in all of DTD, W3C XML Schema and RELAX NG. Checking that an element or attribute value is a number, a date or a string of a particular length is also very easy in both W3C XML Schema and RELAX NG with XSD datatypes.
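For example, constraints of this simple kind (assuming a hypothetical invoice document) take only a few lines of W3C XML Schema:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- Order, cardinality and simple datatypes are easy to express -->
      <xs:element name="invoice">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="issued" type="xs:date"/>
            <xs:element name="amount" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Constraints that cut across elements, such as rules relating one value to another, are where the standard grammar-based languages start to struggle.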
TBC
This will be a session where invited Industry Leaders will share their thoughts around challenges and successes found in electronic publishing projects.
They will provide answers in an interactive session, whilst hinting at well-proven best practices.
Among others, some of the following topics are likely to be discussed:
How should I organise the transformation of legacy documents for my XML project: in-house, or is it better outsourced?
Would I be able to find a reliable partner or typesetter for manual capture of XML, and what are the pitfalls?
How can I achieve quality assurance on the provided XML content? How do I know it is all there... and correct?
What are the options for my XML-allergic authoring team to get the XML content right?
How should I manage the metadata of my documents?
DemoJam (bring out your demos)
Followed by Social Dinner
XSpec is an open source unit testing and behaviour-driven development framework for XSLT and XQuery. XSpec v0.5.0 was released in January 2017 and included new features such as XSLT 3 support and JUnit reports for integration with continuous integration tools. The new release also fixed long-standing bugs, provided feature parity between the Windows and MacOS/Linux scripts, integrated an automated test suite, and updated the documentation. XSpec v0.5.0 is currently included in the Oxygen 19.0 beta.
This paper highlights the new features available in XSpec v0.5.0 and reports on the effort of the XML community to revive this open source project.
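For readers new to the framework, a minimal XSpec scenario looks like the following; the greeting.xsl stylesheet and its expected output are invented for the illustration:

    <x:description xmlns:x="http://www.jamesfuller.net/xspec"
                   stylesheet="greeting.xsl">
      <x:scenario label="when processing a greeting element">
        <!-- The context node to which the stylesheet's templates are applied -->
        <x:context>
          <greeting>Hello</greeting>
        </x:context>
        <!-- The expected result, checked and reported by the test runner -->
        <x:expect label="it produces an h1 with the same text">
          <h1>Hello</h1>
        </x:expect>
      </x:scenario>
    </x:description>

Running the suite produces a human-readable HTML report, or with v0.5.0 a JUnit report that continuous integration tools can consume.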
This paper explores whether it is feasible to create a knowledge model from structured content and to use that same knowledge model to aid subject-matter experts in the process of writing accurate structured content. This would effectively create a feedback loop which would result in continuous improvement of both.
The purpose of this paper is not to provide a comprehensive, in-depth exploration; rather, it tries to set out a frame of thought for newcomers to the subject of knowledge modelling by sketching a practical, working example that can easily be implemented using various technologies and then built upon.
DataDock (http://datadock.io/) is a new service that aims to make it easy for anyone to publish Linked Open Data. It consists of two main parts: a data conversion service that turns CSV into RDF and creates a GitHub Pages site from the data, and a gateway that performs the necessary redirects to make the published data work as Linked Data.
Although a number of other projects already use GitHub and GitHub Pages as a way to manage and publish (Linked) Open Data, DataDock has a unique way of managing the raw RDF data that makes it possible to use Git commands to determine the change history of a dataset.
This paper will describe the technical implementation of the DataDock service and our approach to storing RDF data in Git. It also proposes a method for making use of our storage approach to support distributed SPARQL querying of DataDock repositories.
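To sketch the kind of conversion involved (the URIs and column names here are invented rather than DataDock's actual mapping rules), a CSV row such as

    name,zone
    Paddington,1

might be published as triples along these lines:

    @prefix id:  <http://example.org/id/> .
    @prefix def: <http://example.org/def/> .

    # One CSV row becomes one subject, with one triple per column
    id:paddington def:name "Paddington" ;
                  def:zone 1 .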
Traditional approaches to teaching XSLT and other development technologies are undergoing rapid change. The rise of online training platforms and peer-to-peer environments such as stackoverflow.com has changed the way that developers learn technologies. In the XSLT world we are extremely lucky to have some amazing people answering questions on the Mulberry mailing list and Stack Overflow. However, when a developer asks a question on Stack Overflow or uses Google to find an existing answer, the why behind any particular answer is often lost.
A recent exchange on Stack Overflow led me to wonder how much of our best practice might be urban legend and to consider how XSLT and other technologies could be taught well in this online environment.
This paper will investigate one or two of these questions and answers and consider whether ten-year-old questions and answers are the wisdom of the ages or myths and legends. I will consider whether answering questions online should be part of the teaching or training experience, or whether it is simply outsourced problem solving. Which of these approaches leads to higher-quality XSLT development (and developers)?
TBC
TBC
XML and XSLT have been around for a very long time, particularly if you include their previous incarnations, SGML and DSSSL, in the mix, as I do. XSLT is ready to publish its third version (3.0) any minute (if it has not already done so). In the lifetime of recommendations, XSLT has gone through many twists and turns since I was asked to form and chair the committee back in the 1990s. The history and stories behind it are important if you keep the proper perspective. We must remember the environment from which XML and XSLT arose and the different technical underpinnings at play as the recommendations evolved. The seeds of the future are often found in the soil of the past.
The Evolution of XML Vocabulary Design
Giving thanks and closing of the conference.
Andrew's background is in desktop publishing, translation and editing. He has specialised in XML and related technologies since 2000. He designs, writes and documents DTDs and schemas for publishers and other users of document-based XML. He provides XML and workflow consultancy and has introduced digital-first workflows. He creates best-practice content models and focuses on validation and quality assurance as the basis of sound markup-based production. He is skilled in manipulating and validating XML using XSLT, XQuery, Schematron, Java and (parser-based) Python.
Andrew contributes to international standardisation as an individual expert member of IST/41, the Technical Committee of BSI (the UK member of ISO and IEC) responsible for XML and related standards.
Debbie joined the Saxonica development team in 2014 following post-doctoral research in Mathematics at the University of Leeds. Debbie has worked on performance benchmarking, on which she co-authored a paper that she jointly presented at XML London 2014, and on developing the tools for creating Saxonica's product documentation. She is currently working on the implementation of XQuery 3.1 features.
Debbie is an architect and developer of XML tag sets (vocabularies) who designs and writes the schemas (DTD, XSD, RELAX NG) that model those vocabularies. She has been working with XML, XSLT, and XPath since their inception and with SGML (XML's predecessor) since 1984. Most recently, Debbie serves as the XML hands for (and as a member of) the NISO JATS Standing Committee, which maintains the JATS (Journal Article Tag Suite) vocabularies. JATS is the ANSI/NISO successor to the NLM Journal Archiving and Interchange Tag Suite. The three ANSI/NISO Z39.96-201 tag sets (Archiving, Publishing, and Authoring) are used by publishers, archives, aggregators, and libraries worldwide for tagging journal articles. She also maintains the new NLM book vocabulary BITS (Book Interchange Tag Suite), which is used to tag STM books and book-like material.
As a managing director of le-tex publishing services GmbH, Gerrit Imsieke is responsible for XML technologies and business development. Gerrit has the privilege of being able to devote almost half of his time to actual software development, particularly in the XProc and XSLT 2+ languages.
Gerrit successfully made up for his abysmal sales skills by releasing the open-source, open-standards conversion/validation framework transpect. It is built on XProc, XSLT, RELAX NG, and Schematron, and it has attracted many new customers who like to avoid vendor lock-in by adopting open-source, open-standard solutions.
Gerrit is currently involved in the standardization of NISO Z39.102-201x, STS: Standards Tag Suite and XProc 3.0.
Hans is a developer with a keen interest in XML technologies in general and their not-so-obvious potential in particular. Hans is the author of TopicTools (a lightweight framework for the development of XQuery command-line tools) as well as FOXpath (an expression language for navigating the file system and other resource trees).
Jirka is the Chairman of XML Prague as well as a freelance consultant, lecturer, writer, university teacher, open source developer and standards contributor.
He has written many books and articles about XML and Web technologies and also offers training and consulting services around XML and Linked Data technologies.
Kal is the founder of Networked Planet, which helps organisations of all sizes make the best use of their data, content and knowledge.
He has long experience of working on projects and products related to both document and knowledge management and helping organizations to make better use of their data and their internal knowledge capital.
Kal is the lead developer of BrightstarDB (http://brightstardb.com/), a .NET-native RDF triple store with unique .NET data binding features.
Mark learned structured markup as a lexicographer on the Oxford English Dictionary, introducing automated quality assurance tests for dictionary entries and migrating the data set to XML when its legacy editorial system was replaced.
He now manages a team of Content Architects at Oxford University Press, providing tools, processes, documentation, and strategic guidance to support production and publication of OUP’s academic and professional content.
Nic is an independent consultant and trainer specialising in strategic digital publishing technologies and XML-driven publishing. Over the last few years he has worked for major publishers and international bodies as an architect and consultant on projects utilising digital publishing and web technologies.
Along with his consultancy and contract work, he provides training and mentoring services for companies and individuals looking to improve their technology skill base.
O'Neil Delpratt joined Saxonica from a research project at the University of Leicester in 2010. He is a co-developer of the Saxon product, with specific responsibility for Saxon on .NET and the C/C++/PHP platform (Saxon/C).
Before joining Saxonica, he completed his post-graduate studies at the University of Leicester. His thesis, “In-memory Representations of XML documents”, was accompanied by the development in C++ of a memory-efficient DOM implementation called Succinct DOM.
O'Neil regularly publishes and presents papers at various XML conferences and is an invited expert on the W3C XQuery Working Group.
Robin is the founder and CEO of DeltaXML. He holds an Engineering Science degree from Oxford University and an MSc in Computer Science. His background includes computer aided design software and he has been addressing the challenges and opportunities associated with information change for many years.
Sandro currently works as Lead Language Technologist in the Dictionaries department of Oxford University Press (OUP). Since he started working at OUP in 2012 he has been involved in several projects, including Oxford Global Language Solutions, Oxford Global Languages and Oxford Dictionaries API. He has mainly been dealing with the technical side of these projects, which has included data conversion, system architecture and DevOps.
Sandro is a certified Jenkins engineer, co-maintainer of XSpec and a co-organiser of DevOps Oxford Meetup.
In 2011 Sandro graduated from Oxford Brookes University with a Master's in Computer Science. His Master's Dissertation focussed on Role and Location Based Access Control Using Semantic Web Technologies and was awarded the Big Oxford Computer Company (BOCC) Computer Systems Development Postgraduate Award for the best dissertation in computing.
Shani is a Content Architect at Oxford University Press, working mainly in law digital publishing. Before that she worked in OUP journals. As a Content Architect her responsibilities include writing transformations for content and metadata, QA rules in Schematron, and maintaining documentation in DITA. She also helps to maintain an XML database that holds metadata about items of law content and the relationships between them.
Shani studied English at Oxford.
Ms. Adler began her decades-long journey with markup in the late 1970s by developing a markup vocabulary for the Duke University course catalogs. With a clear emphasis on print and publishing, this work led her into the development of SGML as chair of the GCA GenCode Committee (which merged with two other committees to form the basis of the ANSI/ISO X3J6 committee). After SGML came the late 80s and 90s and DSSSL, an ISO standardized language for expressing high-end, sophisticated formatting results for print and other media. From 1985 to 1992, Ms. Adler held several key positions with IBM in Boulder, Colorado, where she was instrumental in the development of standards-based authoring and document management tools. Prior to that, she was a senior manager for Boeing Computer Services in Vienna, Virginia. From there she moved to EBT/Inso, where XML and its associated standards were of primary importance. Ms. Adler started XSLT as an XML-based DSSSL with the focus on transformations; formatting was under the purview of XSL formatting objects. During this time Ms. Adler was a senior manager at IBM Research, where her teams focused on research topics related to standards, analytics, and Web Services. She retired from IBM Research in 2010 as an RSM Emerita. She has been chair of the XSLT WG at W3C since its inception.
Steven does research in the architecture of computing systems, with the ultimate aim of making them more human-oriented. He co-designed ABC, the programming language that Python was based on, and was one of the first handful of people on the European Internet in 1988. Steven has been involved with the web since its beginning, organising two workshops at the first web conference in 1994, and has been chair of several working groups at W3C, designing new web technologies including HTML, CSS, XForms, RDFa, and many others.
Terry is the Government Digital Services Director at Williams Lea Tag / TSO.
Terry leads TSO's technology team, which is made up of more than 70 developers and PRINCE2-qualified project managers with capability in all the key web development tools. The team includes Semantic Web specialists who have contributed to some of the key Semantic Web developments in government. Terry has more than 20 years' experience of providing technical solutions in publishing, gained through the implementation of a large number of enterprise-level publishing systems for a range of public and private sector organisations.
Tony Graham is a Senior Architect with Antenna House, Inc.
Previously he was an independent consultant specialising in XSL, XSLT and XML. He has been working with markup since 1991, with XML since 1996 and with XSL/XSLT since 1998.
He is Chair of the Print and Page Layout Community Group at the W3C and was previously an invited expert on the W3C XML Print and Page Layout Working Group (XPPL), which defined the XSL-FO specification. He is also an acknowledged expert in XSLT, the developer of the open source xmlroff XSL formatter, a committer to both the XSpec and Juxy XSLT testing frameworks, the author of "Unicode: A Primer", and a qualified trainer.
All information about the XML London conference is open and available in Linked RDF format.
SPARQL Endpoint: http://xmllondon.com/sparql
Graph Store Protocol: http://xmllondon.com/data
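As a starting point, a generic exploratory query can be sent to the endpoint; since the dataset's vocabulary is not listed here, this simply samples a few triples:

    # List a handful of triples to see what the dataset contains
    SELECT ?subject ?predicate ?object
    WHERE { ?subject ?predicate ?object }
    LIMIT 10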
Thanks go to Charles Foster and William Holmes for their contributions to the XML London dataset.
If you would like to contribute to the XML London dataset, please submit a Git Pull Request to https://github.com/cfoster/xmllondon-rdf
Please contact us if you find a bug or think something could be improved.