Most data mining tools can only work with data structured either in relational tables or in text files with one record per line. However, neither kind of data representation is well suited to certain data mining tasks. One example of such a task is multi-label classification, where the goal is to predict the states of a multi-valued target attribute. This paper discusses the use of XML as an alternative way to represent datasets for multi-label classification processes, since this language offers flexible means of structuring complex information, thus potentially facilitating the major steps involved in data preprocessing. To discuss the approach from a practical point of view, we describe the steps of an experiment involving the preprocessing of a real text dataset.
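To make the contrast concrete, here is a minimal sketch that assumes a hypothetical XML layout. The element names `dataset`, `doc`, `text`, `labels` and `label` are illustrative only and are not the schema used in the paper. In this layout each document carries a variable number of labels, which a one-record-per-line file cannot express without extra conventions. The sketch parses the XML with Python's standard `xml.etree.ElementTree` module and flattens the label sets into the fixed-width 0/1 indicator rows that most tabular tools expect.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout (an assumption for illustration, not the paper's schema):
# each <doc> holds its text plus any number of <label> children.
XML_DATA = """
<dataset>
  <doc id="1">
    <text>stock markets rally after rate cut</text>
    <labels><label>economy</label><label>politics</label></labels>
  </doc>
  <doc id="2">
    <text>team wins championship final</text>
    <labels><label>sports</label></labels>
  </doc>
  <doc id="3">
    <text>government funds new stadium</text>
    <labels><label>politics</label><label>sports</label></labels>
  </doc>
</dataset>
"""


def load_multilabel(xml_text):
    """Return (ids, texts, label_sets) parsed from the hypothetical layout."""
    root = ET.fromstring(xml_text.strip())
    ids, texts, label_sets = [], [], []
    for doc in root.findall("doc"):
        ids.append(doc.get("id"))
        texts.append(doc.findtext("text", default="").strip())
        # A document may have any number of labels; collect them as a set.
        label_sets.append({lab.text.strip() for lab in doc.findall("labels/label")})
    return ids, texts, label_sets


def to_indicator_rows(label_sets):
    """Flatten variable-length label sets into fixed-width 0/1 rows,
    with one column per distinct label (sorted alphabetically)."""
    vocab = sorted(set().union(*label_sets))
    rows = [[1 if lab in labels else 0 for lab in vocab] for labels in label_sets]
    return vocab, rows


if __name__ == "__main__":
    ids, texts, label_sets = load_multilabel(XML_DATA)
    vocab, rows = to_indicator_rows(label_sets)
    print(vocab)
    for doc_id, row in zip(ids, rows):
        print(doc_id, row)
```

Running the script prints the label vocabulary followed by one indicator row per document. The indicator layout is only one of several possible transformations (it corresponds to a binary-relevance view of the problem); it is used here because it is simple, not because the paper prescribes it.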
Eduardo Corrêa Gonçalves and Vanessa Braganholo. "An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems"
Presented at XML London 2014, June 7-8, 2014.
doi:10.14337/XMLLondon14.Goncalves01