Eduardo Corrêa Gonçalves

An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems

Eduardo Gonçalves (Universidade Federal Fluminense (UFF))

Abstract

Most data mining tools can only work with data structured either in relational tables or in text files with one record per line. However, neither representation is well suited to certain data mining tasks. One example of such a task is multi-label classification, where the goal is to predict the states of a multi-valued target attribute. This paper discusses the use of XML as an alternative representation for datasets in multi-label classification processes, since the language offers flexible means of structuring complex information and can thus facilitate the major steps involved in data preprocessing. To ground the discussion in practice, we describe the steps we followed when preprocessing a real text dataset.
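As a rough illustration of the idea, and not the schema used in the paper, the sketch below stores each record's multi-valued target as repeated child elements and then flattens it into one binary indicator column per label. The element names (`dataset`, `document`, `text`, `labels`, `label`), the sample records, and the helper `load_multilabel` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema and sample data, for illustration only.
XML_DATA = """
<dataset>
  <document id="d1">
    <text>Stock markets rallied after the election.</text>
    <labels><label>economy</label><label>politics</label></labels>
  </document>
  <document id="d2">
    <text>The team won the championship final.</text>
    <labels><label>sports</label></labels>
  </document>
</dataset>
"""


def load_multilabel(xml_text):
    """Return (ids, label_names, indicator_rows) parsed from an XML string."""
    root = ET.fromstring(xml_text.strip())
    # Each document carries a variable-length set of labels.
    docs = [
        (doc.get("id"), {label.text for label in doc.iter("label")})
        for doc in root.iter("document")
    ]
    # The label vocabulary is the sorted union of all label sets.
    label_names = sorted(set().union(*(labels for _, labels in docs)))
    # One 0/1 column per label, one row per document.
    rows = [[int(name in labels) for name in label_names] for _, labels in docs]
    return [doc_id for doc_id, _ in docs], label_names, rows


ids, label_names, rows = load_multilabel(XML_DATA)
```

A flat table or one-record-per-line file would need either a fixed number of label columns or an ad hoc delimiter inside a single field to hold the same information; here the nesting carries it directly.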

How to cite this

Eduardo Corrêa Gonçalves and Vanessa Braganholo. "An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems" Presented at XML London 2014, June 7-8th, 2014. doi:10.14337/XMLLondon14.Goncalves01.
