We introduce the problem of diverse dimension decomposition in transactional databases. A dimension is a set of mutually-exclusive item sets, and our problem is to find a decomposition of the item set space into dimensions, which are orthogonal to each other, and that provide high coverage of the input database. The mining framework we propose effectively represents a dimensionality-reducing transformation from the space of all items to the space of orthogonal dimensions. Our approach relies on information-theoretic concepts, and we are able to formulate the dimension-finding problem with a single objective function that simultaneously captures constraints on coverage, exclusivity and orthogonality. We describe an efficient greedy method for finding diverse dimensions from transactional databases. The experimental evaluation of the proposed approach using two real datasets, flickr and delicious, demonstrates the effectiveness of our solution. Although we are motivated by the applications in the collaborative tagging domain, we believe that the mining task we introduce in this paper is general enough to be useful in other application domains.

Diverse Dimension Decomposition for Itemset Spaces

Tsytsarau, Mikalai;Palpanas, Themistoklis
2012-01-01

Abstract

We introduce the problem of diverse dimension decomposition in transactional databases. A dimension is a set of mutually-exclusive item sets, and our problem is to find a decomposition of the item set space into dimensions, which are orthogonal to each other, and that provide high coverage of the input database. The mining framework we propose effectively represents a dimensionality-reducing transformation from the space of all items to the space of orthogonal dimensions. Our approach relies on information-theoretic concepts, and we are able to formulate the dimension-finding problem with a single objective function that simultaneously captures constraints on coverage, exclusivity and orthogonality. We describe an efficient greedy method for finding diverse dimensions from transactional databases. The experimental evaluation of the proposed approach using two real datasets, flickr and delicious, demonstrates the effectiveness of our solution. Although we are motivated by the applications in the collaborative tagging domain, we believe that the mining task we introduce in this paper is general enough to be useful in other application domains.
2012
2
Tsytsarau, Mikalai; F., Bonchi; A., Gionis; Palpanas, Themistoklis
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/92144
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact