Given a gene expression data matrix where each cell is the expression level of a gene under a certain condition, biclustering is the problem of searching for a subset of genes that co regulate and co express only under a subset of conditions. As some genes can belong to different functional categories, searching for non-redundant overlapping biclusters is an important problem in biclustering. However, most recent algorithms can only either produce disjoint biclusters or redundant biclusters with significant overlap. In other words, these algorithms do not allow users to specify the maximum overlap between the biclusters. In this paper, we propose a novel algorithm which can generate K overlapping biclusters where the maximum overlap between them is below a predefined threshold. Unlike the other approaches which often generate all biclusters at once, our algorithm produces the biclusters sequentially, where each newly generated bicluster is guaranteed to be different from the previous ones but can still overlap with them. The experiments on real datasets confirm that different meaningful overlapping biclusters are successfully discovered. Besides, under the same constraints, our algorithm returns much larger and higher-quality biclusters compared to those of the other state-of-the art algorithms.

Discovering Non-redundant Overlapping Biclusters on Gene Expression Data

Truong, Duy Tin;Battiti, Roberto;Brunato, Mauro
2013-01-01

Abstract

Given a gene expression data matrix where each cell is the expression level of a gene under a certain condition, biclustering is the problem of searching for a subset of genes that co regulate and co express only under a subset of conditions. As some genes can belong to different functional categories, searching for non-redundant overlapping biclusters is an important problem in biclustering. However, most recent algorithms can only either produce disjoint biclusters or redundant biclusters with significant overlap. In other words, these algorithms do not allow users to specify the maximum overlap between the biclusters. In this paper, we propose a novel algorithm which can generate K overlapping biclusters where the maximum overlap between them is below a predefined threshold. Unlike the other approaches which often generate all biclusters at once, our algorithm produces the biclusters sequentially, where each newly generated bicluster is guaranteed to be different from the previous ones but can still overlap with them. The experiments on real datasets confirm that different meaningful overlapping biclusters are successfully discovered. Besides, under the same constraints, our algorithm returns much larger and higher-quality biclusters compared to those of the other state-of-the art algorithms.
2013 IEEE 13th International Conference on Data Mining
Los Alamitos, CA
IEEE Computer Society CPS
Truong, Duy Tin; Battiti, Roberto; Brunato, Mauro
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/67451
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 3
social impact