A new combinatorial model for clustering is proposed for all applications in which individual and rela- tional data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix D , and are the typical input of all clustering algorithms proposed so far. Relational data refer to the ob- served links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph G = (V, E) , and the data available for clustering are the triplet G = (V, E, D ) , called attributed graph. Known clustering algorithms can take advantage of the re- lational structure of G to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form be- tween similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectiv- ity as the structural constraint of the resulting clusters. The model can be formulated and solved using Integer Linear Programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.

Clustering data that are graph connected / Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 261:1(2017), pp. 43-53. [10.1016/j.ejor.2017.02.009]

Clustering data that are graph connected

Benati, Stefano;
2017-01-01

Abstract

A new combinatorial model for clustering is proposed for all applications in which individual and rela- tional data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix D , and are the typical input of all clustering algorithms proposed so far. Relational data refer to the ob- served links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph G = (V, E) , and the data available for clustering are the triplet G = (V, E, D ) , called attributed graph. Known clustering algorithms can take advantage of the re- lational structure of G to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form be- tween similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectiv- ity as the structural constraint of the resulting clusters. The model can be formulated and solved using Integer Linear Programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.
2017
1
Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.
Clustering data that are graph connected / Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 261:1(2017), pp. 43-53. [10.1016/j.ejor.2017.02.009]
File in questo prodotto:
File Dimensione Formato  
Puerto-Clustering.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 541.09 kB
Formato Adobe PDF
541.09 kB Adobe PDF   Visualizza/Apri
Clustering data... Early_version_GraphConnected.pdf

accesso aperto

Tipologia: Pre-print non referato (Non-refereed preprint)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 235.89 kB
Formato Adobe PDF
235.89 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/172987
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 20
  • ???jsp.display-item.citation.isi??? 20
social impact