A new combinatorial model for clustering is proposed for all applications in which individual and rela- tional data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix D , and are the typical input of all clustering algorithms proposed so far. Relational data refer to the ob- served links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph G = (V, E) , and the data available for clustering are the triplet G = (V, E, D ) , called attributed graph. Known clustering algorithms can take advantage of the re- lational structure of G to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form be- tween similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectiv- ity as the structural constraint of the resulting clusters. The model can be formulated and solved using Integer Linear Programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.
Clustering data that are graph connected / Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 261:1(2017), pp. 43-53. [10.1016/j.ejor.2017.02.009]
Clustering data that are graph connected
Benati, Stefano;
2017-01-01
Abstract
A new combinatorial model for clustering is proposed for all applications in which individual and rela- tional data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix D , and are the typical input of all clustering algorithms proposed so far. Relational data refer to the ob- served links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph G = (V, E) , and the data available for clustering are the triplet G = (V, E, D ) , called attributed graph. Known clustering algorithms can take advantage of the re- lational structure of G to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form be- tween similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectiv- ity as the structural constraint of the resulting clusters. The model can be formulated and solved using Integer Linear Programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.File | Dimensione | Formato | |
---|---|---|---|
Puerto-Clustering.pdf
Solo gestori archivio
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
541.09 kB
Formato
Adobe PDF
|
541.09 kB | Adobe PDF | Visualizza/Apri |
Clustering data... Early_version_GraphConnected.pdf
accesso aperto
Tipologia:
Pre-print non referato (Non-refereed preprint)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
235.89 kB
Formato
Adobe PDF
|
235.89 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione