Clustering data that are graph connected

Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.

doi:10.1016/j.ejor.2017.02.009

A new combinatorial model for clustering is proposed for all applications in which individual and rela- tional data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix D , and are the typical input of all clustering algorithms proposed so far. Relational data refer to the ob- served links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph G = (V, E) , and the data available for clustering are the triplet G = (V, E, D ) , called attributed graph. Known clustering algorithms can take advantage of the re- lational structure of G to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form be- tween similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectiv- ity as the structural constraint of the resulting clusters. The model can be formulated and solved using Integer Linear Programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.

Clustering data that are graph connected / Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 261:1(2017), pp. 43-53. [10.1016/j.ejor.2017.02.009]

Clustering data that are graph connected

Benati, Stefano;Puerto, Justo;Rodríguez Chía, Antonio M.

2017-01-01

Abstract

A new combinatorial model for clustering is proposed for all applications in which individual and rela- tional data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix D , and are the typical input of all clustering algorithms proposed so far. Relational data refer to the ob- served links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph G = (V, E) , and the data available for clustering are the triplet G = (V, E, D ) , called attributed graph. Known clustering algorithms can take advantage of the re- lational structure of G to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form be- tween similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectiv- ity as the structural constraint of the resulting clusters. The model can be formulated and solved using Integer Linear Programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione (Date of publication)
	
				2017
			
	Titolo del periodico (Journal title)
	
				EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
			
	Numero e parte del fascicolo (Issue number and part)
	
				1
			
	DOI
	
				https://dx.doi.org/10.1016/j.ejor.2017.02.009
			
	Codice Scopus (Scopus identifier)
	
				2-s2.0-85014014245
			
	Codice WOS (WOS identifier)
	
				WOS:000400221000004
			
	Tutti gli autori
	
						Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.
					
	Citazione
	
				Clustering data that are graph connected / Benati, Stefano; Puerto, Justo; Rodríguez Chía, Antonio M.. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 261:1(2017), pp. 43-53. [10.1016/j.ejor.2017.02.009]
			
	Appare nelle tipologie:
	
				03.1 Articolo su rivista (Journal article)

File in questo prodotto:

File	Dimensione	Formato
Puerto-Clustering.pdf Solo gestori archivio Tipologia: Versione editoriale (Publisher’s layout) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 541.09 kB Formato Adobe PDF Visualizza/Apri	541.09 kB	Adobe PDF	Visualizza/Apri
Clustering data... Early_version_GraphConnected.pdf accesso aperto Tipologia: Pre-print non referato (Non-refereed preprint) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 235.89 kB Formato Adobe PDF Visualizza/Apri	235.89 kB	Adobe PDF	Visualizza/Apri