The fuzzy c partition of a set of qualitative data is the problem of selecting the optimal c centroids that are the most representative of the whole population. Moreover, a set of weights w(ij) must be determined, describing the fuzzy membership function of pattern i to the cluster represented by centroid j. Both problems are formulated by a single mathematical programming problem, that is an extension of the classic p-median models often used for clustering. The new objective function is neither concave nor convex and the application requires the clustering of many thousands of data, therefore heuristic methods are to be developed to find the best fuzzy partition. In this contribution, four methods are selected, that are implementations of metaheuristics tested to solve p-median problems. Here, they are implemented and tested to find the optimal fuzzy c-partition. All heuristics implement neighborhood search with different strategies of visiting neighboring solutions: they are Random Restart method (RR), that is used in many commercial softwares and suggested in textbooks, Tabu Search (TS) that tries to find the best move to escape from a local optimum, Variable Neighborhood Search (VNS), that explores quickly the solution space, Candidate List Search (CLS), that explores only interesting starting solutions. It is found that there is not a clear best method, but their performance depends on some parameter. Tabu Search is usually accurate, but time consuming. When c is small, Variable Neighborhood Search can be a reliable alternative, while, when c is large and there are many data to cluster, Candidate List Search provides good results. We point out that the simple Random Restart method, that is sometimes used in commercial codes is of very poor quality: the implementation of one of the neighbor search algorithms leads to substantial improvements.

Categorical data fuzzy clustering: an analysis of local search heuristics

Benati, Stefano
2008-01-01

Abstract

The fuzzy c partition of a set of qualitative data is the problem of selecting the optimal c centroids that are the most representative of the whole population. Moreover, a set of weights w(ij) must be determined, describing the fuzzy membership function of pattern i to the cluster represented by centroid j. Both problems are formulated by a single mathematical programming problem, that is an extension of the classic p-median models often used for clustering. The new objective function is neither concave nor convex and the application requires the clustering of many thousands of data, therefore heuristic methods are to be developed to find the best fuzzy partition. In this contribution, four methods are selected, that are implementations of metaheuristics tested to solve p-median problems. Here, they are implemented and tested to find the optimal fuzzy c-partition. All heuristics implement neighborhood search with different strategies of visiting neighboring solutions: they are Random Restart method (RR), that is used in many commercial softwares and suggested in textbooks, Tabu Search (TS) that tries to find the best move to escape from a local optimum, Variable Neighborhood Search (VNS), that explores quickly the solution space, Candidate List Search (CLS), that explores only interesting starting solutions. It is found that there is not a clear best method, but their performance depends on some parameter. Tabu Search is usually accurate, but time consuming. When c is small, Variable Neighborhood Search can be a reliable alternative, while, when c is large and there are many data to cluster, Candidate List Search provides good results. We point out that the simple Random Restart method, that is sometimes used in commercial codes is of very poor quality: the implementation of one of the neighbor search algorithms leads to substantial improvements.
2008
Benati, Stefano
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/65975
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 18
  • ???jsp.display-item.citation.isi??? 15
social impact