Tovel: Distributed Graph Clustering for Word Sense Disambiguation

IRIS

Word sense disambiguation is a fundamental problem in natural language processing (NLP). In this problem, a large corpus of documents contains mentions to well-known (non-ambiguous) words, together with mentions to ambiguous ones. The goal is to compute a clustering of the corpus, such that documents that refer to the same meaning appear in the same cluster; subsequentially, each cluster is assigned to a different semantic meaning. In this paper, we propose a mechanism for word sense disambiguation based on distributed graph clustering that is incremental in nature and can scale to big data. A novel, heuristic vertex-centric algorithm based on the metaphor of the water cycle is used to cluster the graph. Our approach is evaluated on real datasets in both centralized and decentralized environments.

Tovel: Distributed Graph Clustering for Word Sense Disambiguation

Guerrieri, Alessio;Rahimian, Fatemeh;Girdzijauskas, Sarunas;Montresor, Alberto

2016-01-01

Abstract

Word sense disambiguation is a fundamental problem in natural language processing (NLP). In this problem, a large corpus of documents contains mentions to well-known (non-ambiguous) words, together with mentions to ambiguous ones. The goal is to compute a clustering of the corpus, such that documents that refer to the same meaning appear in the same cluster; subsequentially, each cluster is assigned to a different semantic meaning. In this paper, we propose a mechanism for word sense disambiguation based on distributed graph clustering that is incremental in nature and can scale to big data. A novel, heuristic vertex-centric algorithm based on the metaphor of the water cycle is used to cluster the graph. Our approach is evaluated on real datasets in both centralized and decentralized environments.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione (Date of publication)
	
				2016
			
	Titolo del volume (Proceedings title)
	
				Procedings of the 4th ICDM Workshop on Data Science and Big Data Analytics
			
	Luogo di edizione (Place of publication)
	
				United States
			
	Casa editrice (Publisher)
	
				IEEE
			
	Tutti gli autori
	
						Guerrieri, Alessio; Rahimian, Fatemeh; Girdzijauskas, Sarunas; Montresor, Alberto
					
	Appare nelle tipologie:
	
				04.1 Saggio in atti di convegno (Paper in Proceedings)

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/164956

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

ND

ND

ND

social impact