Word sense disambiguation is a fundamental problem in natural language processing (NLP). In this problem, a large corpus of documents contains mentions to well-known (non-ambiguous) words, together with mentions to ambiguous ones. The goal is to compute a clustering of the corpus, such that documents that refer to the same meaning appear in the same cluster; subsequentially, each cluster is assigned to a different semantic meaning. In this paper, we propose a mechanism for word sense disambiguation based on distributed graph clustering that is incremental in nature and can scale to big data. A novel, heuristic vertex-centric algorithm based on the metaphor of the water cycle is used to cluster the graph. Our approach is evaluated on real datasets in both centralized and decentralized environments.

Tovel: Distributed Graph Clustering for Word Sense Disambiguation

Guerrieri, Alessio;Montresor, Alberto
2016-01-01

Abstract

Word sense disambiguation is a fundamental problem in natural language processing (NLP). In this problem, a large corpus of documents contains mentions to well-known (non-ambiguous) words, together with mentions to ambiguous ones. The goal is to compute a clustering of the corpus, such that documents that refer to the same meaning appear in the same cluster; subsequentially, each cluster is assigned to a different semantic meaning. In this paper, we propose a mechanism for word sense disambiguation based on distributed graph clustering that is incremental in nature and can scale to big data. A novel, heuristic vertex-centric algorithm based on the metaphor of the water cycle is used to cluster the graph. Our approach is evaluated on real datasets in both centralized and decentralized environments.
2016
Procedings of the 4th ICDM Workshop on Data Science and Big Data Analytics
United States
IEEE
Guerrieri, Alessio; Rahimian, Fatemeh; Girdzijauskas, Sarunas; Montresor, Alberto
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/164956
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact