Corpus-based and Knowledge-based Measures of Text Semantic Similarity


This paper presents a method for measuring the semantic similarity of texts using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g., text classification, information retrieval) or individual words (e.g., synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g., abstracts of scientific documents, image captions, product descriptions), in this paper we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to a 13% error rate reduction with respect to the traditional vector-based similarity metric.
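The abstract compares against a lexical, vector-based baseline. The sketch below shows what such a baseline typically looks like, assuming it is cosine similarity over raw term-frequency vectors with a simple lowercase tokenizer and no stopword removal or tf-idf weighting. The function names are hypothetical, and this is the lexical-matching baseline only, not the paper's semantic similarity method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical helper: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text1: str, text2: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: plain term counts; real systems often add tf-idf weights.
    """
    v1, v2 = Counter(tokenize(text1)), Counter(tokenize(text2))
    # Only words that appear verbatim in both texts contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text has no direction; treat as dissimilar
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    # Hypothetical near-paraphrase pair used only for illustration.
    a = "The cat sat on the mat."
    b = "A feline was sitting on the rug."
    print(round(cosine_similarity(a, b), 3))
```

For this pair the score is about 0.40, and all of it comes from the shared function words "on" and "the": "cat"/"feline", "sat"/"sitting" and "mat"/"rug" contribute nothing because they do not match lexically. This is the kind of gap that word-level semantic similarity measures are meant to address.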

Corpus-based and Knowledge-based Measures of Text Semantic Similarity / Mihalcea, R., Corley, C., Strapparava, C. (2006), pp. 775-780. (21st conference of American Association for Artificial Intelligence (AAAI-06), Boston, Massachusetts, USA, 16/07/2006 - 20/07/2006).


R. Mihalcea; C. Corley; C. Strapparava

2006-01-01



Date of publication: 2006
Proceedings title: 21st conference of American Association for Artificial Intelligence (AAAI-06)
Place of publication: USA
Publisher: AAAI
Scopus identifier: 2-s2.0-33750693384
All authors: Mihalcea, R.; Corley, C.; Strapparava, C.
Item type: 04.1 Paper in Proceedings

Files in this item: none.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/343698

Warning: the data displayed have not been validated by the university.

Citations: n/a; 811; n/a; n/a