Evaluating the consistency of word embeddings from small data

Bloem, J.; Fokkens, A.; Herbelot, A.

doi:10.26615/978-954-452-056-4_016

In this work, we address the evaluation of distributional semantic models trained on smaller, domain-specific texts, particularly philosophical text. Specifically, we inspect the behaviour of models using a pretrained background space in learning. We propose a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available. This measure simply computes the ability of a model to learn similar embeddings from different parts of some homogeneous data. We show that in spite of being a simple evaluation, consistency actually depends on various combinations of factors, including the nature of the data itself, the model used to train the semantic space, and the frequency of the learned terms, both in the background space and in the in-domain data of interest.
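As a rough illustration of the idea in the abstract, the sketch below scores consistency as the mean cosine similarity between embeddings of the same terms learned from two disjoint parts of a text. This is an assumption made for illustration, not the authors' exact formulation: the function names `cosine` and `consistency`, the toy vectors, and the choice of cosine similarity are all hypothetical. It also assumes both sets of vectors live in the same space (for instance, because both were learned on top of a shared background space), since otherwise they are not directly comparable.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistency(emb_a, emb_b, terms):
    """Mean cosine similarity between the embeddings of the same terms
    learned from two different parts of the data (hypothetical measure)."""
    sims = [cosine(emb_a[t], emb_b[t]) for t in terms]
    return sum(sims) / len(sims)

# Toy vectors standing in for embeddings learned from two halves of a text.
part_a = {"substance": [1.0, 0.0], "mode": [0.0, 1.0]}
part_b = {"substance": [1.0, 0.0], "mode": [1.0, 0.0]}

score = consistency(part_a, part_b, ["substance", "mode"])
print(score)
```

In this toy case one term receives the same vector from both parts and the other receives orthogonal vectors, so the score sits halfway between fully consistent and fully inconsistent. No gold-standard data is involved at any point, which is the property the abstract highlights.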

Bloem, J., Fokkens, A., Herbelot, A. (2019). Evaluating the consistency of word embeddings from small data. In: 12th International Conference on Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, 2-4 September 2019, pp. 132-141. doi:10.26615/978-954-452-056-4_016.

2019-01-01

Date of publication: 2019
Proceedings title: International Conference Recent Advances in Natural Language Processing, RANLP
Place of publication: Shoumen
Publisher: Incoma Ltd
ISBN: 9789544520564
Scopus identifier: 2-s2.0-85076466118
WOS identifier: WOS:001680920900016
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Access	Type	License	Size	Format
2019_evaluating_consistency.pdf	Open access	Publisher's layout	All rights reserved	169.41 kB	Adobe PDF