Evaluating the consistency of word embeddings from small data / Bloem, J.; Fokkens, A.; Herbelot, A. - (2019), pp. 132-141. (Paper presented at the 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019, held in Varna, Bulgaria, 2-4 September 2019) [10.26615/978-954-452-056-4_016].

Evaluating the consistency of word embeddings from small data

Herbelot A.
2019-01-01

Abstract

In this work, we address the evaluation of distributional semantic models trained on smaller, domain-specific texts, particularly philosophical text. Specifically, we inspect the behaviour of models using a pretrained background space in learning. We propose a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available. This measure simply computes the ability of a model to learn similar embeddings from different parts of some homogeneous data. We show that in spite of being a simple evaluation, consistency actually depends on various combinations of factors, including the nature of the data itself, the model used to train the semantic space, and the frequency of the learned terms, both in the background space and in the in-domain data of interest.
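
As a rough illustration of the idea, consistency could be scored as the similarity between the embeddings a model learns for the same term from two disjoint parts of the same text, with both embeddings living in the same (background) space so that they are directly comparable. The sketch below rests on that assumption: the function names `cosine` and `consistency`, the toy vectors, and the choice of cosine similarity averaged over shared terms are illustrative, not the definition used in the paper.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency(emb_a: Dict[str, Vector], emb_b: Dict[str, Vector]) -> float:
    """Mean cosine similarity over the terms learned from both parts.

    Hypothetical scoring: assumes both sets of embeddings share one space.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    if not shared:
        raise ValueError("no shared terms to compare")
    return sum(cosine(emb_a[t], emb_b[t]) for t in shared) / len(shared)


# Toy embeddings for the same two terms, learned from two halves of one text.
part_1 = {"substance": [1.0, 0.0], "mode": [0.0, 2.0]}
part_2 = {"substance": [2.0, 0.0], "mode": [3.0, 0.0]}
score = consistency(part_1, part_2)
```

In this toy case the two vectors for "substance" point in the same direction while those for "mode" are orthogonal, so the term-level scores differ sharply. A per-term breakdown of this kind is one way to see how consistency might vary with properties of individual terms, such as their frequency.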
2019
International Conference Recent Advances in Natural Language Processing, RANLP
Shoumen
Incoma Ltd
9789544520564
Bloem, J.; Fokkens, A.; Herbelot, A.
Files in this item:
File: 2019_evaluating_consistency.pdf
Access: open access
Type: Published version (Publisher’s layout)
License: All rights reserved
Size: 169.41 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/249663