Herbelot, Aurelie; Baroni, Marco (2017). High-risk learning: acquiring new word vectors from tiny data. In Proceedings of EMNLP 2017, Copenhagen, Denmark, pp. 304–309. doi:10.18653/v1/D17-1030
High-risk learning: acquiring new word vectors from tiny data
Herbelot, Aurelie; Baroni, Marco
2017
Abstract
Distributional semantics models are known to struggle with small data. It is generally accepted that in order to learn ‘a good vector’ for a word, a model must have sufficient examples of its usage. This conflicts with the fact that humans can guess the meaning of a word from only a few occurrences. In this paper, we show that a neural language model such as Word2Vec requires only minor modifications to its standard architecture to learn new terms from tiny data, using background knowledge from a previously learnt semantic space. We test our model on word definitions and on a nonce task involving 2–6 sentences’ worth of context, showing a large increase in performance over state-of-the-art models on the definitional task.
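The abstract's core idea — initialise a new word's vector from background knowledge in a pretrained space, then update it aggressively on the few available sentences — can be sketched as follows. This is a minimal illustration consistent with the abstract, not the authors' implementation: the function name, the sum-of-context-vectors initialisation, the skip-gram negative-sampling objective, and the hyperparameters (`alpha`, `decay`, `k_neg`) are all assumptions made for the sketch.

```python
import numpy as np

def learn_nonce_vector(bg_in, bg_out, vocab_size, context_ids,
                       alpha=1.0, decay=0.5, epochs=5, k_neg=5, seed=0):
    """Learn a vector for an unseen word from a handful of contexts.

    bg_in / bg_out: pretrained skip-gram input/output matrices
    (the background semantic space; kept frozen throughout).
    context_ids: vocabulary indices of the words surrounding the
    new term in the tiny input text.
    All parameter names and defaults here are illustrative.
    """
    rng = np.random.default_rng(seed)

    # Background knowledge as initialisation: start from the
    # normalised sum of the context words' vectors rather than
    # from a random point in the space.
    nonce = bg_in[context_ids].sum(axis=0)
    nonce = nonce / np.linalg.norm(nonce)

    for epoch in range(epochs):
        # A deliberately large initial learning rate, decayed at
        # each pass over the tiny data.
        lr = alpha * decay ** epoch
        for ctx in context_ids:
            # Skip-gram negative sampling: one true context word plus
            # k_neg random words; only the nonce vector is updated,
            # the background space stays fixed.
            pairs = [(ctx, 1.0)] + [(int(j), 0.0)
                                    for j in rng.integers(0, vocab_size, k_neg)]
            for target, label in pairs:
                score = 1.0 / (1.0 + np.exp(-nonce @ bg_out[target]))
                nonce += lr * (label - score) * bg_out[target]
    return nonce
```

The unusually high initial learning rate is presumably the ‘high-risk’ strategy of the title: with only a few sentences of evidence, large updates are needed to move the new vector anywhere useful, and the decay guards against overshooting. A quick sanity check on the result is to inspect the nearest neighbours of the returned vector among the rows of `bg_in`.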
File | Access | Type | License | Size | Format
---|---|---|---|---|---
D17-1030.pdf | Open access | Published version (publisher's layout) | Creative Commons | 128.65 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.