Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

Jalalzai, Hamid; Colombo, Pierre; Clavel, Chloé; Gaussier, Eric; Varni, Giovanna; Vignon, Emmanuel; Sabourin, Anne

The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora, which have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which makes it possible to analyze the points far away from the distribution bulk using the framework of multivariate extreme value theory. In particular, a classifier dedicated to the tails of the proposed embedding is obtained; it exhibits a scale invariance property that is exploited in a novel text generation method for label-preserving dataset augmentation. Experiments on synthetic and real text data show the relevance of the proposed framework and confirm that this method generates meaningful sentences with a controllable attribute, e.g., positive or negative sentiment.

Heavy-tailed Representations, Text Polarity Classification & Data Augmentation / Jalalzai, H., Colombo, P., Clavel, C., Gaussier, E., Varni, G., Vignon, E., Sabourin, A. (2020). 34th Conference on Neural Information Processing Systems (NeurIPS 2020), virtual event, December 6-12, 2020.

2020-01-01

Date of publication: 2020
Proceedings title: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Place of publication: Canada
Publisher: Neural Information Processing Systems Foundation, Inc.
ISBN: 9781713829546
Scopus identifier: 2-s2.0-85103533453
Appears in categories: 04.1 Paper in Proceedings (Saggio in atti di convegno)

Files in this item:

File	Access	Type	License	Size	Format
NeurIPS-Varni_2020.pdf	Open access	Publisher's layout	All rights reserved	668.99 kB	Adobe PDF