Debiasing Pretrained Text Encoders by Paying Attention to Paying Attention

Benatallah, Boualem; Casati, Fabio
2022-01-01

Abstract

Natural Language Processing (NLP) models have been found to exhibit discriminatory stereotypes across many social constructs, e.g., gender and race. Compared with the progress made in reducing bias in static word embeddings, fairness in sentence-level text encoders has received little consideration despite their wider applicability in contemporary NLP tasks. In this paper, we propose a debiasing method for pre-trained text encoders that both reduces social stereotypes and inflicts next to no semantic damage. Unlike previous studies that directly manipulate the embeddings, we suggest diving deeper into the operation of these encoders and paying more attention to the way they pay attention to different social groups. We find that stereotypes are also encoded in the attention layer. We then debias the model by redistributing the attention scores of a text encoder so that it forgets any preference for historically advantaged groups and attends to all social classes with the same intensity. Our experiments confirm that reducing bias in attention effectively mitigates it in the model's text representations.
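The abstract describes equalizing the attention a text encoder pays to different social groups. The following is a minimal, hypothetical sketch (not the authors' implementation) of how such an attention disparity could be probed with the HuggingFace transformers library; the model name, probe sentence, and group terms ("he"/"she") are illustrative assumptions.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def attention_to_group_term(sentence, term):
    # Encode the sentence and request per-layer attention maps.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_attentions=True)
    # out.attentions: tuple of (batch, heads, seq, seq) tensors, one per layer.
    att = torch.stack(out.attentions).mean(dim=(0, 2))  # average over layers and heads -> (batch, seq, seq)
    term_id = tokenizer.convert_tokens_to_ids(term)
    positions = (enc["input_ids"][0] == term_id).nonzero(as_tuple=True)[0]
    # Total attention flowing from every token in the sentence into the group term.
    return att[0, :, positions].mean().item()

template = "The doctor said that {} is highly competent."  # hypothetical probe sentence
groups = ["he", "she"]                                     # hypothetical pair of social-group terms
scores = {g: attention_to_group_term(template.format(g), g) for g in groups}
gap = abs(scores["he"] - scores["she"])
print(scores, "attention gap:", round(gap, 4))
# A debiasing objective in this spirit would penalize such gaps during fine-tuning
# (alongside a term preserving the encoder's original semantics), so that the model
# attends to all social groups with the same intensity.

This sketch only measures a disparity; the losses and training procedure actually used to redistribute attention are described in the full paper.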
2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
New York, NY, United States of America
Association for Computational Linguistics
978-1-952148-25-5
Gaci, Yacine; Benatallah, Boualem; Casati, Fabio; Benabdeslem, Khalid
Debiasing Pretrained Text Encoders by Paying Attention to Paying Attention / Gaci, Yacine; Benatallah, Boualem; Casati, Fabio; Benabdeslem, Khalid. - (2022), pp. 9582-9602. (Intervento presentato al convegno EMNLP tenutosi a Abu Dhabi nel December 2022) [10.18653/v1/2022.emnlp-main.651].
Files in this item:

File: 2022.emnlp-main.651.pdf
Access: Open access
Type: Publisher's version (Publisher's layout)
License: Creative Commons
Size: 753.91 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/441178
Citations
  • PMC: ND
  • Scopus: 13
  • Web of Science (ISI): ND
  • OpenAlex: ND