Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models that generate acceptable narratives only for hatred similar to the training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is thereby discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization / Bonaldi, Helena; Attanasio, Giuseppe; Nozza, Debora; Guerini, Marco. - (2023). (Paper presented at the 1st Workshop on CounterSpeech for Online Abuse (CS4OA), held in Prague, Czechia, on 11 September 2023).

Bonaldi, Helena; Attanasio, Giuseppe; Nozza, Debora; Guerini, Marco

2023-01-01

Date of publication: 2023
Proceedings title: Proceedings of the 1st Workshop on CounterSpeech for Online Abuse
Place of publication: 317 Sidney Baker Street S., Suite 400-134, Kerrville, TX 78028, USA
Publisher: Association for Computational Linguistics
All authors: Bonaldi, Helena; Attanasio, Giuseppe; Nozza, Debora; Guerini, Marco

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/399357
