Clickbait is a common technique for attracting a reader's attention, but it can introduce inaccuracies and lead to misinformation. This work explores how current Natural Language Processing methods can reduce its negative impact. To this end, we build a novel Italian dataset containing manual annotations for the classification, spoiling, and neutralisation of clickbait. We also run several experimental evaluations to assess the performance of current language models. On the one hand, we evaluate clickbait detection in a multilingual setting, showing that augmenting the data with English instances substantially improves overall performance. On the other hand, we explore the generation tasks of clickbait spoiling and neutralisation. The latter is a novel task, designed to increase the informativeness of a headline and thereby remove the information gap. This work opens a research avenue that remains largely unexplored for the Italian language.

To Click it or not to Click it: An Italian Dataset for Neutralising Clickbait Headlines / Russo, Daniel; Araque, Oscar; Guerini, Marco. - ELECTRONIC. - 3878:(2024). (Paper presented at the 10th Italian Conference on Computational Linguistics, CLiC-it 2024, held in Pisa, Italy, on Dec 04-06, 2024).

To Click it or not to Click it: An Italian Dataset for Neutralising Clickbait Headlines

Russo, Daniel; Guerini, Marco
2024-01-01

Year: 2024
Series: CEUR Workshop Proceedings
Conference location: Pisa, Italy
Publisher: CEUR-WS
Authors: Russo, Daniel; Araque, Oscar; Guerini, Marco

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/446861