In this paper, we present a novel dataset composed of images and comments in Italian, created with teenagers in classes using a simulated scenario to raise awareness of cyberbullying. Potentially offensive comments have been collected for more than 1,000 images and manually assigned to a semantic category. Our analysis shows that the presence of human subjects, as well as the gender of the people in the pictures, triggers different types of comments, and it provides novel insight into the connection between images posted on social media and offensive messages. We also compare our corpus with a similar one obtained with WhatsApp, showing that comments on images have different characteristics from text-only interactions.

A Multimodal Dataset of Images and Text to Study Abusive Language / Menini, Stefano; Palmero Aprosio, Alessio; Tonelli, Sara. - ELECTRONIC. - 2769:(2020). (Paper presented at the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, held in Bologna, Italy (online), March 1-3, 2021).

A Multimodal Dataset of Images and Text to Study Abusive Language

Stefano Menini;Alessio Palmero Aprosio;Sara Tonelli
2020-01-01

2020
Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020
Aachen, Germany
CEUR-WS.org
Sector IINF-05/A - Information processing systems
Sector INFO-01/A - Computer science
Menini, Stefano; Palmero Aprosio, Alessio; Tonelli, Sara
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/445232