Detecting Inappropriate Comments to News / Bellan, Patrizio; Strapparava, Carlo. - 11298:(2018), pp. 403-414. (Paper presented at the 17th International Conference of the Italian Association for Artificial Intelligence (AI*IA 18), held in Trento, Italy, November 20–23, 2018) [10.1007/978-3-030-03840-3_30].

Detecting Inappropriate Comments to News

Carlo Strapparava
2018-01-01

Abstract

Inappropriate comments, defined as deliberately offensive, off-topic, or troll-like posts, or direct attacks on religious, sexual, racial, gender, or ethnic grounds, are becoming increasingly problematic in user-generated content on the internet, because they can derail the conversation or spread harassment. Furthermore, the computational analysis of this kind of content, posted in response to professional newspapers, has not yet been well investigated. In particular, the most predictive linguistic and cognitive features have seldom been addressed, and inappropriateness has not been investigated in depth. After collecting a new dataset of inappropriate comments, three classic machine learning models were tested on two representations of the input data: normal and distorted. The text distortion technique, thanks to its ability to mask thematic information, enhanced classification performance and provided a valuable basis from which to extract features. Lexicon-based features proved to be the most valuable characteristics to consider. Logistic regression turned out to be the most efficient algorithm.
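The abstract does not spell out how the distorted representation is built. As a rough illustration only, the sketch below assumes one common form of text distortion: words outside a small set of frequent words are replaced by asterisks, so that topic-bearing vocabulary is hidden while function words and punctuation remain. The names `frequent_words` and `distort`, the tokenizer, the value of `k`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters or apostrophes.
TOKEN = re.compile(r"[A-Za-z']+")

def frequent_words(corpus, k):
    """Return the k most frequent lowercase word forms in the corpus."""
    counts = Counter(w.lower() for doc in corpus for w in TOKEN.findall(doc))
    return {w for w, _ in counts.most_common(k)}

def distort(text, keep):
    """Mask every word not in `keep` with one asterisk per letter.

    Punctuation, spacing, and the casing of kept words are left untouched.
    """
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)
    return TOKEN.sub(mask, text)

# Toy comments, invented for this example.
corpus = [
    "The mayor said the budget is a joke.",
    "The budget is fine and the mayor is right.",
    "You are a joke and the article is a joke.",
]
keep = frequent_words(corpus, k=4)
distorted = [distort(doc, keep) for doc in corpus]
for line in distorted:
    print(line)
```

Under this assumption, both the normal and the distorted strings could be passed to the same feature extractor and classifier, which is one way to read the comparison of two representations described in the abstract.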
2018
Proceedings of the 17th International Conference of the Italian Association for Artificial Intelligence (AI*IA 18)
Switzerland
Springer
Bellan, Patrizio; Strapparava, Carlo
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/343166