Detecting Inappropriate Comments to News


Inappropriate comments, defined as posts that are deliberately offensive, off-topic, or troll-like, or that directly attack others on the basis of religion, sexuality, race, gender, or ethnicity, are becoming increasingly problematic in user-generated content on the internet, because they can derail the conversation or spread harassment. Furthermore, the computational analysis of this kind of content, posted in response to professional newspapers, has not yet been well investigated. In particular, the most predictive linguistic and cognitive features have seldom been addressed, and inappropriateness has not been investigated in depth. After collecting a new dataset of inappropriate comments, three classic machine learning models were tested over two possible representations of the input data: normal and distorted. The text distortion technique, thanks to its ability to mask thematic information, enhanced classification performance and provided a valuable basis from which to extract features. Lexicon-based features proved to be the most valuable characteristics to consider. Logistic regression turned out to be the most efficient algorithm.
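
The abstract does not say how the distorted representation is built. As a rough illustration only, the sketch below assumes one common variant of text distortion: every word outside the k most frequent words of the corpus is replaced by asterisks of the same length, so that function words and punctuation survive while topical words are masked. The names frequent_words and distort, and the parameter k, are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters; everything else
# (spaces, digits, punctuation) is left untouched by the distortion.
WORD = re.compile(r"[A-Za-z]+")


def frequent_words(texts, k):
    """Return the set of the k most frequent lowercased words in texts."""
    counts = Counter(w.lower() for t in texts for w in WORD.findall(t))
    return {w for w, _ in counts.most_common(k)}


def distort(text, keep):
    """Mask every word not in keep with asterisks of the same length."""
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)
    return WORD.sub(mask, text)
```

Under these assumptions, a classifier such as logistic regression would be trained once on the original comments and once on the output of distort, which is one way to compare the "normal" and "distorted" representations mentioned above.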

Detecting Inappropriate Comments to News / Bellan, Patrizio; Strapparava, Carlo. - 11298:(2018), pp. 403-414. ( 17th International Conference of the Italian Association for Artificial Intelligence (AI*IA 18) Trento, Italy November 20–23, 2018) [10.1007/978-3-030-03840-3_30].

Patrizio Bellan; Carlo Strapparava

2018-01-01



Date of publication: 2018
Proceedings title: Proceedings of the 17th International Conference of the Italian Association for Artificial Intelligence (AI*IA 18)
Place of publication: Switzerland
Publisher: Springer
Scopus identifier: 2-s2.0-85057353426
WOS identifier: WOS:000590143900030
All authors: Bellan, Patrizio; Strapparava, Carlo
Publication type: 04.1 Paper in Proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/343166
