Fine-grained controversy detection in Wikipedia / Bykau, Siarhei; Korn, Flip; Srivastava, Divesh; Velegrakis, Ioannis. - Electronic. - (2015), pp. 1573-1584. (Paper presented at the 2015 31st IEEE International Conference on Data Engineering, ICDE 2015, held in Seoul (South Korea), 13-17 April 2015) [10.1109/ICDE.2015.7113426].

Fine-grained controversy detection in Wikipedia

Velegrakis, Ioannis
2015-01-01

Abstract

The advent of Web 2.0 gave birth to a new kind of application in which content is generated through the collaborative contribution of many different users. This form of content generation is believed to produce data of higher quality, since the 'wisdom of the crowds' makes its way into the data. However, collaboratively generated data also exhibit a number of specific data quality issues. Apart from normal updates, there are intentional harmful changes, known as vandalism, as well as naturally occurring disagreements on topics that lack an agreed-upon viewpoint, known as controversies. While much work has focused on identifying vandalism, there has been little prior work on detecting controversies, especially at a fine granularity. Knowing about controversies when processing user-generated content is essential for understanding the quality of the data and the trust that should be placed in them. Controversy detection is challenging because, in the highly dynamic context of user updates, one needs to distinguish among normal updates, vandalism, and actual controversies. We describe a novel technique that finds these controversial issues by analyzing the edits that have been performed on the data over time. We apply the technique to Wikipedia, the world's largest known collaboratively generated database, and show that our approach achieves higher precision and recall than baseline approaches and is capable of finding previously unknown controversies.
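The abstract describes the approach only at a high level. As a rough illustration of why an edit history can help separate a one-off harmful change from a recurring disagreement, the following is a minimal sketch of a deliberately simple thresholding heuristic. It is an assumption made here for illustration and is not the technique described in the paper. It assumes substitution edits have already been extracted from revision diffs; the names `Substitution`, `candidate_controversies`, `min_edits` and `min_authors` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    """Hypothetical edit record: at `revision`, `author` replaced `old` with `new`."""
    revision: int
    author: str
    old: str
    new: str


def candidate_controversies(edits, min_edits=3, min_authors=2):
    """Toy heuristic, NOT the paper's method: flag pairs of text values that
    are substituted back and forth repeatedly by several distinct authors.

    A single harmful edit followed by one revert yields only two edits on the
    same pair, so it is not flagged with the default thresholds.
    """
    groups = defaultdict(list)
    for e in sorted(edits, key=lambda e: e.revision):
        if e.old == e.new:
            continue  # not a real substitution
        # A->B and B->A belong to the same disputed pair of values.
        groups[frozenset((e.old, e.new))].append(e)

    flagged = []
    for pair, group in groups.items():
        authors = {e.author for e in group}
        if len(group) >= min_edits and len(authors) >= min_authors:
            # (disputed values, number of edits, number of distinct authors)
            flagged.append((tuple(sorted(pair)), len(group), len(authors)))
    return flagged


if __name__ == "__main__":
    history = [
        Substitution(1, "u1", "planet", "dwarf planet"),
        Substitution(2, "u2", "dwarf planet", "planet"),
        Substitution(3, "u1", "planet", "dwarf planet"),
        Substitution(4, "u3", "dwarf planet", "planet"),
        Substitution(5, "vandal", "blue", "xxx"),  # one-off change...
        Substitution(6, "u2", "xxx", "blue"),      # ...reverted once
    ]
    print(candidate_controversies(history))
    # prints [(('dwarf planet', 'planet'), 4, 3)]
```

Counting distinct authors, rather than edits alone, is one crude way to keep a single user repeatedly re-applying the same change from looking like a multi-party dispute; a real detector would also need to account for the surrounding context of each edit and for timing.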
Year: 2015
Conference: IEEE 31st International Conference on Data Engineering (ICDE)
Place of publication: Piscataway, NJ, USA
Publisher: IEEE Computer Society
ISBN: 9781479979639
Authors: Bykau, Siarhei; Korn, Flip; Srivastava, Divesh; Velegrakis, Ioannis
Files in this record:

BykauKSV15.pdf
Access: open
Type: Non-refereed preprint
License: All rights reserved
Size: 1.04 MB
Format: Adobe PDF

07113426.pdf
Access: restricted to archive managers
Type: Publisher's version (Publisher’s layout)
License: All rights reserved
Size: 1.11 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/119585
Citations
  • PubMed Central: not available
  • Scopus: 15
  • Web of Science (ISI): 9