Active learning from crowd in document screening

Krivosheev, E.; Sayin Günel, B.; Bozzon, A.; Szlavik, Z.

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where documents must be screened with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents and then on screening the documents efficiently. The task is challenging because the budget is limited and there are countless ways to spend it. We propose a screening-specific multi-label active learning sampling technique, objective-aware sampling, for selecting unlabelled documents to annotate. Our algorithm decides which machine filter needs more training data and which unlabelled items to annotate so as to minimize the risk of overall classification errors, rather than the error of a single filter. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
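The idea of choosing both a filter and an item with the overall screening decision in mind can be illustrated with a small sketch. This is not the algorithm from the paper, which the abstract does not spell out. It assumes, hypothetically, that a document is kept only if it passes every filter, and it uses a made-up scoring rule (`uncertainty` times the chance that the other filters pass). The names `uncertainty`, `pick_query`, and `probs` are illustrative only.

```python
from math import prod


def uncertainty(p):
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def pick_query(probs):
    """Return the (document index, filter index) pair to send to the crowd.

    probs[d][f] is the current estimate that document d passes filter f.
    Assumption (not from the paper): a document is kept only if it passes
    every filter, so a label matters most when that filter is uncertain
    and the remaining filters are likely to pass the document.
    """
    best_pair, best_score = None, -1.0
    for d, row in enumerate(probs):
        for f, p in enumerate(row):
            # Probability that all the other filters pass this document.
            others_pass = prod(q for g, q in enumerate(row) if g != f)
            score = uncertainty(p) * others_pass
            if score > best_score:
                best_pair, best_score = (d, f), score
    return best_pair


# Three documents, two filters (hypothetical probabilities).
probs = [[0.5, 0.1], [0.5, 0.9], [0.99, 0.95]]
print(pick_query(probs))
```

In this toy input, documents 0 and 1 are equally uncertain on filter 0, but document 0 is very likely to be rejected by filter 1 anyway, so a label for it would rarely change the overall decision. A sampler that looked at filter 0 alone could not tell the two apart.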

Active learning from crowd in document screening / Krivosheev, E.; Sayin Günel, B.; Bozzon, A.; Szlavik, Z. - ELECTRONIC. - 2736:(2020), pp. 19-25. (Paper presented at the 2020 Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, CSW 2020, held in Vancouver, BC, Canada (Online) on 11 December 2020).


2020-01-01



Date of publication: 2020
Proceedings title: CEUR Workshop Proceedings
Place of publication: Aachen, Germany
Publisher: CEUR-WS
Scopus Identifier: 2-s2.0-85097872359
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
Active learning from crowd in document screening.pdf (open access; Description: Main article; Type: Publisher's version (publisher's layout); License: Creative Commons)	541 kB	Adobe PDF