Concept challenge game: a game used to find errors from a multilangual linguistic resource

Zhang, Hanyu

Multilingual semantic linguistic resources are critical for many applications in Natural Language Processing (NLP). However, building large-scale lexico-semantic resources manually from scratch is extremely expensive, which has promoted the use of automatic extraction and merging algorithms. These algorithms make it possible to create large-scale resources, but they introduce many kinds of errors as a side effect. For example, Chinese WordNet follows the WordNet structure and is generated by several algorithms; this automatic generation introduces errors such as wrong translations, typos and false mappings between multilingual terms. The quality of a linguistic resource directly influences the performance of the applications built on it, so the higher the quality, the better. Finding errors is therefore necessary. So far, however, there is no efficient method for finding errors in a large-scale multilingual resource. Manual validation by experts is one option, but it is very expensive, both because of the size of the dataset and because of its multilingual nature. Crowdsourcing is a method for handling large-scale and tedious tasks, but it is still costly. Given this scenario, we aim to find an effective method that helps find errors at low cost. We use games as our solution and adopt the Universal Knowledge Core (UKC), with respect to the Chinese language, as our case study. UKC is a multi-layered multilingual lexico-semantic resource in which common lexical elements from different languages are mapped to formal concepts. In this dissertation, we present a non-immersive game, the Concept Challenge Game, for finding the errors that exist in the English-Chinese lexico-semantic resource. In this game, players face challenges posed as English synsets and have to choose the most appropriate option from a list of Chinese synsets. Players are unaware that they are finding errors in the lexico-semantic resource while they play. Our evaluation shows that people spend a significant amount of time playing and are able to find different erroneous mappings. Moreover, we extended the game to an Italian version; the results are promising as well, indicating that the game is able to find errors in multilingual linguistic resources.
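
The game round described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Challenge` structure, the `flag_suspect_mapping` function, the identifier `c1`, the example words and both thresholds are hypothetical, and the majority-disagreement rule is an assumption chosen only to show how player choices could point at an erroneous English-Chinese mapping.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    """One game round: an English synset and candidate Chinese synsets."""
    concept_id: str        # hypothetical identifier of the shared concept
    english_synset: tuple  # English words shown to the player
    options: tuple         # candidate Chinese synsets, each a tuple of words
    mapped_option: int     # index of the option the resource currently maps to


def flag_suspect_mapping(challenge, answers, min_answers=3, max_agreement=0.5):
    """Return True if players mostly reject the resource's current mapping.

    `answers` is a list of option indices chosen by players. Both thresholds
    are illustrative assumptions, not values taken from the thesis.
    """
    if len(answers) < min_answers:
        return False  # too few plays to judge the mapping
    votes = Counter(answers)
    agreement = votes[challenge.mapped_option] / len(answers)
    return agreement < max_agreement


# Hypothetical round: the resource maps the English synset to option 1,
# while most players pick option 0.
example = Challenge(
    concept_id="c1",
    english_synset=("dog", "domestic dog"),
    options=(("狗", "犬"), ("猫",), ("狼",)),
    mapped_option=1,
)
print(flag_suspect_mapping(example, [0, 0, 0, 1, 0]))
```

In this sketch a mapping is only flagged once enough players have answered, so a single stray answer does not mark an entry as erroneous; a flagged entry would still need review before the resource is changed.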

Concept challenge game: a game used to find errors from a multilangual linguistic resource / Zhang, Hanyu. - (2017), pp. 1-145.

2017-01-01


Defended on: 2017
Cycle: XXVIII
Academic year: 2015-2016
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Information and Communication Technology
Unitn internal supervisor: Giunchiglia, Fausto
Bi-nationally supervised doctoral thesis: no
Language: English
Reference SSD (valid until 24/06/2024): Settore INF/01 - Informatica
Appears in types: 08.1 Tesi di dottorato (Doctoral Thesis)

Files in this item:

File	Access	Type	License	Size	Format
PhD-Thesis-Hanyu-v.2.1.pdf	Open access	Doctoral Thesis	All rights reserved	34.43 MB	Adobe PDF
Disclaimer_Zhang.pdf	Archive managers only	Doctoral Thesis	All rights reserved	1.11 MB	Adobe PDF