Concept challenge game: a game used to find errors from a multilangual linguistic resource / Zhang, Hanyu. - (2017), pp. 1-145.
Concept challenge game: a game used to find errors from a multilangual linguistic resource
Zhang, Hanyu
2017-01-01
Abstract
Multilingual semantic linguistic resources are critical for many applications in Natural Language Processing (NLP). However, building large-scale lexico-semantic resources manually from scratch is extremely expensive, which has promoted the use of automatic extraction and merging algorithms. These algorithms have made it possible to create large-scale resources, but they introduce many kinds of errors as a side effect. For example, Chinese WordNet follows the WordNet structure and is generated via several algorithms; this automatic generation introduces errors such as wrong translations, typos and false mappings between multilingual terms. The quality of a linguistic resource directly influences the performance of the applications built on it, so that quality should be as high as possible. Finding errors is therefore necessary. However, until now there has been no efficient method for finding errors in a large-scale, multilingual resource. Manual validation by experts could be a solution, but it is very expensive, because both the scale of the dataset and its multilingual nature are obstacles. Although crowdsourcing is a method for handling large-scale, tedious tasks, it is still costly. Given this scenario, we aim to find an effective method that helps us find errors at low cost. We use games as our solution and adopt the Universal Knowledge Core (UKC), with respect to the Chinese language, as our case study. UKC is a multi-layered multilingual lexico-semantic resource in which lexical elements from different languages are mapped to formal concepts. In this dissertation, we present a non-immersive game named Concept Challenge Game to find the errors that exist in an English-Chinese lexico-semantic resource. In this game, players are challenged with English synsets and must choose the most appropriate option from a list of Chinese synsets. The players are unaware that they are finding errors in the lexico-semantic resource. Our evaluation shows that people spend a significant amount of time playing and are able to find different erroneous mappings. Moreover, we extended our game to an Italian version; the result is promising as well, indicating that our game is able to find errors in multilingual linguistic resources.
File | Size | Format | |
---|---|---|---|
PhD-Thesis-Hanyu-v.2.1.pdf (open access; type: Doctoral Thesis; license: All rights reserved) | 34.43 MB | Adobe PDF | |
Disclaimer_Zhang.pdf (archive managers only; type: Doctoral Thesis; license: All rights reserved) | 1.11 MB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.