knoWitiary is a resource that presents a reorganized version of Wiktionary’s information in machine-readable format. Wiktionary contains a plethora of information about words, including sense definitions, etymology, translations, derived terms and anagrams. Work similar to that reported here goes one step further than extracting information from Wiktionary: it maps that information onto WordNet, the NLP community’s de facto gold standard. Lexical and relation overlap shows that Wiktionary provides different types of information than WordNet, which implies that much is discarded in such a mapping. We make a case here for making space for “pure” resources alongside mapped ones, to preserve the unique information that idiosyncratic resources such as Wiktionary provide, which may open up new avenues to explore for tasks that require varied and “unorthodox” information about words.
knoWitiary: A Machine Readable Incarnation of Wiktionary / Nastase, Viviana Antonela; Strapparava, Carlo. - In: INTERNATIONAL JOURNAL OF COMPUTATIONAL LINGUISTICS AND APPLICATIONS. - ISSN 0976-0962. - Print. - 6:2(2015), pp. 61-82.
knoWitiary: A Machine Readable Incarnation of Wiktionary
Strapparava, Carlo
2015-01-01