A large and evolving cognate database / Batsuren, Khuyagbaatar; Bella, Gábor; Giunchiglia, Fausto. - In: LANGUAGE RESOURCES AND EVALUATION. - ISSN 1574-020X. - 56:1(2022), pp. 165-189. [10.1007/s10579-021-09544-6]
A large and evolving cognate database
Batsuren, Khuyagbaatar; Bella, Gábor; Giunchiglia, Fausto
2022-01-01
Abstract
We present CogNet, a large-scale, automatically built database of sense-tagged cognates, that is, words of common origin and meaning across languages. CogNet is continuously evolving: its current version contains over 8 million cognate pairs across 338 languages and 35 writing systems, with new releases already in preparation. The paper presents the algorithm and input resources used to compute it, an evaluation of the result, and a quantitative analysis of the cognate data that leads to novel insights on language diversity. Furthermore, as an example of how large-scale cross-lingual knowledge bases can improve the quality of multilingual applications, we present a case study on the use of CogNet for bilingual lexicon induction in the framework of cross-lingual transfer learning.

File | Size | Format |
---|---|---|
Batsuren2022_Article_ALargeAndEvolvingCognateDataba.pdf (open access; type: publisher's version, publisher's layout; license: Creative Commons) | 805.34 kB | Adobe PDF |