A large and evolving cognate database / Batsuren, Khuyagbaatar; Bella, Gábor; Giunchiglia, Fausto. - In: LANGUAGE RESOURCES AND EVALUATION. - ISSN 1574-020X. - 2022, 56:1(2022), pp. 165-189. [10.1007/s10579-021-09544-6]

A large and evolving cognate database

Batsuren, Khuyagbaatar;Bella, Gábor;Giunchiglia, Fausto
2022

Abstract

We present CogNet, a large-scale, automatically built database of sense-tagged cognates (words of common origin and meaning across languages). CogNet is continuously evolving: its current version contains over 8 million cognate pairs across 338 languages and 35 writing systems, with new releases already in preparation. The paper presents the algorithm and input resources used to compute it, an evaluation of the result, and a quantitative analysis of cognate data that leads to novel insights on language diversity. Furthermore, as an example of how large-scale cross-lingual knowledge bases can improve the quality of multilingual applications, we present a case study on the use of CogNet for bilingual lexicon induction in the framework of cross-lingual transfer learning.
Files in this record:
File: Batsuren2022_Article_ALargeAndEvolvingCognateDataba.pdf
Access: open access
Type: Published version (publisher's layout)
License: Creative Commons
Size: 805.34 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11572/313132