Tackling Language Modelling Bias in Support of Linguistic Diversity

Bella, G.; Helm, P.; Koch, G.; Giunchiglia, F.
2024

Abstract

Current AI-based language technologies (language models, machine translation systems, multilingual dictionaries, and corpora) are known to focus on the world's 2-3% most widely spoken languages. Research efforts of the past decade have attempted to expand this coverage to 'under-resourced languages.' The goal of our paper is to bring attention to a corollary phenomenon that we call language modelling bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages. We define language modelling bias as uneven per-language performance under similar test conditions. We show that bias stems not only from technology but also from ethically problematic research and development methodologies that disregard the needs of language communities. Moving towards diversity-aware alternatives, we present an initiative that aims at reducing language modelling bias within lexical resources through both technology design and methodology, based on an eye-level collaboration with local communities.
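The abstract's definition of language modelling bias as uneven per-language performance under similar test conditions can be made concrete with a short sketch. The snippet below is purely illustrative and not taken from the paper: the language names and accuracy figures are hypothetical, and the gap-to-best-language measure is just one simple way such unevenness could be quantified.

    # A minimal, hypothetical sketch of "language modelling bias" as defined
    # above: uneven per-language performance under similar test conditions.
    # The languages and accuracy values below are invented for illustration;
    # they are not results reported in the paper.

    scores = {  # hypothetical accuracy of one system on comparable test sets
        "English": 0.91,
        "German": 0.88,
        "Swahili": 0.62,
        "Mongolian": 0.55,
    }

    best = max(scores.values())

    # Quantify the bias as each language's gap to the best-served language.
    for lang, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{lang:9s} accuracy={score:.2f} gap-to-best={best - score:.2f}")

Under this reading, an unbiased system would show a near-zero gap for every language; large gaps for some languages despite similar test conditions are precisely the hidden representational preference the paper describes.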
Year: 2024
Conference: 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2024), Brazil
Publisher: Association for Computing Machinery, Inc, 1601 Broadway, 10th Floor, New York, NY, United States
ISBN: 9798400704505
Authors: Bella, G.; Helm, P.; Koch, G.; Giunchiglia, F.
Citation: Tackling Language Modelling Bias in Support of Linguistic Diversity / Bella, G.; Helm, P.; Koch, G.; Giunchiglia, F. - (2024), pp. 562-572. (2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024, Brazil, 2024) [10.1145/3630106.3658925].
Files attached to this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/464121
Warning: the data shown have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 10
  • Web of Science: 6
  • OpenAlex: not available