Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models / Conti, Lina; Wisniewski, Guillaume. - Electronic. - (2023), pp. 10362-10371. (Paper presented at the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, held in Singapore in 2023) [10.18653/v1/2023.emnlp-main.641].

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

Lina Conti (first author)
2023-01-01

2023
EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
Singapore
Association for Computational Linguistics (ACL)
Conti, Lina; Wisniewski, Guillaume


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/467677