Towards Gender-Inclusive Machine Translation / Piergentili, Andrea. - (2026 Apr 20).

Towards Gender-Inclusive Machine Translation

Piergentili, Andrea
2026-04-20

Abstract

Machine translation systems have become essential tools for cross-lingual communication, yet they systematically encode and perpetuate gender bias. When translating into grammatical gender languages such as Italian, Spanish, and German, these systems default to masculine forms for gender-ambiguous referents, reinforce stereotypical associations between gender and social roles, and fail to represent non-binary identities. Such biases cause symbolic and practical harm, shaping perceptions and potentially discriminating against individuals whose gender is misrepresented or erased. This thesis proposes gender-inclusive machine translation as a principled response to these challenges. Rather than focusing on binary gender bias correction, it establishes a comprehensive framework for translation that avoids undue gender marking when gender information is unavailable and accommodates all gender identities. The investigation spans five interconnected research questions, progressing from conceptual foundations through evaluation infrastructure to practical generation and deployment. The thesis makes contributions across multiple dimensions. At the conceptual level, it investigates gender-inclusive translation across two complementary directions: conservative approaches relying on standardized linguistic resources for gender neutralization, and innovative approaches for explicit non-binary representation. It then formally defines gender-neutral translation along with desiderata guiding its application. For evaluation, it presents three benchmarks: GeNTE for English-to-Italian gender-neutral translation, its multilingual extension mGeNTE covering German, Spanish, and Greek, and Neo-GATE for neomorpheme-based English-to-Italian translation. These resources are complemented by dedicated evaluation methods, including a classifier-based approach and an LLM-as-a-Judge framework that generalizes across languages without task-specific training. 
Finally, a collaboration with an Italian e-learning company grounds the research in a real-world setting. It yields technical insights on integrating gender-neutral rewriting into content production workflows, as well as stakeholder perspectives that inform design principles for deployment, emphasizing user control, explainability, and compatibility with existing authoring tools. The research presented in this thesis demonstrates that gender-inclusive machine translation is socially relevant, linguistically complex, and technically feasible, but that it poses distinctive challenges that current systems do not fully address. The resources, methods, and insights established here provide a foundation for continued progress toward systems that serve all users equitably.
20-apr-2026
XXXVIII
2024-2025
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Negri, Matteo; Bentivogli, Luisa
no
English
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/482732