Advancing the Arabic WordNet: Elevating Content Quality / Freihat, Abed Alhakim; Khalilia, Hadi; Bella, Gábor; Giunchiglia, Fausto. - (2024), pp. 74-83. (Paper presented at the OSACT 2024 workshop, held in Torino, Italy, 20-25 May 2024).

Advancing the Arabic WordNet: Elevating Content Quality

Freihat, Abed Alhakim; Khalilia, Hadi; Bella, Gábor; Giunchiglia, Fausto
2024-01-01

Abstract

High-quality WordNets are crucial for achieving high-quality results in NLP applications that rely on such resources. However, the wordnets of most languages suffer from serious issues of correctness and completeness with respect to the words and word meanings they define, such as incorrect lemmas, missing glosses and example sentences, or an inadequate, Western-centric representation of the morphology and the semantics of the language. Previous efforts have largely focused on increasing lexical coverage while ignoring other qualitative aspects. In this paper, we focus on the Arabic language and introduce a major revision of the Arabic WordNet that addresses multiple dimensions of lexico-semantic resource quality. As a result, we updated more than 58% of the synsets of the existing Arabic WordNet by adding missing information and correcting errors. In order to address issues of language diversity and untranslatability, we also extended the wordnet structure by new elements: phrasets and lexical gaps.
2024
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024
Torino, Italy
ELRA and ICCL
Files in this record:
2024.osact-1.9.pdf (open access; publisher's layout; Creative Commons license; Adobe PDF, 364.47 kB)


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/412192