This paper presents an entirely new, one-million-word annotated corpus for comprehensive, machine-learning-based preprocessing of text in Modern Standard Arabic. In contrast to the conventional pipeline architecture, we solve the NLP tasks of word segmentation, POS tagging and named entity recognition as a single sequence labeling task. This single-component configuration results in faster operation and provides state-of-the-art precision and recall according to our evaluations. The fine-grained tag set output by our annotator greatly simplifies downstream tasks such as lemmatization. Provided as a trained OpenNLP component, the annotator is publicly and freely available for research purposes.
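To make the idea of a single sequence labeling task concrete, the following is a minimal sketch of how one composite label per token could carry segmentation, POS and named-entity information at once. The label format used here (`+` between the POS tags of a token's segments, `@` before a BIO entity tag), the tag names and the helper functions `decode_label` and `entity_spans` are all hypothetical illustrations and are not taken from the paper or from OpenNLP.

```python
from typing import List, Optional, Tuple


def decode_label(label: str) -> Tuple[List[str], Optional[str]]:
    """Split one composite label into per-segment POS tags and an optional NE tag.

    Assumed (hypothetical) format: "POS1+POS2+...[@NE]".
    """
    pos_part, _, ne_part = label.partition("@")
    return pos_part.split("+"), (ne_part or None)


def entity_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) spans from the BIO-style NE tags."""
    spans: List[Tuple[int, int, str]] = []
    start, kind = None, None
    for i, label in enumerate(labels):
        _, ne = decode_label(label)
        continues = ne is not None and ne.startswith("I-") and ne[2:] == kind
        if not continues:
            # Close any open span, then open a new one on a B- tag.
            if start is not None:
                spans.append((start, i, kind))
            start, kind = None, None
            if ne is not None and ne.startswith("B-"):
                start, kind = i, ne[2:]
    if start is not None:
        spans.append((start, len(labels), kind))
    return spans


# One label per input token; the number of "+"-separated parts gives the
# number of segments (e.g. proclitics plus stem) predicted for that token.
labels = ["CONJ+VERB", "NOUN@B-PER", "NOUN@I-PER", "PREP+DET+NOUN@B-LOC"]
decoded = [decode_label(label) for label in labels]
spans = entity_spans(labels)
```

Under this assumed encoding, a single tagger pass yields all three analyses, which is the property the abstract contrasts with a pipeline of separate components.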
A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition / Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto. - (2018), pp. 1-8. (Paper presented at the International Conference on Natural Language and Speech Processing ICNLSP 2018, held in Algiers, Algeria, 25-26 April 2018) [10.1109/ICNLSP.2018.8374393].
Title: | A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition | |
Authors: | Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto | |
Title of the volume containing the paper: | International Conference on Natural Language and Speech Processing ICNLSP 2018 | |
Place of publication: | Piscataway, NJ USA | |
Publisher: | IEEE | |
Year of publication: | 2018 | |
Scopus identifier: | 2-s2.0-85049371695 | |
WOS identifier: | WOS:000454448300029 | |
ISBN: | 978-1-5386-4543-7 978-1-5386-4544-4 | |
Handle: | http://hdl.handle.net/11572/200939 | |
Appears in types: | 04.1 Paper in Proceedings |
Files in this item:
File | Type | License | Access |
---|---|---|---|
A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition.pdf | Refereed author's manuscript (post-print) | All rights reserved | Open Access |
08374393.pdf | Publisher's layout | All rights reserved | Administrator |