A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition / Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto. - (2018), pp. 1-8. (Paper presented at the International Conference on Natural Language and Speech Processing, ICNLSP 2018, held in Algiers, Algeria, 25-26 April 2018) [10.1109/ICNLSP.2018.8374393].
A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition
Abed Alhakim Ali Kayed Freihat;Gabor Bella;Fausto Giunchiglia
2018-01-01
Abstract
This paper presents an entirely new, one-million-word annotated corpus for comprehensive, machine-learning-based preprocessing of text in Modern Standard Arabic. In contrast to the conventional pipeline architecture, we solve the NLP tasks of word segmentation, POS tagging and named entity recognition as a single sequence labeling task. This single-component configuration results in faster operation and provides state-of-the-art precision and recall according to our evaluations. The fine-grained tag set output by our annotator greatly simplifies downstream tasks such as lemmatization. Provided as a trained OpenNLP component, the annotator is freely and publicly available for research purposes.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition.pdf | Open access | Refereed post-print (refereed author’s manuscript) | All rights reserved | 957.07 kB | Adobe PDF
08374393.pdf | Archive managers only | Published version (publisher’s layout) | All rights reserved | 256.19 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.