A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition / Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto. - (2018), pp. 1-8. (Paper presented at the International Conference on Natural Language and Speech Processing ICNLSP 2018, held in Algiers, Algeria, 25-26 April 2018) [10.1109/ICNLSP.2018.8374393].

A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition

Abed Alhakim Ali Kayed Freihat; Gabor Bella; Fausto Giunchiglia
2018-01-01

Abstract

This paper presents an entirely new, one-million-word annotated corpus for comprehensive, machine-learning-based preprocessing of text in Modern Standard Arabic. In contrast to the conventional pipeline architecture, we solve the NLP tasks of word segmentation, POS tagging and named entity recognition as a single sequence labeling task. This single-component configuration results in faster operation and provides state-of-the-art precision and recall according to our evaluations. The fine-grained tag set output by our annotator greatly simplifies downstream tasks such as lemmatization. Provided as a trained OpenNLP component, the annotator is freely available for research purposes.
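
As a rough illustration of what treating the three tasks as a single sequence labeling task can mean, the sketch below decodes one composite label per token into segments, POS tags and a named-entity tag. The label format (`<length>:<POS>+...|<NER>`), the tag names and the `decode` helper are all hypothetical: they are not the tag set of the paper and they do not use the OpenNLP API. The example token is written in Buckwalter transliteration.

```python
from typing import List, NamedTuple, Tuple


class Analysis(NamedTuple):
    segments: List[Tuple[str, str]]  # (surface segment, POS tag) pairs
    entity: str                      # BIO-style entity tag, e.g. "B-LOC" or "O"


def decode(token: str, label: str) -> Analysis:
    """Split one composite label into segmentation, POS and NER information.

    Hypothetical label format (an assumption, not the paper's tag set):
        "<len>:<POS>+<len>:<POS>...|<NER>"
    where each <len> is the character length of one segment of the token.
    """
    seg_part, entity = label.split("|")
    segments = []
    start = 0
    for piece in seg_part.split("+"):
        length, pos = piece.split(":")
        end = start + int(length)
        segments.append((token[start:end], pos))
        start = end
    if start != len(token):
        raise ValueError(f"label {label!r} does not cover token {token!r}")
    return Analysis(segments, entity)


def decode_sentence(tokens: List[str], labels: List[str]) -> List[Analysis]:
    """Decode the output of a single sequence labeler, one label per token."""
    return [decode(t, l) for t, l in zip(tokens, labels)]


# "wbAlqAhrp" ("and in Cairo") = w + b + Al + qAhrp
example = decode("wbAlqAhrp", "1:CONJ+1:PREP+2:DET+5:NOUN_PROP|B-LOC")
print(example.segments)
print(example.entity)
```

Because every token receives exactly one label, a single tagger pass yields all three annotations, which is the design property the abstract credits for the faster operation compared with a pipeline of separate components.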
Year: 2018
Conference: International Conference on Natural Language and Speech Processing ICNLSP 2018
Place of publication: Piscataway, NJ, USA
Publisher: IEEE
ISBN: 978-1-5386-4543-7; 978-1-5386-4544-4
Authors: Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto
Files in this item:

A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition.pdf
  Access: open access
  Type: Refereed author's manuscript (post-print)
  License: All rights reserved
  Size: 957.07 kB
  Format: Adobe PDF

08374393.pdf
  Access: repository managers only
  Type: Publisher's layout (published version)
  License: All rights reserved
  Size: 256.19 kB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/200939