This paper presents an entirely new, one-million-word annotated corpus for comprehensive, machine-learning-based preprocessing of Modern Standard Arabic text. Contrary to the conventional pipeline architecture, we solve the NLP tasks of word segmentation, POS tagging and named entity recognition as a single sequence labeling task. This single-component configuration results in faster operation and, according to our evaluations, achieves state-of-the-art precision and recall. The fine-grained tag set produced by our annotator greatly simplifies downstream tasks such as lemmatization. Provided as a trained OpenNLP component, the annotator is free for research purposes.
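
Here is a rough illustration of how three annotation layers can share one label sequence. The sketch assumes a hypothetical per-character label of the form `<B|I>-<POS>-<NER>` over Buckwalter-transliterated text. `B` opens a new segment and `I` continues it, so one label per character carries segmentation, POS and named-entity information at once. The tag names, the character-level granularity and the `encode`/`decode` helpers are illustrative assumptions. They are not the tag set or the OpenNLP model described in the paper. Any sequence labeler that predicts one label per position could be trained on such labels.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One morpheme-level segment with its POS and NE tag (hypothetical scheme)."""
    text: str
    pos: str
    ner: str


def encode(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Turn gold segments into a character string plus one composite label per character."""
    chars, labels = "", []
    for seg in segments:
        for i, ch in enumerate(seg.text):
            # The first character of a segment gets 'B', the rest get 'I'.
            labels.append(f"{'B' if i == 0 else 'I'}-{seg.pos}-{seg.ner}")
            chars += ch
    return chars, labels


def decode(chars: str, labels: List[str]) -> List[Segment]:
    """Recover segments, POS tags and NE tags from per-character composite labels."""
    if len(chars) != len(labels):
        raise ValueError("need exactly one label per character")
    segments: List[Segment] = []
    buf, pos, ner = "", "", ""
    for ch, label in zip(chars, labels):
        boundary, seg_pos, seg_ner = label.split("-", 2)
        if boundary == "B" or not buf:
            # Close the previous segment (if any) and start a new one.
            if buf:
                segments.append(Segment(buf, pos, ner))
            buf, pos, ner = ch, seg_pos, seg_ner
        else:
            buf += ch
    if buf:
        segments.append(Segment(buf, pos, ner))
    return segments


# "wbmSr" (Buckwalter for "and in Egypt"): w (conjunction) + b (preposition) + mSr (Egypt)
gold = [Segment("w", "CONJ", "O"), Segment("b", "PREP", "O"), Segment("mSr", "NOUN_PROP", "LOC")]
chars, labels = encode(gold)
print(labels)
# ['B-CONJ-O', 'B-PREP-O', 'B-NOUN_PROP-LOC', 'I-NOUN_PROP-LOC', 'I-NOUN_PROP-LOC']
assert decode(chars, labels) == gold
```

Because all three layers are read off a single label sequence, only one model has to run at inference time. This mirrors, in simplified form, the single-component idea stated in the abstract.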


A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition / Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto. - (2018), pp. 152-159. (2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018, Algiers, Algeria, 25-26 April 2018) [10.1109/ICNLSP.2018.8374393].

A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition

Abed Alhakim Ali Kayed Freihat; Gabor Bella; Fausto Giunchiglia
2018-01-01

Year: 2018
Conference: International Conference on Natural Language and Speech Processing (ICNLSP 2018)
Publisher: IEEE, 345 E 47th St, New York, NY 10017, USA
ISBN: 978-1-5386-4543-7; 978-1-5386-4544-4
Authors: Freihat, Abed Alhakim Ali Kayed; Bella, Gabor; Mubarak, Hamdy; Giunchiglia, Fausto
Files in this item:
  • A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition.pdf (open access). Type: refereed author's manuscript (post-print). License: all rights reserved. Size: 957.07 kB. Format: Adobe PDF.
  • 08374393.pdf (archive managers only). Type: publisher's layout. License: all rights reserved. Size: 256.19 kB. Format: Adobe PDF.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/200939
Citations
  • PMC: ND
  • Scopus: 19
  • ISI: 6
  • OpenAlex: ND