CIC-FBK Approach to Native Language Identification / Markov, Ilia; Chen, Lingzhen; Strapparava, Carlo; Sidorov, Grigori. - (2017), pp. 374-381. (Paper presented at the 12th Workshop on Innovative Use of NLP for Building Educational Applications, held in Copenhagen, Denmark, in September) [10.18653/v1/W17-5042].

CIC-FBK Approach to Native Language Identification

Carlo Strapparava
2017-01-01

Abstract

We present the CIC-FBK system, which took part in the Native Language Identification (NLI) Shared Task 2017. Our approach combines features commonly used in previous NLI research (word n-grams, lemma n-grams, part-of-speech n-grams, and function words) with recently introduced character n-grams from misspelled words and with features that are novel in this task, such as typed character n-grams and syntactic n-grams of words and of syntactic relation tags. We use a log-entropy weighting scheme and perform classification using the Support Vector Machines (SVM) algorithm. Our system achieved a macro-averaged F1-score of 0.8808 and shared the 1st rank in the NLI Shared Task 2017 scoring.
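
The log-entropy weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation (local weight log(1 + tf), global weight 1 plus the normalized negative entropy of the feature across documents); the exact variant used by the system is not stated in this record. The helper names `char_ngrams` and `log_entropy_weights` are hypothetical and chosen only for illustration, and plain character trigrams stand in for the richer feature set described above.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text` (illustrative helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_weights(docs_features):
    """Apply log-entropy weighting to per-document feature lists.

    docs_features: list of feature lists, one list per document.
    Returns one {feature: weight} dict per document.
    Assumed formulation: weight = g_i * log(1 + tf_ij), where
    g_i = 1 + sum_j p_ij * log(p_ij) / log(N) and p_ij = tf_ij / gf_i.
    """
    counts = [Counter(features) for features in docs_features]
    n_docs = len(counts)

    # Total frequency of each feature over the whole collection (gf_i).
    global_freq = Counter()
    for doc_counts in counts:
        global_freq.update(doc_counts)

    global_weight = {}
    for feat, gf in global_freq.items():
        if n_docs < 2:
            # The normalizer log(N) is zero for a single document.
            global_weight[feat] = 1.0
            continue
        neg_entropy = 0.0
        for doc_counts in counts:
            tf = doc_counts.get(feat, 0)
            if tf:
                p = tf / gf
                neg_entropy += p * math.log(p)
        global_weight[feat] = 1.0 + neg_entropy / math.log(n_docs)

    return [
        {feat: global_weight[feat] * math.log(1 + tf) for feat, tf in doc_counts.items()}
        for doc_counts in counts
    ]


# Two toy "essays" represented by character trigrams.
essays = ["the cat", "the dog"]
weighted = log_entropy_weights([char_ngrams(e) for e in essays])
```

Under this formulation, a feature spread evenly over all documents (such as the trigram `the` in the toy data) gets a global weight of zero, while a feature concentrated in a single document keeps its full local weight. The resulting sparse vectors could then be passed to a linear SVM classifier.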
Year: 2017
Published in: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Place of publication: USA
Publisher: Association for Computational Linguistics
ISBN: 978-1-945626-85-2
Authors: Markov, Ilia; Chen, Lingzhen; Strapparava, Carlo; Sidorov, Grigori

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/343173