We present the CIC-FBK system, which took part in the Native Language Identification (NLI) Shared Task 2017. Our approach combines features commonly used in previous NLI research, i.e., word n-grams, lemma n-grams, part-of-speech n-grams, and function words, with recently introduced character n-grams from misspelled words, and with features that are novel in this task, such as typed character n-grams and syntactic n-grams of words and of syntactic relation tags. We use a log-entropy weighting scheme and perform classification using the Support Vector Machines (SVM) algorithm. Our system achieved a macro-averaged F1-score of 0.8808 and shared the 1st rank in the NLI Shared Task 2017 scoring.
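As a rough illustration of the log-entropy weighting mentioned in the abstract, the following Python sketch weights character n-gram counts. It assumes the common textbook formulation (local weight log(1 + tf), global weight 1 + Σ p·log p / log N); the exact variant used by the CIC-FBK system may differ. The function names `char_ngrams` and `log_entropy_weights` are hypothetical, and the SVM classification step is not shown.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_weights(docs_counts):
    """Apply log-entropy weighting to per-document feature counts.

    docs_counts: list of Counter objects (feature -> raw frequency).
    Returns one dict per document (feature -> weight).
    Assumed formulation, not taken from the paper:
      local  = log(1 + tf_ij)
      global = 1 + sum_j(p_ij * log(p_ij)) / log(N),  p_ij = tf_ij / gf_i
    """
    n_docs = len(docs_counts)

    # Total frequency of each feature over the whole collection (gf_i).
    global_freq = Counter()
    for counts in docs_counts:
        global_freq.update(counts)

    # Accumulate sum_j p_ij * log(p_ij) for each feature.
    entropy_sum = defaultdict(float)
    for counts in docs_counts:
        for feat, tf in counts.items():
            p = tf / global_freq[feat]
            entropy_sum[feat] += p * math.log(p)

    # With fewer than two documents log(N) is zero, so fall back to 1.0.
    if n_docs > 1:
        global_w = {f: 1.0 + s / math.log(n_docs) for f, s in entropy_sum.items()}
    else:
        global_w = {f: 1.0 for f in entropy_sum}

    return [
        {feat: global_w[feat] * math.log(1 + tf) for feat, tf in counts.items()}
        for counts in docs_counts
    ]


# Toy usage: two short "essays" represented by character trigram counts.
essays = ["the cat", "the dog"]
counts = [Counter(char_ngrams(e, 3)) for e in essays]
weights = log_entropy_weights(counts)
```

Under this formulation, a feature spread evenly across all documents (such as the trigram `the` in the toy data) receives a global weight of zero, while a feature confined to a single document keeps its full local weight. The resulting weighted vectors would then be passed to a classifier.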
CIC-FBK Approach to Native Language Identification / Markov, Ilia; Chen, Lingzhen; Strapparava, Carlo; Sidorov, Grigori. - (2017), pp. 374-381. (Paper presented at the 12th Workshop on Innovative Use of NLP for Building Educational Applications, held in Copenhagen, Denmark, in September) [10.18653/v1/W17-5042].
CIC-FBK Approach to Native Language Identification
Carlo Strapparava
2017-01-01