In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are very indicative of the native language of the author. In combination with other lexical features, spelling error features lead to a 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author’s native language, compared to systems participating in the NLI shared task.
Improving Native Language Identification by Using Spelling Errors / Chen, Lingzhen; Strapparava, Carlo; Nastase, Vivi. - (2017), pp. 542-546. (Paper presented at the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), held in Vancouver, Canada, in July-August) [10.18653/v1/P17-2086].
Improving Native Language Identification by Using Spelling Errors
Carlo Strapparava
2017-01-01
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.