In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are highly indicative of the native language of the author. In combination with other lexical features, spelling error features lead to a 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author’s native language, compared to systems participating in the NLI shared task.
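To make the feature idea concrete, here is a minimal sketch of counting character n-grams taken only from misspelled words. It is not the authors' implementation: the toy vocabulary `KNOWN_WORDS`, the boundary markers `^` and `$`, the n-gram sizes, and the function names are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary (an assumption); a real system would rely on
# a full dictionary or a spell checker to decide which words are misspelled.
KNOWN_WORDS = {"the", "government", "is", "responsible", "for", "this", "decision"}


def char_ngrams(word, n):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def misspelling_ngram_counts(text, vocabulary=KNOWN_WORDS, n_values=(1, 2, 3)):
    """Count character n-grams drawn only from out-of-vocabulary tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in vocabulary:
            continue  # correctly spelled words contribute no features here
        for n in n_values:
            counts.update(char_ngrams(token, n))
    return counts


# "goverment" and "responsable" are the only tokens outside the toy vocabulary.
features = misspelling_ngram_counts("The goverment is responsable for this decision")
```

The resulting counts could be fed to any text classifier alongside other lexical features; which classifier and which n-gram sizes work best is left open here.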

Improving Native Language Identification by Using Spelling Errors / Chen, Lingzhen; Strapparava, Carlo; Nastase, Vivi. - (2017), pp. 542-546. (Paper presented at the 55th Annual Meeting of the Association for Computational Linguistics (ACL-2017), held in Vancouver, Canada, in July-August) [10.18653/v1/P17-2086].

Improving Native Language Identification by Using Spelling Errors

Carlo Strapparava
2017-01-01

2017
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL-2017)
USA
Association for Computational Linguistics
978-1-945626-76-0
Chen, Lingzhen; Strapparava, Carlo; Nastase, Vivi

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/343181