Improving Native Language Identification by Using Spelling Errors


In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are highly indicative of the author’s native language. In combination with other lexical features, spelling error features lead to a 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author’s native language, compared to systems participating in the NLI shared task.
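As a rough illustration of the feature type the abstract describes, the sketch below counts character n-grams over misspelled words only. It is a minimal example under stated assumptions, not the authors' pipeline: the function names `char_ngrams` and `misspelling_ngram_counts` are hypothetical, a token is treated as misspelled simply when it is missing from a supplied vocabulary set, and the n-gram range is an arbitrary default.

```python
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return all character n-grams of `word` for n in [n_min, n_max]."""
    return [
        word[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(word) - n + 1)
    ]


def misspelling_ngram_counts(tokens, vocabulary, n_min=1, n_max=3):
    """Count character n-grams over alphabetic tokens absent from `vocabulary`.

    Assumption: "not in the vocabulary" stands in for a real spell checker.
    """
    counts = Counter()
    for token in tokens:
        word = token.lower()
        if word.isalpha() and word not in vocabulary:
            counts.update(char_ngrams(word, n_min, n_max))
    return counts


if __name__ == "__main__":
    vocab = {"i", "went", "home", "because", "it", "rained"}
    essay = ["I", "went", "home", "becuase", "it", "rained"]
    print(misspelling_ngram_counts(essay, vocab, n_min=2, n_max=2))
```

Counts like these could be used as one feature group alongside other lexical features in a standard text classifier; which classifier and which spell checker to use are left open here.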

Improving Native Language Identification by Using Spelling Errors / Chen, Lingzhen; Strapparava, Carlo; Nastase, Vivi. - (2017), pp. 542-546. (55th Annual Meeting of the Association for Computational Linguistics (ACL-2017), Vancouver, Canada, July-August) [10.18653/v1/P17-2086].


Lingzhen Chen; Carlo Strapparava; Vivi Nastase

2017-01-01



Date of publication: 2017
Proceedings title: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL-2017)
Place of publication: USA
Publisher: Association for Computational Linguistics
ISBN: 978-1-945626-76-0
Scopus identifier: 2-s2.0-85040588169
WOS identifier: WOS:000493992300086
Appears in types: 04.1 Paper in Proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/343181
