Exploiting native language interference for native language identification

Native language identification (NLI) - the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2) - is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages' structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features/representations that cover (indiscriminately) a variety of linguistic phenomena.

Exploiting native language interference for native language identification / Markov, I., Nastase, V., Strapparava, C. - In: NATURAL LANGUAGE ENGINEERING. - ISSN 1351-3249. - 28:2(2022), pp. 167-197. [10.1017/S1351324920000595]

Markov, I.; Nastase, V.; Strapparava, C.

2022-01-01

Date of publication: 2022
Journal title: NATURAL LANGUAGE ENGINEERING
Issue number and part: 2
DOI: https://dx.doi.org/10.1017/S1351324920000595
Scopus identifier: 2-s2.0-85096898655
WOS identifier: WOS:000752065500004
All authors: Markov, I.; Nastase, V.; Strapparava, C.
Publication type: 03.1 Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/341937
