Large Language Models Under Evaluation: An Acceptability, Complexity and Coherence Assessment in Italian

Chesi, Cristiano; Vespignani, Francesco; Zamparelli, Roberto

This paper discusses the results of experiments assessing the morphosyntactic and semantic competence in Italian of four very large language models (vLLMs): davinci (GPT-3/ChatGPT), davinci-002, davinci-003 (both GPT-3.5 models) and gpt-4-1106-preview (GPT-4). We evaluated these models on (i) acceptability, (ii) complexity, and (iii) coherence judgments using 7-point Likert scales, and on (iv) syntactic development through a forced-choice task. The test sets were drawn from shared NLP tasks and standard linguistic assessments. The results suggest that, although fine-tuned transformers outperform all GPT models, GPT-4 represents a significant improvement over third-generation GPT models. According to our tests, even if GPT-4 and fine-tuned transformers cannot be considered descriptively or explanatorily adequate, they nonetheless pose a challenge to the poverty of the stimulus hypothesis. The "theory" expressed by GPT models is not linguistically intelligible in any relevant sense, and their training data is orders of magnitude larger than the primary linguistic input available to children. Nevertheless, GPT-4 captures certain generalizations, such as the constraints blocking the insertion of an overt resumptive clitic in specific gap positions, that are arguably unlearnable from primary positive data alone.

Large Language Models Under Evaluation: An Acceptability, Complexity and Coherence Assessment in Italian / Chesi, C., Vespignani, F., Zamparelli, R. - In: IJCOL. - ISSN 2499-4553. - Electronic. - 11:2(2026), pp. 77-98.

2026-01-01

Date of publication: 2026
Journal title: IJCOL
Issue number and part: 2
Reference SSD (valid until 24/06/2024): Sector L-LIN/01 - Glottologia e Linguistica
Reference SSD (valid from 09/05/2024): Sector GLOT-01/A - Glottologia e linguistica
Publication type: 03.1 Journal article

Files in this item:

File	Size	Format
IJCOL_11_2_5_chesi_et_al.pdf (open access; type: publisher's layout; license: Creative Commons)	1.56 MB	Adobe PDF