Large Language Models for the Assessment of Students’ Authentic Tasks: A Replication Study in Higher Education / Agostini, Daniele; Picasso, Federica; Ballardini, Helga. - ELECTRONIC. - 3879:(2024). (Paper presented at the conference AIxEDU 2024 - Artificial Intelligence Systems in Education 2024, held in Bolzano, Italy, on 26th November 2024).
Large Language Models for the Assessment of Students’ Authentic Tasks: A Replication Study in Higher Education
Agostini, Daniele; Picasso, Federica; Ballardini, Helga
2024-01-01
Abstract
After the public release of ChatGPT (30th November 2022) and, subsequently, of all its competitors, the use of Large Language Models (LLMs) has become widespread among the general public. The most significant impact was felt from the very beginning in the field of Education and Instruction [1, 2, 3, 4, 5, 6, 7]. Of particular interest for this paper is their use by both teachers and students, especially in the context of higher education [8, 4, 9]. The immediacy with which LLMs have been integrated into higher education practices, by both teachers and students, raises questions of fundamental importance about their effectiveness and reliability. In this field, LLMs become the means through which teachers have the opportunity to revolutionise interaction with students, the management of workload and the personalisation of each learning experience [2]. Although these technologies are recognised as having advantages and potential for improving learning in terms of accessibility and personalisation [7], a crucial question concerns their application in assessment practices, especially their ability to evaluate students’ performance objectively and impartially. The possibility of using these tools for the assessment of learning is still relatively little known, which implies the need to delve deeper into the topic for its application in both pedagogical theory and educational practice. A previous study [10] has already been published which explored the use of the main LLM in the specific context of assessing students’ papers, and the present work is a replication study based on it. The purpose of the current study is to explore the possible use of the main LLMs in the specific context of evaluating students’ written productions, with a focus on the aspects of accuracy evaluated with the help of a rubric proposed by the teacher. This article is part of a series of contributions on this topic, in light of the principles and application of the AI-Mediated Assessment for Academics and Students (AI-MAAS) model [11].