The swift advancement of Large Language Models (LLMs) has led to their widespread use across various tasks and domains, demonstrating remarkable generalization capabilities. However, achieving optimal performance in specialized tasks often requires fine-tuning LLMs with task-specific resources. The creation of high-quality, human-annotated datasets for this purpose is challenging due to financial constraints and the limited availability of human experts. To address these limitations, we propose First-AID, a novel human-in-theloop (HITL) data collection framework for the knowledge-driven generation of synthetic dialogues using LLM prompting. In particular, our framework implements different strategies of data collection that require different user intervention during dialogue generation to reduce post-editing efforts and enhance the quality of generated dialogues. We also evaluated First-AID on misinformation and hate countering dialogues collection, demonstrating (1) its potential for efficient and high-quality data generation and (2) its adaptability to different practical constraints thanks to the three data
First-AID: the first Annotation Interface for grounded Dialogues / Menini, Stefano; Russo, Daniel; Palmero Aprosio, Alessio; Guerini, Marco. - (2025), pp. 563-571. ( 63rd Annual Meeting of the Association for Computational Linguistics Vienna, Austria July 27–August 1st, 2025) [10.18653/v1/2025.acl-demo.54].
First-AID: the first Annotation Interface for grounded Dialogues
Menini Stefano;Russo Daniel;Aprosio Alessio Palmero;Guerini Marco
2025-01-01
Abstract
The swift advancement of Large Language Models (LLMs) has led to their widespread use across various tasks and domains, demonstrating remarkable generalization capabilities. However, achieving optimal performance in specialized tasks often requires fine-tuning LLMs with task-specific resources. The creation of high-quality, human-annotated datasets for this purpose is challenging due to financial constraints and the limited availability of human experts. To address these limitations, we propose First-AID, a novel human-in-theloop (HITL) data collection framework for the knowledge-driven generation of synthetic dialogues using LLM prompting. In particular, our framework implements different strategies of data collection that require different user intervention during dialogue generation to reduce post-editing efforts and enhance the quality of generated dialogues. We also evaluated First-AID on misinformation and hate countering dialogues collection, demonstrating (1) its potential for efficient and high-quality data generation and (2) its adaptability to different practical constraints thanks to the three dataI documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione



