
Loss and Reward Functions for Generative Question Answering Systems / Gabburo, Matteo. - (2025 Apr 16), pp. 1-177.

Loss and Reward Functions for Generative Question Answering Systems

Gabburo, Matteo
2025-04-16

Abstract

Recent advances in AI, driven largely by Large Language Models (LLMs), have transformed the field, fueling both industrial and academic progress. These models are typically trained with causal language modeling objectives, which, while effective, face certain limitations compared to traditional machine learning frameworks. For example, they lack structured, easy-to-implement ways to incorporate negative examples, making it challenging to learn complex instructions or reasoning tasks, such as distinguishing high-quality answers from suboptimal ones. In this thesis, we propose novel methods that enhance the training of generative models using classifiers trained on positive and negative examples. Our first contribution, "Knowledge Transfer from Answer Ranking to Answer Generation", addresses the challenge of training generative QA models with limited supervised data. We show how knowledge from Answer Sentence Selection (AS2) models can be transferred to generative QA (GenQA) models by using top-ranked candidates as generation targets and lower-ranked candidates as contextual input. Our approach further exploits AS2 prediction scores for loss weighting and input/output shaping, significantly improving GenQA performance across multiple academic and industrial datasets. Building on this foundation, we extend our approach to automatic evaluation of QA systems. In particular, we design SQuAre (Sentence-level QUestion AnsweRing Evaluation), a reference-based metric that uses multiple positive and negative references to evaluate QA systems. SQuAre consistently correlates better with human judgments than previous metrics, making it a reliable tool for assessing both extractive (AS2) and generative (GenQA) QA systems. Additionally, we explore the use of scores from automatic evaluation systems to further enhance generative QA models.
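As a rough illustration of the transfer recipe summarized above, the sketch below builds a GenQA training example from AS2-ranked answer candidates: the top-ranked candidate becomes the generation target, the lower-ranked candidates become contextual input, and the top candidate's AS2 score becomes a per-example loss weight. All names and the field layout are hypothetical, not the thesis implementation.

```python
def make_genqa_example(question, ranked_candidates):
    """Build a GenQA training example from AS2-ranked candidates.

    ranked_candidates: list of (sentence, as2_score) pairs, best first.
    The top candidate is the generation target; the remaining candidates
    form the contextual input; the target's AS2 score reweights the loss.
    """
    target, target_score = ranked_candidates[0]
    context = " ".join(sent for sent, _ in ranked_candidates[1:])
    return {
        "input": f"question: {question} context: {context}",
        "target": target,
        "loss_weight": target_score,  # AS2 confidence scales this example's loss
    }

example = make_genqa_example(
    "Who wrote Hamlet?",
    [("Hamlet was written by William Shakespeare.", 0.97),
     ("Shakespeare wrote many tragedies.", 0.55),
     ("Hamlet is a play.", 0.31)],
)
```

This keeps the generator's supervision entirely automatic: no human-written answers are needed, only an AS2 ranker over retrieved sentences.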
We then introduce strategies to transfer knowledge from QA evaluation models to generative models, including augmenting training data with QA-evaluated answers and weighting the generator loss with evaluation scores. This approach, validated across several datasets, achieves state-of-the-art answer-generation accuracy. Finally, we contribute to retrieval-augmented generation (RAG) research by introducing Retrieval Complexity (RC), a novel metric that measures question difficulty based on document completeness and retrieval performance. Our unsupervised RC estimation pipeline identifies complex question types, such as multi-hop, compositional, and temporal queries, enabling retrieval systems to better address high-difficulty questions and revealing targets for improvement.
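The evaluation-score-weighted generator loss mentioned above can be sketched as follows: the per-token negative log-likelihood of a training answer is scaled by the score an automatic QA evaluator assigned to that answer, so well-rated answers contribute more gradient than poorly rated ones. Function names are illustrative assumptions, not the thesis code.

```python
import math

def weighted_nll(token_probs, eval_score):
    """Evaluation-score-weighted negative log-likelihood.

    token_probs: model probabilities assigned to the gold answer tokens.
    eval_score: automatic QA-evaluator score for this answer, in [0, 1].
    """
    # Mean per-token negative log-likelihood of the answer.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Scale by the evaluator's confidence in the answer's quality.
    return eval_score * nll

# Two answers the model finds equally likely, but rated differently
# by the evaluator: the well-rated one dominates the training signal.
good = weighted_nll([0.8, 0.9, 0.7], eval_score=0.9)
poor = weighted_nll([0.8, 0.9, 0.7], eval_score=0.2)
```

In a full training loop this per-example weight would multiply the sequence loss before backpropagation, which is a common way to express soft supervision from a scorer.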
Defense date: 16 Apr 2025
Cycle: XXXVI
Academic year: 2023-2024
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Information and Communication Technology
Supervisor: Moschitti, Alessandro
Country: Italy
Language: English
phd_unitn_gabburo_matteo.pdf

Open access

Description: PhD Thesis of Matteo Gabburo
Type: Doctoral Thesis (Tesi di dottorato)
License: Creative Commons
Size: 2.94 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/450810