
Loss and Reward Functions for Generative Question Answering Systems / Gabburo, Matteo. - (2025 Apr 16), pp. 1-177.

Loss and Reward Functions for Generative Question Answering Systems

Gabburo, Matteo
2025-04-16

Abstract

Recent advances in AI, driven largely by Large Language Models (LLMs), have transformed the field, fueling both industrial and academic progress. These models are typically trained with causal language modeling objectives, which, while effective, face certain limitations compared to traditional machine learning frameworks. For example, they lack structured, easy-to-implement ways to incorporate negative examples, making it challenging to learn complex instructions or reasoning tasks, such as distinguishing high-quality answers from suboptimal ones. In this thesis, we propose novel methods that enhance the training of generative models using classifiers trained on positive and negative examples. Our first contribution, "Knowledge Transfer from Answer Ranking to Answer Generation", addresses the challenge of training generative QA models with limited supervised data. We show how knowledge from Answer Sentence Selection (AS2) models can be transferred to generative QA (GenQA) models by using top-ranked candidates as generation targets and lower-ranked candidates as contextual input. Our approach further exploits AS2 prediction scores for loss weighting and input/output shaping, significantly improving GenQA performance across multiple academic and industrial datasets. Building on this foundation, we extend our approach to automatic evaluation of QA systems. In particular, we design SQuAre (Sentence-level QUestion AnsweRing Evaluation), a reference-based metric that uses multiple positive and negative references to evaluate QA systems. SQuAre consistently correlates better with human judgments than previous metrics, making it a reliable tool for assessing both extractive (AS2) and generative (GenQA) QA systems. Additionally, we explore the use of scores from automatic evaluation systems to further enhance generative QA models.
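As a rough illustration of the transfer recipe summarized above, the sketch below builds a GenQA training example from AS2-ranked answer candidates: the top-ranked candidate becomes the generation target, the lower-ranked candidates become contextual input, and the top candidate's AS2 score becomes a per-example loss weight. All names and the field layout are hypothetical, not the thesis implementation.

```python
def make_genqa_example(question, ranked_candidates):
    """Build a GenQA training example from AS2-ranked candidates.

    ranked_candidates: list of (sentence, as2_score) pairs, best first.
    The top candidate is the generation target; the remaining candidates
    form the contextual input; the target's AS2 score reweights the loss.
    """
    target, target_score = ranked_candidates[0]
    context = " ".join(sent for sent, _ in ranked_candidates[1:])
    return {
        "input": f"question: {question} context: {context}",
        "target": target,
        "loss_weight": target_score,  # AS2 confidence scales this example's loss
    }

example = make_genqa_example(
    "Who wrote Hamlet?",
    [("Hamlet was written by William Shakespeare.", 0.97),
     ("Shakespeare wrote many tragedies.", 0.55),
     ("Hamlet is a play.", 0.31)],
)
```

This keeps the generator's supervision entirely automatic: no human-written answers are needed, only an AS2 ranker over retrieved sentences.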
We then introduce strategies to transfer knowledge from QA evaluation models to generative models, including augmenting training data with QA-evaluated answers and weighting the generator loss with evaluation scores. This approach, validated across several datasets, achieves state-of-the-art answer-generation accuracy. Finally, we contribute to retrieval-augmented generation (RAG) research by introducing Retrieval Complexity (RC), a novel metric that measures question difficulty based on document completeness and retrieval performance. Our unsupervised RC estimation pipeline identifies complex question types, such as multi-hop, compositional, and temporal queries, enabling retrieval systems to better address high-difficulty questions and revealing targets for improvement.
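The evaluation-score-weighted generator loss mentioned above can be sketched as follows: the per-token negative log-likelihood of a training answer is scaled by the score an automatic QA evaluator assigned to that answer, so well-rated answers contribute more gradient than poorly rated ones. Function names are illustrative assumptions, not the thesis code.

```python
import math

def weighted_nll(token_probs, eval_score):
    """Evaluation-score-weighted negative log-likelihood.

    token_probs: model probabilities assigned to the gold answer tokens.
    eval_score: automatic QA-evaluator score for this answer, in [0, 1].
    """
    # Mean per-token negative log-likelihood of the answer.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Scale by the evaluator's confidence in the answer's quality.
    return eval_score * nll

# Two answers the model finds equally likely, but rated differently
# by the evaluator: the well-rated one dominates the training signal.
good = weighted_nll([0.8, 0.9, 0.7], eval_score=0.9)
poor = weighted_nll([0.8, 0.9, 0.7], eval_score=0.2)
```

In a full training loop this per-example weight would multiply the sequence loss before backpropagation, which is a common way to express soft supervision from a scorer.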
Defense date: 16 Apr 2025
Cycle: XXXVI
Academic year: 2023-2024
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Information and Communication Technology
Supervisor: Moschitti, Alessandro
Country: Italy
Language: English
phd_unitn_gabburo_matteo.pdf

Open access

Description: PhD Thesis of Matteo Gabburo
Type: Doctoral Thesis (Tesi di dottorato)
License: Creative Commons
Size: 2.94 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/450810