Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking

Russo, Daniel (first author); Staiano, Jacopo (penultimate author)
2025-01-01

Abstract

Natural Language Processing and Generation systems have recently shown the potential to complement and streamline the costly and time-consuming job of professional fact-checkers. In this work, we lift several constraints of current state-of-the-art pipelines for automated fact-checking based on the Retrieval-Augmented Generation (RAG) paradigm. Following professional fact-checking practices, we benchmark RAG-based methods for the generation of verdicts (i.e., short texts discussing the veracity of a claim), evaluating them on stylistically complex claims and heterogeneous, yet reliable, knowledge bases. Our findings show a complex landscape. For example, LLM-based retrievers outperform other retrieval techniques, though they still struggle with heterogeneous knowledge bases; larger models excel in verdict faithfulness, while smaller models provide better context adherence. Human evaluations favour zero-shot and one-shot approaches for informativeness, and fine-tuned models for emotional alignment.
Year: 2025
Published in: Proceedings of the 18th International Natural Language Generation Conference
Place of publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
ISBN: 979-8-89176-321-0
Authors: Russo, Daniel; Menini, Stefano; Staiano, Jacopo; Guerini, Marco
Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking / Russo, Daniel; Menini, Stefano; Staiano, Jacopo; Guerini, Marco. - (2025), pp. 846-865. ( INLG 2025 Hanoi, Vietnam 29th October - 2nd November 2025).
Files in this item:
File: 2025.inlg-main.50.pdf
Access: open access
Description: paper
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 869.98 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/467654