Natural Language Processing and Generation systems have recently shown the potential to complement and streamline the costly and time-consuming job of professional fact-checkers. In this work, we lift several constraints of current state-of-the-art pipelines for automated fact-checking based on the Retrieval-Augmented Generation (RAG) paradigm. Our goal is to benchmark, following professional fact-checking practices, RAG-based methods for the generation of verdicts - i.e., short texts discussing the veracity of a claim - evaluating them on stylistically complex claims and heterogeneous, yet reliable, knowledge bases. Our findings show a complex landscape, where, for example, LLM-based retrievers outperform other retrieval techniques, though they still struggle with heterogeneous knowledge bases; larger models excel in verdict faithfulness, while smaller models provide better context adherence, with human evaluations favouring zero-shot and one-shot approaches for informativeness, and fine-tuned models for emotional alignment.
Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking / Russo, Daniel; Menini, Stefano; Staiano, Jacopo; Guerini, Marco. - (2025), pp. 846-865. (INLG 2025, Hanoi, Vietnam, 29th October - 2nd November 2025).
Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking
Russo, Daniel (first author); Staiano, Jacopo (penultimate author)
2025-01-01
File: 2025.inlg-main.50.pdf
Access: open access
Description: paper
Type: Publisher's version (Publisher's layout)
License: Creative Commons
Size: 869.98 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



