Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of factual knowledge. However, understanding their underlying reasoning and the internal mechanisms by which they exploit this knowledge remains a key research area. This work unveils the factual information an LLM represents internally for sentence-level claim verification. We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates, showing its layer-wise evolution using a dynamic knowledge graph. Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge. Accordingly, we rely on neither training nor external models. Using factual and common-sense claims from two claim verification datasets, we showcase interpretability analyses at local and global levels. The local analysis highlights entity centrality in LLM reasoning, from claim-related information and multi-hop reasoning to representation errors causing erroneous evaluation. The global analysis, on the other hand, reveals trends in the underlying evolution, such as word-based knowledge evolving into claim-related facts. By interpreting semantics from LLM latent representations and enabling graph-related analyses, this work enhances the understanding of the factual knowledge resolution process.

Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph / Bronzini, Marco; Nicolini, Carlo; Lepri, Bruno; Staiano, Jacopo; Passerini, Andrea. - Electronic. - (2024). (COLM 2024, Philadelphia, Pennsylvania, USA, 7th October-9th October 2024).

Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph

Bronzini, Marco; Lepri, Bruno; Staiano, Jacopo (co-last author); Passerini, Andrea
2024-01-01

2024
First Conference on Language Modeling
Philadelphia, Pennsylvania, United States
Open Review
Bronzini, Marco; Nicolini, Carlo; Lepri, Bruno; Staiano, Jacopo; Passerini, Andrea
Files in this item:
File: LLaMechanix_COLM_2024.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 8.86 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/425730