
Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts / Pugnana, Andrea; Massidda, Riccardo; Giannini, Francesco; Barbiero, Pietro; Espinosa Zarlenga, Mateo; Pellungrini, Roberto; Dominici, Gabriele; Giannotti, Fosca; Bacciu, Davide. - (2026). (NeurIPS 2025, San Diego, California, 2–7 December 2025).

Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts

Pugnana, Andrea (co-first author); Giannotti, Fosca
2026-01-01

Abstract

Concept Bottleneck Models (CBMs) are interpretable machine learning models that ground their predictions on human-understandable concepts, allowing for targeted interventions in their decision-making process. However, when intervened on, CBMs assume the availability of humans who can identify the need to intervene and who always provide correct interventions. Both assumptions are unrealistic and impractical, considering labor costs and human error-proneness. In contrast, Learning to Defer (L2D) extends supervised learning by allowing machine learning models to identify cases where a human is more likely to be correct than the model, thus leading to deferring systems with improved performance. In this work, we draw inspiration from L2D and propose Deferring CBMs (DCBMs), a novel framework that allows CBMs to learn when an intervention is needed. To this end, we model DCBMs as a composition of deferring systems and derive a consistent L2D loss to train them. Moreover, by relying on a CBM architecture, DCBMs can explain the reasons for deferring on the final task. Our results show that DCBMs can achieve high predictive performance and interpretability by deferring only when needed.
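To make the L2D setting mentioned in the abstract concrete, the following is a minimal sketch of a *standard* consistent L2D surrogate loss (cross-entropy with an extra "defer" class, in the style of Mozannar and Sontag, 2020) — not the DCBM loss derived in the paper. All function names and the NumPy-based setup are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def l2d_surrogate_loss(scores, y, expert_correct):
    """Standard consistent L2D surrogate (illustrative, NOT the paper's DCBM loss).

    scores:         (n, K+1) logits; the last column is the 'defer' class.
    y:              (n,) ground-truth labels in {0, ..., K-1}.
    expert_correct: (n,) indicator, 1 where the human expert is correct.
    """
    p = softmax(scores)
    n = len(y)
    loss = -np.log(p[np.arange(n), y])           # cross-entropy on the true label
    loss -= expert_correct * np.log(p[:, -1])    # also reward deferring when the expert is right
    return loss.mean()

def predict_or_defer(scores):
    """Defer (returns -1) when the defer logit wins; otherwise predict the argmax class."""
    k = scores.argmax(axis=-1)
    return np.where(k == scores.shape[-1] - 1, -1, k)
```

The intuition matches the abstract: the extra output lets the model learn to hand an input to the human only when the human is more likely to be correct, and at test time `predict_or_defer` routes each input accordingly.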
2026
39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Neural Information Processing Systems Foundation, Inc. (NeurIPS)
Files in this record:

5559_Deferring_Concept_Bottlen.pdf
  Access: open access
  Type: Publisher's version (publisher's layout)
  License: Creative Commons
  Size: 3.01 MB
  Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/481153
Citations
  • PubMed Central: not available
  • Scopus: not available
  • Web of Science: not available
  • OpenAlex: not available