Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation

Pu, Nan; Zhong, Zhun; Sebe, Nicu
2023-01-01

Abstract

Visual Question Answering (VQA) has achieved significant success over the last few years, yet most studies focus on training a VQA model on a single stationary domain (e.g., a given dataset). In real-world scenarios, however, such methods are often insufficient, because VQA systems are expected to continually extend their knowledge to meet the ever-changing demands of users. In this paper, we introduce a new and challenging multi-domain lifelong VQA task, dubbed MDL-VQA, which requires a VQA model to continuously learn across multiple domains while mitigating forgetting on previously learned domains. Furthermore, we propose a novel replay-free Self-Critical Distillation (SCD) framework tailor-made for MDL-VQA, which alleviates the forgetting issue by transferring previous-domain knowledge from a teacher model to a student model. First, we introspect the teacher's understanding of original and counterfactual samples, thereby creating informative instance-relevant and domain-relevant knowledge for logits-based distillation. Second, for feature-based distillation, we introspect the reasoning behavior of the student model to identify harmful domain-specific knowledge acquired in the current domain, and further leverage a metric-learning strategy to encourage the student to learn useful knowledge in the new domain. Extensive experiments demonstrate that the SCD framework outperforms state-of-the-art competitors under different training orders.
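Although only the abstract is available in this record, the objective it describes — replay-free teacher-to-student transfer combining a counterfactual-weighted logits-distillation term with a feature-level metric-learning term — can be made concrete with a short sketch. The following is a hypothetical, minimal PyTorch rendition, not the authors' released code: the model interface (`teacher(image, question)` returning logits and features), the counterfactual construction, and the loss weights are all assumptions.

```python
# Hypothetical sketch (not the authors' implementation) of a replay-free
# distillation step in the spirit of the abstract: (i) a logits-based KD term
# whose per-sample weight comes from the teacher's confidence gap between an
# original and a counterfactual input, and (ii) a feature-level term keeping
# the student close to the teacher's representation. Names are assumed.
import torch
import torch.nn.functional as F

def scd_style_step(teacher, student, image, question, cf_image, answer,
                   T=2.0, w_logit=1.0, w_feat=0.5):
    with torch.no_grad():
        t_logits, t_feat = teacher(image, question)       # frozen previous-domain teacher
        t_cf_logits, _ = teacher(cf_image, question)      # counterfactual view (e.g., masked image)
        # "Self-critical" per-sample weight: how much the teacher's belief in
        # the ground-truth answer drops under the counterfactual input.
        p = t_logits.softmax(-1).gather(1, answer[:, None]).squeeze(1)
        p_cf = t_cf_logits.softmax(-1).gather(1, answer[:, None]).squeeze(1)
        weight = (p - p_cf).clamp(min=0.0)                # shape [B]

    s_logits, s_feat = student(image, question)

    # Logits-based distillation: per-sample weighted, temperature-scaled KL.
    kd = F.kl_div(F.log_softmax(s_logits / T, -1),
                  F.softmax(t_logits / T, -1),
                  reduction="none").sum(-1) * (T * T)     # shape [B]
    loss_logit = (weight * kd).mean()

    # Feature-based distillation via a simple metric-learning surrogate:
    # cosine distance between student and teacher features, discouraging
    # drift toward purely current-domain-specific representations.
    loss_feat = (1.0 - F.cosine_similarity(s_feat, t_feat, dim=-1)).mean()

    loss_task = F.cross_entropy(s_logits, answer)         # current-domain supervision
    return loss_task + w_logit * loss_logit + w_feat * loss_feat
```

Under this (assumed) reading of "self-critical", the distillation weight is largest on samples where the teacher's answer confidence collapses once the visual evidence is removed, i.e., where the teacher's knowledge is genuinely grounded rather than a language prior — those are the samples most worth preserving.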
Year: 2023
Proceedings: MM '23: Proceedings of the 31st ACM International Conference on Multimedia
Place of publication: New York, NY, United States
Publisher: Association for Computing Machinery, Inc
ISBN: 9798400701085
Authors: Lao, Mingrui; Pu, Nan; Liu, Yu; Zhong, Zhun; Bakker, Erwin M.; Sebe, Nicu; Lew, Michael S.
Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation / Lao, Mingrui; Pu, Nan; Liu, Yu; Zhong, Zhun; Bakker, Erwin M.; Sebe, Nicu; Lew, Michael S. - (2023), pp. 4747-4758. (Paper presented at The 31st ACM International Conference on Multimedia, held in Ottawa, Canada, 29 October - 3 November 2023) [10.1145/3581783.3612121].
Files in this record:

File: 3581783.3612121 (2)-compressed.pdf
Access: Open access
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 473.55 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/398407
Citations
  • PMC: ND
  • Scopus: 2
  • Web of Science (ISI): 0
  • OpenAlex: ND