Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation

Pu, Nan; Zhong, Zhun; Sebe, Nicu
2023-01-01

Abstract

Visual Question Answering (VQA) has achieved significant success over the last few years, yet most studies focus on training a VQA model on a single stationary domain (e.g., a given dataset). In real-world scenarios, however, such methods are often insufficient, because VQA systems are expected to continually extend their knowledge to meet the ever-changing demands of users. In this paper, we introduce a new and challenging multi-domain lifelong VQA task, dubbed MDL-VQA, which requires a VQA model to continuously learn across multiple domains while mitigating forgetting on previously learned domains. Furthermore, we propose a novel replay-free Self-Critical Distillation (SCD) framework tailor-made for MDL-VQA, which alleviates the forgetting issue by transferring previous-domain knowledge from a teacher model to a student model. First, we introspect the teacher's understanding of original and counterfactual samples, thereby creating informative instance-relevant and domain-relevant knowledge for logits-based distillation. Second, for feature-based distillation, we introspect the reasoning behavior of the student model to identify harmful domain-specific knowledge acquired in the current domain, and further leverage a metric-learning strategy to encourage the student to learn useful knowledge in the new domain. Extensive experiments demonstrate that the SCD framework outperforms state-of-the-art competitors under different training orders.
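Although only the abstract is available in this record, the objective it describes — replay-free teacher-to-student transfer combining a counterfactual-weighted logits-distillation term with a feature-level metric-learning term — can be made concrete with a short sketch. The following is a hypothetical, minimal PyTorch rendition, not the authors' released code: the model interface (`teacher(image, question)` returning logits and features), the counterfactual construction, and the loss weights are all assumptions.

```python
# Hypothetical sketch (not the authors' implementation) of a replay-free
# distillation step in the spirit of the abstract: (i) a logits-based KD term
# whose per-sample weight comes from the teacher's confidence gap between an
# original and a counterfactual input, and (ii) a feature-level term keeping
# the student close to the teacher's representation. Names are assumed.
import torch
import torch.nn.functional as F

def scd_style_step(teacher, student, image, question, cf_image, answer,
                   T=2.0, w_logit=1.0, w_feat=0.5):
    with torch.no_grad():
        t_logits, t_feat = teacher(image, question)       # frozen previous-domain teacher
        t_cf_logits, _ = teacher(cf_image, question)      # counterfactual view (e.g., masked image)
        # "Self-critical" per-sample weight: how much the teacher's belief in
        # the ground-truth answer drops under the counterfactual input.
        p = t_logits.softmax(-1).gather(1, answer[:, None]).squeeze(1)
        p_cf = t_cf_logits.softmax(-1).gather(1, answer[:, None]).squeeze(1)
        weight = (p - p_cf).clamp(min=0.0)                # shape [B]

    s_logits, s_feat = student(image, question)

    # Logits-based distillation: per-sample weighted, temperature-scaled KL.
    kd = F.kl_div(F.log_softmax(s_logits / T, -1),
                  F.softmax(t_logits / T, -1),
                  reduction="none").sum(-1) * (T * T)     # shape [B]
    loss_logit = (weight * kd).mean()

    # Feature-based distillation via a simple metric-learning surrogate:
    # cosine distance between student and teacher features, discouraging
    # drift toward purely current-domain-specific representations.
    loss_feat = (1.0 - F.cosine_similarity(s_feat, t_feat, dim=-1)).mean()

    loss_task = F.cross_entropy(s_logits, answer)         # current-domain supervision
    return loss_task + w_logit * loss_logit + w_feat * loss_feat
```

Under this (assumed) reading of "self-critical", the distillation weight is largest on samples where the teacher's answer confidence collapses once the visual evidence is removed, i.e., where the teacher's knowledge is genuinely grounded rather than a language prior — those are the samples most worth preserving.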
Year: 2023
Proceedings: MM '23: Proceedings of the 31st ACM International Conference on Multimedia
Place of publication: New York, NY, United States
Publisher: Association for Computing Machinery, Inc
ISBN: 9798400701085
Authors: Lao, Mingrui; Pu, Nan; Liu, Yu; Zhong, Zhun; Bakker, Erwin M.; Sebe, Nicu; Lew, Michael S.
Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation / Lao, Mingrui; Pu, Nan; Liu, Yu; Zhong, Zhun; Bakker, Erwin M.; Sebe, Nicu; Lew, Michael S. - (2023), pp. 4747-4758. (Paper presented at The 31st ACM International Conference on Multimedia, held in Ottawa, Canada, 29 October - 3 November 2023) [10.1145/3581783.3612121].
Files in this record:

File: 3581783.3612121 (2)-compressed.pdf
Access: Open access
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 473.55 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/398407
Citations
  • PMC: ND
  • Scopus: 2
  • Web of Science (ISI): 0
  • OpenAlex: ND