FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes

Lao, Mingrui; Pu, Nan; Zhong, Zhun; Sebe, Nicu; Lew, Michael S.

doi:10.1145/3581783.3611958

This paper presents a new setting for visual question answering (VQA) called personalized federated VQA (FedVQA) that addresses the growing need for decentralization and data privacy protection. FedVQA is both practical and challenging, requiring clients to learn well-personalized models on scene-specific datasets with severe feature/label distribution skews. These models then collaborate to optimize a generic global model on a central server, which should generalize well on both seen and unseen scenes without sharing raw data with the server or other clients. The primary challenge of FedVQA is that client models tend to forget the global knowledge initialized from the central server during personalized training, which impairs their personalized capacity through potential overfitting to local data. This further leads to divergence issues when aggregating distinct personalized knowledge at the central server, resulting in inferior generalization ability on unseen scenes. To address the challenge, we propose a novel federated pairwise preference preserving (FedP3) framework to improve personalized learning by preserving generic knowledge under FedVQA constraints. Specifically, we first design a differentiable pairwise preference (DPP) to improve knowledge preservation by formulating flexible yet effective global knowledge. Then, we introduce a forgotten-knowledge filter (FKF) to encourage the client models to selectively consolidate easily forgotten knowledge. By aggregating the DPP and the FKF, FedP3 coordinates the generic and the personalized knowledge to enhance the personalized ability of clients and the generalizability of the server. Extensive experiments show that FedP3 consistently surpasses the state of the art on the FedVQA task.
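The federated setting described in the abstract can be illustrated with a minimal sketch. It is not the FedP3 method: the abstract gives no formulas for DPP or FKF, so the code assumes a generic size-weighted parameter average (FedAvg-style) for the server step, and a plain, non-differentiable count of how often a client model ranks answer pairs in the same order as the global model. The names `aggregate`, `pairwise_agreement`, and the dictionary-of-lists weight format are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def aggregate(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Size-weighted average of client parameters (generic FedAvg-style step).

    Only parameters are exchanged; no raw client data reaches the server.
    This is an assumed aggregation rule, not one stated in the abstract.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    merged: Weights = {}
    for name, reference in client_weights[0].items():
        merged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return merged


def pairwise_agreement(global_scores: List[float], local_scores: List[float]) -> float:
    """Fraction of answer pairs that both models rank in the same order.

    Illustrative diagnostic only: a low value suggests the client model has
    drifted from the global model's answer preferences. Pairs on which the
    global model is indifferent are skipped.
    """
    pairs = 0
    agree = 0
    for i in range(len(global_scores)):
        for j in range(i + 1, len(global_scores)):
            g = global_scores[i] - global_scores[j]
            l = local_scores[i] - local_scores[j]
            if g == 0:
                continue  # global model expresses no preference for this pair
            pairs += 1
            if g * l > 0:
                agree += 1
    return agree / pairs if pairs else 1.0


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(aggregate(clients, [1, 3]))
    print(pairwise_agreement([3.0, 2.0, 1.0], [3.0, 1.0, 2.0]))
```

In this sketch, a client with more local samples pulls the global parameters further toward its own, which is one reason strongly personalized clients can cause the divergence the abstract mentions.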

FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes / Lao, M., Pu, N., Zhong, Z., Sebe, N., Lew, M.S. - (2023), pp. 7796-7807. (31st ACM International Conference on Multimedia, MM 2023, Ottawa, Canada, October 2023) [10.1145/3581783.3611958].

2023-01-01

Date of publication: 2023
Proceedings title: MM '23: Proceedings of the 31st ACM International Conference on Multimedia
Place of publication: New York, NY, United States
Publisher: Association for Computing Machinery, Inc
ISBN: 9798400701085
Scopus identifier: 2-s2.0-85179559348
WOS identifier: WOS:001199449107078
Appears in types: 04.1 Paper in Proceedings

Files in this record:

File	Size	Format
3581783.3611958 (1)-compressed.pdf (open access; type: publisher's layout; license: all rights reserved)	1.53 MB	Adobe PDF