PMGNet: Disentanglement and entanglement benefit mutually for compositional zero-shot learning

Liu, Yu; Li, Jianghao; Zhang, Yanyi; Jia, Qi; Wang, Weimin; Pu, Nan; Sebe, Nicu

2024

Abstract

Compositional zero-shot learning (CZSL) aims to model compositions of two primitives (i.e., attributes and objects) to classify unseen attribute-object pairs. Most studies are devoted to integrating disentanglement and entanglement strategies to circumvent the trade-off between contextuality and generalizability. Indeed, the two strategies can mutually benefit when used together. Nevertheless, these studies neglect the significance of developing mutual guidance between the two strategies. In this work, we take full advantage of guidance from disentanglement to entanglement and vice versa. Additionally, we propose exploring multi-scale feature learning to achieve fine-grained mutual guidance in a progressive framework. Our approach, termed Progressive Mutual Guidance Network (PMGNet), unifies disentanglement and entanglement representation learning, allowing the two strategies to learn from and teach each other progressively in one unified model. Furthermore, to alleviate overfitting to seen pairs, we adopt a relaxed cross-entropy loss to train PMGNet, without increasing time or memory cost. Extensive experiments on three benchmarks demonstrate that our method achieves clear improvements and reaches state-of-the-art performance. Moreover, PMGNet exhibits promising performance under the most challenging open-world CZSL setting, especially for unseen pairs.
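The abstract mentions a relaxed cross-entropy loss that curbs overfitting to seen pairs without extra time or memory cost, but the record does not spell out its form. The snippet below is a minimal PyTorch sketch assuming the relaxation behaves like label smoothing over composition logits; the function name relaxed_cross_entropy, the epsilon parameter, and the smoothing scheme are illustrative assumptions, not the paper's actual formulation.

import torch
import torch.nn.functional as F


def relaxed_cross_entropy(logits: torch.Tensor,
                          targets: torch.Tensor,
                          epsilon: float = 0.1) -> torch.Tensor:
    """Cross-entropy with softened (relaxed) targets.

    Hypothetical sketch: the relaxation is implemented here as label
    smoothing, so the model is not pushed to saturate its confidence
    on seen attribute-object pairs.

    logits:  (batch, num_pairs) scores over candidate compositions.
    targets: (batch,) indices of the ground-truth (seen) pair.
    """
    num_pairs = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        # Soft target: (1 - epsilon) on the true pair, epsilon spread
        # uniformly over the remaining pairs.
        soft_targets = torch.full_like(log_probs, epsilon / (num_pairs - 1))
        soft_targets.scatter_(-1, targets.unsqueeze(-1), 1.0 - epsilon)
    # One log-softmax and a weighted sum, so the cost stays comparable
    # to standard cross-entropy.
    return -(soft_targets * log_probs).sum(dim=-1).mean()


# Example usage (shapes only):
# logits = model(images)                                  # (batch, num_pairs)
# loss = relaxed_cross_entropy(logits, pair_labels, 0.1)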
PMGNet: Disentanglement and entanglement benefit mutually for compositional zero-shot learning / Liu, Yu; Li, Jianghao; Zhang, Yanyi; Jia, Qi; Wang, Weimin; Pu, Nan; Sebe, Nicu. - In: COMPUTER VISION AND IMAGE UNDERSTANDING. - ISSN 1077-3142. - 249:(2024), pp. 10419701-10419711. [10.1016/j.cviu.2024.104197]

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/436966