Xu, Z.; Lin, T.; Tang, H.; Li, F.; He, D.; Sebe, N.; Timofte, R.; Van Gool, L.; Ding, E. Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), USA, 2022, pp. 18208-18217. DOI: 10.1109/CVPR52688.2022.01769.
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
Xu Z.; Tang H.; Sebe N.
2022-01-01
Abstract
To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method approaches these targets by deeply exploiting the power of the large-scale pre-trained vision-language model CLIP [32]. Concretely, we first Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP [31] baseline. Code is available at https://github.com/zipengxuc/PPE.
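The Predict step described in the abstract relies on CLIP's text encoder. The snippet below is a minimal, hedged sketch of that idea, not the actual PPE procedure: it scores a set of candidate face attributes against an editing command in CLIP's text-embedding space and flags the most similar ones as possibly entangled. The candidate attribute list, prompt template, and similarity threshold are illustrative assumptions; the paper's own prediction method and entanglement loss are defined in the full text.

```python
# Sketch: rank candidate attributes by CLIP text similarity to an edit command.
# Assumes the open-source "clip" package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

command = "he has a beard"  # example text command for face editing
candidate_attrs = ["age", "gender", "baldness", "smile", "eyeglasses", "mustache"]  # illustrative list

with torch.no_grad():
    # Encode the command and one prompt per candidate attribute.
    cmd_emb = model.encode_text(clip.tokenize([command]).to(device)).float()
    attr_emb = model.encode_text(
        clip.tokenize([f"a face photo showing the attribute {a}" for a in candidate_attrs]).to(device)
    ).float()
    # Cosine similarity in CLIP's text-embedding space.
    cmd_emb = cmd_emb / cmd_emb.norm(dim=-1, keepdim=True)
    attr_emb = attr_emb / attr_emb.norm(dim=-1, keepdim=True)
    sims = (cmd_emb @ attr_emb.T).squeeze(0)

threshold = 0.22  # illustrative cut-off, not a value from the paper
possibly_entangled = [a for a, s in zip(candidate_attrs, sims.tolist()) if s > threshold]
print(possibly_entangled)
```

In the full framework, the predicted attributes then feed an entanglement loss that discourages unwanted attribute changes during training; this sketch only covers the prediction heuristic.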
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Xu_Predict_Prevent_and_Evaluate_Disentangled_Text-Driven_Image_Manipulation_Empowered_by_CVPR_2022_paper.pdf | Open access | Publisher's version (publisher's layout) | All rights reserved | 8.32 MB | Adobe PDF