

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

Xu Z.; Tang H.; Sebe N.
2022-01-01

Abstract

To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method approaches these goals by deeply exploiting the power of the large-scale pre-trained vision-language model CLIP [32]. Concretely, we first Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP [31] baseline. Code is available at https://github.com/zipengxuc/PPE.
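
The abstract only names the three steps. As a rough, unofficial illustration of how the Predict and Prevent ideas could be wired up with the public OpenAI CLIP package, the Python sketch below ranks a hypothetical pool of face attributes by CLIP text similarity to an edit command and defines a toy entanglement penalty on CLIP image-text similarities. The attribute list, prompt template, and penalty are assumptions made for illustration only; they do not reproduce the losses or the evaluation metric proposed in the paper (see https://github.com/zipengxuc/PPE for the authors' actual implementation).

# Illustrative sketch only; NOT the authors' method. Requires torch and the
# OpenAI CLIP package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():          # CLIP is typically kept frozen as a loss network
    p.requires_grad_(False)

# Hypothetical pool of face attributes that an edit might entangle with (assumption).
CANDIDATE_ATTRIBUTES = ["gender", "age", "skin tone", "hair color", "glasses", "smile"]


def predict_entangled_attributes(command: str, top_k: int = 3):
    """Rank candidate attributes by CLIP text-text cosine similarity to the command."""
    prompts = [command] + [f"a face with different {a}" for a in CANDIDATE_ATTRIBUTES]
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
        feats = feats / feats.norm(dim=-1, keepdim=True)
    sims = (feats[0:1] @ feats[1:].T).squeeze(0)          # (num_attributes,)
    order = sims.argsort(descending=True)[:top_k]
    return [CANDIDATE_ATTRIBUTES[i] for i in order.tolist()]


def entanglement_penalty(img_before, img_after, attributes):
    """Toy 'Prevent' term: penalize shifts in CLIP similarity to the predicted
    entangled attributes between the original and edited images.
    img_before / img_after: CLIP-preprocessed image batches of shape (N, 3, 224, 224)."""
    tokens = clip.tokenize([f"a face with different {a}" for a in attributes]).to(device)
    txt = model.encode_text(tokens)
    txt = txt / txt.norm(dim=-1, keepdim=True)

    def sim(img):
        f = model.encode_image(img)
        f = f / f.norm(dim=-1, keepdim=True)
        return f @ txt.T                                   # (N, num_attributes)

    return (sim(img_after) - sim(img_before)).abs().mean()

For example, predict_entangled_attributes("make the person wear lipstick") would return the few attributes whose prompts sit closest to the command in CLIP's text embedding space, and the penalty could then be added to a manipulation model's training loss so that those attributes change as little as possible.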
2022
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Piscataway, NJ, USA
IEEE Computer Society
978-1-6654-6946-3
Xu, Z.; Lin, T.; Tang, H.; Li, F.; He, D.; Sebe, N.; Timofte, R.; Van Gool, L.; Ding, E.
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model / Xu, Z.; Lin, T.; Tang, H.; Li, F.; He, D.; Sebe, N.; Timofte, R.; Van Gool, L.; Ding, E.. - 2022:(2022), pp. 18208-18217. (Paper presented at the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, held in the USA in 2022) [10.1109/CVPR52688.2022.01769].
Files in this item:

Xu_Predict_Prevent_and_Evaluate_Disentangled_Text-Driven_Image_Manipulation_Empowered_by_CVPR_2022_paper.pdf

Access: Open access
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 8.32 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/361268
Citations
  • PMC: ND
  • Scopus: 23
  • Web of Science: 12