Why retrieve when you can edit: A fast conditional StyleGAN latent editing method / Radu, Andrei; Song, Yue; Neacsu, Ana; Sebe, Nicu. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 202:April 2026(2026), pp. 114-119. [10.1016/j.patrec.2026.02.009]

Why retrieve when you can edit: A fast conditional StyleGAN latent editing method

Andrei Radu; Yue Song; Ana Neacsu; Nicu Sebe
2026-01-01

Abstract

Text-to-image diffusion models are the de facto tools for image editing, but they rely on a time-consuming multi-step sampling procedure and carry a considerably large number of parameters. Recently, various methods have been proposed to speed up the editing process, most of which search the latent space of Generative Adversarial Networks (GANs) for semantically meaningful directions and synthesise the desired features from them. However, extracting meaningful editing directions often requires extensive training. We therefore propose a new training paradigm, related to the teacher-student technique, which leverages the remarkable conditional generation performance of StyleGAN for image attribute insertion via text conditioning. Our method computes the required changes to the latent style space by fusing the textual embeddings with the style codes inside a Transformer architecture. We studied the editing capabilities of our approach on three benchmark datasets, demonstrating the intrinsic information acquired during the training of a conditional StyleGAN (the teacher) and the efficiency of its transfer to the student network, outperforming other state-of-the-art methods while requiring fewer resources.
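The core idea sketched in the abstract — conditioning a latent edit on text by letting style codes attend to textual embeddings inside a Transformer-style block — can be illustrated with a minimal cross-attention step. This is a hypothetical NumPy sketch, not the paper's implementation: the shapes, the weight matrices `Wq`/`Wk`/`Wv`, and the residual-update form `s' = s + Δs` are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def edit_style(style, text_emb, Wq, Wk, Wv):
    """One cross-attention step (illustrative): style tokens attend to
    text tokens, and the attended values form a residual edit of the
    style code. All shapes and projections are hypothetical.

    style:    (n_styles, d) latent style codes
    text_emb: (n_text, d)   textual embeddings (e.g. from a text encoder)
    """
    q = style @ Wq                                   # queries from style codes
    k = text_emb @ Wk                                # keys from text tokens
    v = text_emb @ Wv                                # values from text tokens
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n_styles, n_text)
    delta = attn @ v                                 # text-conditioned edit direction
    return style + delta                             # residual update: s' = s + Δs

# Toy usage with random weights (no trained model involved)
rng = np.random.default_rng(0)
d = 8
style = rng.standard_normal((4, d))
text_emb = rng.standard_normal((3, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
edited = edit_style(style, text_emb, Wq, Wk, Wv)     # same shape as `style`
```

With a zero value projection the attended edit vanishes and the style code passes through unchanged, which is why a residual formulation is a natural fit for "insert an attribute, keep the rest" editing.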
Files in this record:

File: PRL-WhyRetrieve26.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 6.58 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/478131
Citations
  • Scopus: 0
  • PMC, Web of Science, OpenAlex: not available