Radu, Andrei; Song, Yue; Neacsu, Ana; Sebe, Nicu. Why retrieve when you can edit: A fast conditional StyleGAN latent editing method. In: Pattern Recognition Letters, ISSN 0167-8655, vol. 202 (April 2026), pp. 114-119. DOI: 10.1016/j.patrec.2026.02.009
Why retrieve when you can edit: A fast conditional StyleGAN latent editing method
Andrei Radu; Yue Song; Nicu Sebe
2026-01-01
Abstract
Text-to-image diffusion models represent the de facto tools for image editing, but come with the disadvantage of a time-consuming multi-step approach, while also having a considerably large number of parameters. Recently, various methods have been proposed to increase the speed of the editing process, most focusing on searching the latent space of Generative Adversarial Networks (GANs) for semantically meaningful directions and synthesising the desired features from there. However, this task often requires extensive training to extract meaningful editing directions. As such, we propose a new training paradigm, related to that of a teacher-student technique, which leverages the remarkable conditional generation performance of StyleGAN for image attribute insertion via text conditioning. Our method computes the required changes of the latent style space by morphing the textual embeddings with the style space inside a Transformer architecture. We studied the editing capabilities of our approach on three benchmark datasets to demonstrate the intrinsic information acquired during the training of a conditional StyleGAN (teacher) and the transfer efficiency of information to the student network, outperforming other SOTA methods while requiring fewer resources.
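The abstract describes the core editing step as cross-attention inside a Transformer: tokens derived from the StyleGAN style code attend to text-embedding tokens, and the attended output is applied as a residual edit of the style code. The paper's actual architecture, dimensions, and token layout are not given here, so the following NumPy sketch is purely illustrative: the 18 style tokens (one per hypothetical StyleGAN layer), the 64-dimensional embedding size, and the single-head projection matrices `Wq`, `Wk`, `Wv` are all assumptions, not the authors' configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(style_tokens, text_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: style tokens (queries) attend to
    text-embedding tokens (keys/values), scaled dot-product style."""
    Q = style_tokens @ Wq
    K = text_tokens @ Wk
    V = text_tokens @ Wv
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 64                       # assumed embedding width
n_styles, n_words = 18, 8    # assumed: one style token per layer, 8 text tokens
style = rng.normal(size=(n_styles, d))   # stand-in for a StyleGAN style code
text = rng.normal(size=(n_words, d))     # stand-in for text-prompt embeddings
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))

# The text-conditioned attention output is applied as a residual edit,
# so an empty/neutral delta would leave the original style code untouched.
delta = cross_attention(style, text, Wq, Wk, Wv)
edited_style = style + delta
```

In a teacher-student setup like the one sketched in the abstract, a module of this kind (the student) would be trained so that `edited_style`, fed through a frozen StyleGAN generator, reproduces the output of the conditional StyleGAN teacher for the same text condition.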
File: PRL-WhyRetrieve26.pdf | Open access | Type: Editorial version (Publisher's layout) | License: Creative Commons | Size: 6.58 MB | Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.