Radu, Andrei; Song, Yue; Neacsu, Ana; Sebe, Nicu. Why retrieve when you can edit: A fast conditional StyleGAN latent editing method. In: Pattern Recognition Letters, ISSN 0167-8655, vol. 202 (April 2026), pp. 114-119. DOI: 10.1016/j.patrec.2026.02.009
Why retrieve when you can edit: A fast conditional StyleGAN latent editing method
Andrei Radu; Yue Song; Nicu Sebe
2026-01-01
Abstract
Text-to-image diffusion models represent the de facto tools for image editing, but come with the disadvantage of a time-consuming multi-step approach, while also having a considerably large number of parameters. Recently, various methods have been proposed to increase the speed of the editing process, most focusing on searching the latent space of Generative Adversarial Networks (GANs) for semantically meaningful directions and synthesising the desired features from there. However, this task often requires extensive training to extract meaningful editing directions. As such, we propose a new training paradigm, related to that of a teacher-student technique, which leverages the remarkable conditional generation performance of StyleGAN for image attribute insertion via text conditioning. Our method computes the required changes of the latent style space by morphing the textual embeddings with the style space inside a Transformer architecture. We studied the editing capabilities of our approach on three benchmark datasets to demonstrate the intrinsic information acquired during the training of a conditional StyleGAN (teacher) and the transfer efficiency of information to the student network, outperforming other SOTA methods while requiring fewer resources.
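The abstract describes the core editing step as cross-attention inside a Transformer: tokens derived from the StyleGAN style code attend to text-embedding tokens, and the attended output is applied as a residual edit of the style code. The paper's actual architecture, dimensions, and token layout are not given here, so the following NumPy sketch is purely illustrative: the 18 style tokens (one per hypothetical StyleGAN layer), the 64-dimensional embedding size, and the single-head projection matrices `Wq`, `Wk`, `Wv` are all assumptions, not the authors' configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(style_tokens, text_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: style tokens (queries) attend to
    text-embedding tokens (keys/values), scaled dot-product style."""
    Q = style_tokens @ Wq
    K = text_tokens @ Wk
    V = text_tokens @ Wv
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 64                       # assumed embedding width
n_styles, n_words = 18, 8    # assumed: one style token per layer, 8 text tokens
style = rng.normal(size=(n_styles, d))   # stand-in for a StyleGAN style code
text = rng.normal(size=(n_words, d))     # stand-in for text-prompt embeddings
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))

# The text-conditioned attention output is applied as a residual edit,
# so an empty/neutral delta would leave the original style code untouched.
delta = cross_attention(style, text, Wq, Wk, Wv)
edited_style = style + delta
```

In a teacher-student setup like the one sketched in the abstract, a module of this kind (the student) would be trained so that `edited_style`, fed through a frozen StyleGAN generator, reproduces the output of the conditional StyleGAN teacher for the same text condition.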
File: PRL-WhyRetrieve26.pdf | Open access | Type: Editorial version (Publisher's layout) | License: Creative Commons | Size: 6.58 MB | Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.