A pure MLP-Mixer-based GAN framework for guided image translation

Tang, Hao; Ren, Bin; Sebe, Niculae

doi:10.1016/j.patcog.2024.110894

Traditional guided image translation methods, based on encoder–decoder or U-Net structures, often struggle with complex or contrasting images. To address this, we introduce a novel dual-stage strategy. First, we use a cascaded cross-gating MLP-Mixer to merge image and semantic guidance codes, generating intermediate results influenced by these cues. Second, we implement a refined pixel-level loss function to handle semantic guidance noise, along with a new cross-attention gating mechanism for detail refinement. Additionally, our framework utilizes an MLP-Mixer-based discriminator, ensuring that the entire system is built on the MLP-Mixer architecture. Our results in cross-view image translation and person image synthesis outperform current benchmarks, demonstrating the effectiveness of our method.
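For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of a generic MLP-Mixer block: a token-mixing MLP followed by a channel-mixing MLP, each preceded by layer normalization and wrapped in a residual connection. It is not the authors' cascaded cross-gating module, cross-attention gating mechanism, discriminator, or loss, none of which are specified in this record. All function names, sizes, and the random initialization are illustrative assumptions, and plain Python lists stand in for tensors.

```python
import math
import random


def gelu(x):
    # Exact GELU activation via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(row, eps=1e-6):
    # Normalize one vector to zero mean and unit variance (no learned scale/shift).
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def linear(row, weight, bias):
    # weight has one row per output unit; each row has len(row) entries.
    return [sum(w * v for w, v in zip(w_row, row)) + b
            for w_row, b in zip(weight, bias)]


def mlp(row, params):
    # Two-layer perceptron: linear -> GELU -> linear.
    w1, b1, w2, b2 = params
    hidden = [gelu(h) for h in linear(row, w1, b1)]
    return linear(hidden, w2, b2)


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def init_mlp(dim, hidden, rng):
    # Hypothetical initializer: small Gaussian weights, zero biases.
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]
    return (mat(hidden, dim), [0.0] * hidden, mat(dim, hidden), [0.0] * dim)


def mixer_block(x, token_params, channel_params):
    # x holds S tokens (e.g. image patches), each with C channels.
    # Token mixing: for every channel, an MLP runs across the S tokens.
    normed = [layer_norm(tok) for tok in x]
    mixed = transpose([mlp(ch, token_params) for ch in transpose(normed)])
    x = [[a + b for a, b in zip(tok, m)] for tok, m in zip(x, mixed)]
    # Channel mixing: for every token, an MLP runs across its C channels.
    normed = [layer_norm(tok) for tok in x]
    mixed = [mlp(tok, channel_params) for tok in normed]
    return [[a + b for a, b in zip(tok, m)] for tok, m in zip(x, mixed)]


rng = random.Random(0)
S, C = 4, 8  # assumed toy sizes: 4 tokens, 8 channels
x = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(S)]
out = mixer_block(x, init_mlp(S, 16, rng), init_mlp(C, 32, rng))
```

The output keeps the input's shape (S tokens by C channels), which is what allows such blocks to be stacked. With all weights set to zero, the residual connections make the block an identity map.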

A pure MLP-Mixer-based GAN framework for guided image translation / Tang, Hao; Ren, Bin; Sebe, Niculae. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 157:(2025). [10.1016/j.patcog.2024.110894]

2025-01-01

Date of publication: 2025
Journal title: PATTERN RECOGNITION
DOI: https://dx.doi.org/10.1016/j.patcog.2024.110894
Scopus identifier: 2-s2.0-85201487164
WOS identifier: WOS:001300042400001
Appears in types: 03.1 Journal article

Files in this product:

File	Description	Type	License	Access	Size	Format
PR-MLP-MixerHao24.pdf	Proof PDF	Other attachments	All rights reserved	Archive managers only	7.1 MB	Adobe PDF
1-s2.0-S0031320324006459-main.pdf	(none)	Publisher's layout	All rights reserved	Archive managers only	3.54 MB	Adobe PDF