Multi-focal Conditioned Latent Diffusion for Person Image Synthesis / Liu, Jiaqi; Zhang, Jichao; Rota, Paolo; Sebe, Nicu. - (2025), pp. 16019-16028. (CVPR, Nashville, USA, June 2025) [10.1109/cvpr52734.2025.01493].
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Liu, Jiaqi; Zhang, Jichao; Rota, Paolo; Sebe, Nicu
2025-01-01
Abstract
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation and has been widely employed for Pose-Guided Person Image Synthesis (PGPIS), yielding promising results. However, the compression process of LDM often results in the deterioration of details, particularly in sensitive areas such as facial features and clothing textures. In this paper, we propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations by conditioning the model on disentangled, pose-invariant features from these sensitive regions. Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information, enhancing the model's ability to produce appearance-realistic and identity-consistent images. Our method demonstrates consistent identity and appearance generation on the DeepFashion dataset and enables flexible person image editing due to its generation consistency. The code is available at https://github.com/jqliu09/mcld.
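To illustrate the multi-focal conditioning idea described in the abstract, the sketch below shows how per-region embeddings (a face-identity vector and clothing-texture tokens) might be aggregated into a single cross-attention context for a diffusion U-Net. This is a minimal, hypothetical PyTorch sketch: the module name `MultiFocalAggregator`, the dimensions, and the use of a transformer layer for fusion are our assumptions, not the authors' implementation, which is available at the linked repository.

```python
# Hypothetical sketch of multi-focal condition aggregation (not the authors' code).
import torch
import torch.nn as nn

class MultiFocalAggregator(nn.Module):
    """Fuses per-region (face, clothing) embeddings into one conditioning
    token sequence for a denoising U-Net's cross-attention layers."""
    def __init__(self, face_dim=512, texture_dim=768, cond_dim=768, n_heads=8):
        super().__init__()
        # Project each focal embedding into a shared conditioning space.
        self.face_proj = nn.Linear(face_dim, cond_dim)
        self.texture_proj = nn.Linear(texture_dim, cond_dim)
        # Lightweight self-attention so the focal tokens can interact.
        self.mix = nn.TransformerEncoderLayer(
            d_model=cond_dim, nhead=n_heads, batch_first=True)

    def forward(self, face_emb, texture_emb):
        # face_emb:    (B, 1, face_dim), e.g. a pose-invariant identity embedding
        # texture_emb: (B, T, texture_dim), e.g. patch tokens from clothing crops
        tokens = torch.cat(
            [self.face_proj(face_emb), self.texture_proj(texture_emb)], dim=1)
        return self.mix(tokens)  # (B, 1+T, cond_dim) cross-attention context

# Usage: the aggregated tokens would serve as the LDM's conditioning context.
agg = MultiFocalAggregator()
face = torch.randn(2, 1, 512)      # e.g. from a face-recognition encoder
texture = torch.randn(2, 16, 768)  # e.g. from an image encoder over clothing crops
context = agg(face, texture)       # passed to the U-Net via cross-attention
```

The design choice sketched here (projecting each focal source into a common space, then mixing with self-attention) is one plausible way to realize the aggregation the abstract describes; the paper and repository should be consulted for the actual architecture.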
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Liu_Multi-focal_Conditioned_Latent_Diffusion_for_Person_Image_Synthesis_CVPR_2025_paper.pdf | Open access | Refereed author's manuscript (post-print) | All rights reserved | 6.2 MB | Adobe PDF |
| Multi-focal_Conditioned_Latent_Diffusion_for_Person_Image_Synthesis.pdf | Repository managers only | Publisher's layout (editorial version) | All rights reserved | 5.94 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.