Stable-Hair V2: Real-World Hair Transfer via Multiple-View Diffusion Model

Zhang, J.; Wang, W.; Sebe, N.
2026-01-01

Abstract

While diffusion-based methods have shown impressive capabilities in capturing diverse and complex hairstyles, their ability to generate consistent and high-quality multi-view outputs, which is crucial for real-world applications such as digital humans and virtual avatars, remains underexplored. In this paper, we propose Stable-Hair v2, a novel diffusion-based multi-view hair transfer framework. To the best of our knowledge, this is the first work to leverage multiple-view diffusion models for robust, high-fidelity, and view-consistent hair transfer across multiple perspectives. We introduce a comprehensive multi-view training data generation pipeline to generate high-quality triplet data, including bald images, reference hairstyles, and view-aligned source-bald pairs. Our multi-view hair transfer model integrates polar-azimuth embeddings for pose conditioning and temporal attention layers to ensure smooth transitions between views. To optimize this model, we design a novel multi-stage training strategy consisting of Pose-Controllable Latent IdentityNet training, Hair Extractor training, and Temporal Attention training. Extensive experiments demonstrate that our method accurately transfers detailed and realistic hairstyles to source subjects while achieving seamless and consistent results across views, significantly outperforming existing methods and establishing a new benchmark in multi-view hair transfer.
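
The abstract names two architectural components: polar-azimuth embeddings for pose conditioning and temporal attention layers for cross-view consistency. As a minimal illustrative sketch (not the paper's implementation; all class names, dimensions, and layer choices below are assumptions), the following PyTorch snippet shows one common way such components are realized: sinusoidal embeddings of the two camera angles passed through a small MLP, and self-attention applied along the view axis of the feature tokens.

```python
# Minimal sketch of pose conditioning and view-axis attention.
# Names, shapes, and hyperparameters are illustrative assumptions,
# not taken from the Stable-Hair v2 paper.
import math
import torch
import torch.nn as nn

def angle_embedding(angles: torch.Tensor, dim: int = 64) -> torch.Tensor:
    """Sinusoidal embedding of angles (radians); output shape (..., dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = angles[..., None] * freqs          # (..., half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class PolarAzimuthEmbedding(nn.Module):
    """Maps (polar, azimuth) camera angles to a conditioning vector."""
    def __init__(self, angle_dim: int = 64, out_dim: int = 256):
        super().__init__()
        self.angle_dim = angle_dim
        self.mlp = nn.Sequential(
            nn.Linear(2 * angle_dim, out_dim), nn.SiLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, polar: torch.Tensor, azimuth: torch.Tensor) -> torch.Tensor:
        emb = torch.cat([angle_embedding(polar, self.angle_dim),
                         angle_embedding(azimuth, self.angle_dim)], dim=-1)
        return self.mlp(emb)                   # (..., out_dim)

class TemporalAttention(nn.Module):
    """Residual self-attention across the view axis of (B, V, N, C) tokens."""
    def __init__(self, channels: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, v, n, c = x.shape
        # Fold spatial tokens into the batch so attention mixes views per token.
        h = x.permute(0, 2, 1, 3).reshape(b * n, v, c)
        h_norm = self.norm(h)
        out, _ = self.attn(h_norm, h_norm, h_norm)
        out = out.reshape(b, n, v, c).permute(0, 2, 1, 3)
        return x + out                         # residual connection
```

For example, given features x of shape (batch, views, tokens, channels), TemporalAttention(x) returns view-mixed features of the same shape, while the vector from PolarAzimuthEmbedding would typically be injected alongside the diffusion timestep embedding. How the paper actually wires these components into its UNet is not specified in the abstract.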
Year: 2026
Issue: 4
Authors: Sun, K.; Zhang, Y.; Zhang, J.; Liu, J.; Wang, W.; Sebe, N.; Zhao, Y.
Stable-Hair V2: Real-World Hair Transfer via Multiple-View Diffusion Model / Sun, K.; Zhang, Y.; Zhang, J.; Liu, J.; Wang, W.; Sebe, N.; Zhao, Y. - In: IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS. - ISSN 1077-2626. - 32:4(2026), pp. 2986-3001. [10.1109/TVCG.2026.3659861]
Files for this product:
No files are associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/486531
Note: the data shown here have not been validated by the university.

Citations
  • PMC: 1
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: not available