Curriculum Direct Preference Optimization for Diffusion and Consistency Models


Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a reward model is used to rank the examples generated for each prompt. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are split into batches according to their difficulty level, and these batches are introduced gradually, from easy to hard, when training the generative model. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on nine benchmarks, outperforming the competing methods in terms of text alignment, aesthetics, and human preference. Our code is available at https://github.com/CroitoruAlin/Curriculum-DPO.
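The pair-construction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository holds that): the helper name `build_curriculum`, the use of every pair of samples for a prompt, and the equal-width bucketing of rank differences into `num_stages` stages are all hypothetical choices made for the example.

```python
from itertools import combinations


def build_curriculum(scores, num_stages):
    """Group (preferred, rejected) index pairs into curriculum stages.

    Hypothetical helper (assumption): `scores` holds reward-model scores for
    the images generated from a single prompt, where higher means better.
    Stage 0 holds the easiest pairs (largest rank gap); the last stage holds
    the hardest pairs (smallest rank gap).
    """
    n = len(scores)
    if n < 2 or num_stages < 1:
        raise ValueError("need at least two samples and at least one stage")

    # Rank 0 is the best-scored sample; sorted() is stable, so ties keep
    # their original order and every sample gets a distinct rank.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rank = {idx: r for r, idx in enumerate(order)}

    max_diff = n - 1  # largest possible rank difference
    stages = [[] for _ in range(num_stages)]
    for a, b in combinations(range(n), 2):
        # The sample with the better (lower) rank is the preferred one.
        win, lose = (a, b) if rank[a] < rank[b] else (b, a)
        diff = rank[lose] - rank[win]  # ranges from 1 to n - 1
        # Assumed bucketing: split the rank-difference range into equal-width
        # intervals, so a large gap (easy) maps to an early stage and a small
        # gap (hard) maps to a late stage.
        stage = (max_diff - diff) * num_stages // max_diff
        stages[stage].append((win, lose))
    return stages


if __name__ == "__main__":
    # Four images for one prompt, scored by a (hypothetical) reward model.
    print(build_curriculum([0.9, 0.1, 0.5, 0.7], num_stages=3))
    # [[(0, 1)], [(0, 2), (3, 1)], [(0, 3), (2, 1), (3, 2)]]
```

Each stage would then be passed, in order, to the DPO fine-tuning loop of the diffusion or consistency model, moving from pairs whose members are far apart in the ranking to pairs of adjacent samples. Whether earlier stages are replayed alongside later ones is left open in this sketch.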

Curriculum Direct Preference Optimization for Diffusion and Consistency Models / Croitoru, F., Hondru, V., Ionescu, R.T., Sebe, N., Shah, M. (2025), pp. 2824-2834. CVPR, Nashville, USA, June 2025. DOI: 10.1109/cvpr52734.2025.00269.


Croitoru, Florinel-Alin; Hondru, Vlad; Ionescu, Radu Tudor; Sebe, Nicu; Shah, Mubarak

2025-01-01



Date of publication: 2025
Proceedings title: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Place of publication: New York
Publisher: IEEE Computer Society
ISBN: 979-8-3315-4364-8
WOS identifier: WOS:001562507803023
Appears in type: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
Croitoru_Curriculum_Direct_Preference_Optimization_for_Diffusion_and_Consistency_Models_CVPR_2025_paper.pdf (open access; type: refereed author's manuscript (post-print); license: all rights reserved)	9.75 MB	Adobe PDF
Curriculum_Direct_Preference_Optimization_for_Diffusion_and_Consistency_Models.pdf (archive managers only; type: publisher's layout; license: all rights reserved)	9.25 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/462251
