Path Integral methods have demonstrated remarkable capabilities for solving non-linear stochastic optimal control problems through sampling-based optimization. However, their computational complexity grows linearly with the prediction horizon, limiting long-term reasoning, and constraints are typically enforced only through handcrafted penalties. In this work, we propose a unified and efficient framework for enabling long-horizon reasoning and constraint enforcement within Model Predictive Path Integral (MPPI) control. First, we introduce a practical method to incorporate a terminal value function, learned offline via temporal-difference learning, to approximate the long-term cost-to-go. This allows for significantly shorter roll-outs while enabling infinite-horizon reasoning, thereby improving computational efficiency and motion performance. Second, we propose a discount modulation strategy that adjusts the return of sampled trajectories based on constraint violations. This provides a more interpretable and effective mechanism for enforcing constraints than traditional cost shaping. Our formulation retains the flexibility and sampling efficiency of MPPI while supporting structured integration of long-term objectives and constraint handling. We validate our approach on both simulated and real-world robotic locomotion tasks, demonstrating improved performance, constraint awareness, and generalization under reduced computational budgets.
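The abstract describes two mechanisms: bootstrapping short rollouts with a terminal value function, and shrinking the discount factor when a trajectory violates a constraint. The following toy sketch illustrates how these two ideas can fit into a standard MPPI weighting step. It is an illustrative assumption, not the paper's implementation: the dynamics, costs, value function, and all parameter names (`gamma_viol`, `lam`, etc.) are hypothetical stand-ins.

```python
import numpy as np

def terminal_value(x):
    # Stand-in for a cost-to-go learned offline via TD learning;
    # here simply the squared distance to the origin (assumption).
    return float(np.dot(x, x))

def step(x, u, dt=0.1):
    # Toy single-integrator dynamics: x' = x + u * dt (assumption).
    return x + u * dt

def stage_cost(x, u):
    return float(np.dot(x, x) + 0.01 * np.dot(u, u))

def violates(x, x_max=1.5):
    # Example state constraint: stay inside the box |x_i| <= x_max.
    return bool(np.any(np.abs(x) > x_max))

def cd_mppi_weights(x0, U_samples, gamma=0.95, gamma_viol=0.5, lam=1.0):
    """MPPI importance weights with a constraint-modulated discount.

    U_samples: (K, H, dim_u) sampled control sequences over a short
    horizon H. After a violation the discount shrinks (gamma_viol < gamma),
    so violating trajectories weigh future reward less -- a sketch of the
    'discount modulation' idea; the paper's exact rule is not given here.
    """
    K, H, _ = U_samples.shape
    returns = np.zeros(K)
    for k in range(K):
        x, disc = x0.copy(), 1.0
        for t in range(H):
            u = U_samples[k, t]
            returns[k] += disc * stage_cost(x, u)
            x = step(x, u)
            # Modulate the discount on constraint violation.
            disc *= gamma_viol if violates(x) else gamma
        # Terminal value bootstraps the truncated horizon.
        returns[k] += disc * terminal_value(x)
    # Standard MPPI exponential (softmin) weighting over returns.
    w = np.exp(-(returns - returns.min()) / lam)
    return w / w.sum()
```

Trajectories with lower constraint-discounted return receive exponentially larger weight, so the averaged control update is pulled toward feasible, low-cost samples without any handcrafted penalty term.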
Crestaz, P. N.; De Matteis, L.; Chane-Sane, E.; Mansard, N.; Del Prete, A.: "TD-CD-MPPI: Temporal-Difference Constraint-Discounted Model Predictive Path Integral Control". In: IEEE Robotics and Automation Letters (ISSN 2377-3766), vol. 11, no. 1 (2025), pp. 498-505. DOI: 10.1109/LRA.2025.3632612
TD-CD-MPPI: Temporal-Difference Constraint-Discounted Model Predictive Path Integral Control
File: TD_CD_MPPI_v2.pdf (publisher's layout, open access, Creative Commons license, 8.54 MB, Adobe PDF)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.