Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study / Huang, Yiran; Thede, Lukas; Mancini, Massimiliano; Xu, Wenjia; Akata, Zeynep. - (2026), pp. 320-336. (GCPR Germany 2025) [10.1007/978-3-032-12840-9_21].

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

Date: 2026-01-01

Abstract

While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily involve training MLLMs from Small Language Models (SLMs), but these methods offer limited flexibility and remain computationally intensive. To address this gap, we propose to directly compress existing MLLMs through structural pruning combined with efficient recovery training. Specifically, we investigate two structural pruning paradigms, layerwise and widthwise pruning, applied to the language model backbone of MLLMs, alongside supervised finetuning and knowledge distillation. Additionally, we assess the feasibility of conducting recovery training with only a small fraction of the available data. Our results show that widthwise pruning generally preserves performance better in low-resource scenarios with limited computational resources or insufficient finetuning data. For recovery training, finetuning only the multimodal projector is sufficient at small compression levels (<20%), and a combination of supervised finetuning and hidden-state distillation yields the best recovery across pruning levels. Notably, effective recovery can be achieved with as little as 5% of the original training data while retaining over 95% of the original performance. Through an empirical study of two representative MLLMs, LLaVA-v1.5-7B and Bunny-v1.0-3B, this work offers actionable insights for practitioners aiming to compress MLLMs effectively without extensive computational resources or abundant finetuning data.
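
To make the techniques named in the abstract concrete, below is a minimal sketch of the two structural pruning paradigms, assuming a Hugging Face LLaMA-style decoder backbone (e.g., the Vicuna backbone of LLaVA-v1.5) with bias-free MLP projections. The importance scores, pruning ratios, and channel-scoring heuristic are illustrative assumptions, not the paper's exact criteria.

```python
# Sketch only: layerwise vs. widthwise structural pruning on a LLaMA-style
# backbone. Importance scores and ratios here are illustrative assumptions.
import torch
from torch import nn


def prune_layerwise(model, keep_ratio=0.8, importance=None):
    """Layerwise pruning: drop whole decoder blocks, keeping the highest-scoring ones."""
    layers = model.model.layers                          # nn.ModuleList of decoder blocks
    n_keep = max(1, int(len(layers) * keep_ratio))
    order = (list(range(len(layers))) if importance is None
             else sorted(range(len(layers)), key=lambda i: -importance[i]))
    keep = sorted(order[:n_keep])                        # keep selected blocks in original order
    model.model.layers = nn.ModuleList(layers[i] for i in keep)
    model.config.num_hidden_layers = len(keep)
    return model


def prune_mlp_widthwise(layer, keep_ratio=0.8):
    """Widthwise pruning: drop the least important intermediate (FFN) channels of one block."""
    mlp = layer.mlp                                      # LLaMA-style MLP: gate_proj, up_proj, down_proj
    n_keep = int(mlp.gate_proj.out_features * keep_ratio)
    scores = mlp.down_proj.weight.norm(dim=0)            # toy channel importance: outgoing L2 norm
    keep = torch.topk(scores, n_keep).indices.sort().values
    for name in ("gate_proj", "up_proj"):                # shrink the output dim of the in-projections
        old = getattr(mlp, name)
        new = nn.Linear(old.in_features, n_keep, bias=False)
        new.weight.data = old.weight.data[keep].clone()
        setattr(mlp, name, new)
    old = mlp.down_proj                                  # shrink the input dim of the out-projection
    new = nn.Linear(n_keep, old.out_features, bias=False)
    new.weight.data = old.weight.data[:, keep].clone()
    mlp.down_proj = new
    return layer
```

A second sketch, under the same assumptions, illustrates the recovery objective that combines supervised finetuning with hidden-state distillation from the uncompressed teacher. The loss weight `alpha`, the MSE criterion, and matching only the final hidden state are assumptions rather than the paper's exact recipe.

```python
# Sketch only: SFT cross-entropy plus hidden-state distillation from the
# frozen, uncompressed teacher. `alpha` and the MSE criterion are assumptions.
import torch
import torch.nn.functional as F


def recovery_loss(student, teacher, batch, alpha=1.0):
    # `batch` is assumed to contain input_ids, attention_mask and labels,
    # so the student's forward pass returns the supervised finetuning loss.
    s_out = student(**batch, output_hidden_states=True)
    with torch.no_grad():                                # the teacher stays frozen
        t_out = teacher(**batch, output_hidden_states=True)

    sft_loss = s_out.loss                                # next-token cross-entropy on labelled tokens
    distill_loss = F.mse_loss(s_out.hidden_states[-1],
                              t_out.hidden_states[-1])   # hidden-state distillation term
    return sft_loss + alpha * distill_loss
```
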
Year: 2026
Conference: DAGM German Conference on Pattern Recognition (GCPR)
Place: Heidelberg, Germany
Publisher: Springer
ISBN: 9783032128393; 9783032128409
Authors: Huang, Yiran; Thede, Lukas; Mancini, Massimiliano; Xu, Wenjia; Akata, Zeynep

Use this identifier to cite or link to this item: https://hdl.handle.net/11572/472170
Warning: the displayed data have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: not available
  • Web of Science (ISI): not available
  • OpenAlex: 0