
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning / Farina, Matteo; Mancini, Massimiliano; Cunegatti, Elia; Iacca, Giovanni; Ricci, Elisa. - (2024), pp. 16185-16195. (Paper presented at the CVPR conference held in Seattle, WA, USA, 17 June 2024–21 June 2024) [10.1109/cvpr52733.2024.01532].

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Farina, Matteo;Mancini, Massimiliano;Cunegatti, Elia;Iacca, Giovanni;Ricci, Elisa
2024-01-01

Abstract

While excellent in transfer learning, Vision-Language Models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
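To make point (i) of the abstract concrete, the snippet below is a minimal PyTorch sketch of a magnitude-times-saliency importance score for a single linear layer, where a neuron's saliency is approximated by the summed magnitude of its incident weights and pruning is done by per-layer thresholding. These choices, and every name in the snippet, are illustrative assumptions based only on the abstract; the paper's exact scoring and its modality-aware allocation of the pruning budget (point (ii)) are not reproduced here.

import torch

def neuron_saliency(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Approximate neuron saliency as the absolute weight mass incident to each neuron.

    `weight` has shape (out_features, in_features). Treating summed magnitudes as a
    proxy for the "information flow" through a neuron is an assumption of this sketch.
    """
    out_saliency = weight.abs().sum(dim=1)  # one value per output neuron
    in_saliency = weight.abs().sum(dim=0)   # one value per input neuron
    return in_saliency, out_saliency

def flow_scores(weight: torch.Tensor) -> torch.Tensor:
    """Score each parameter by its magnitude times the saliency of the two neurons it connects."""
    in_sal, out_sal = neuron_saliency(weight)
    # broadcasting the two saliency vectors recovers the (out, in) shape of the weight matrix
    return weight.abs() * out_sal[:, None] * in_sal[None, :]

def prune_layer(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-scoring fraction of parameters (gradient-free, data-free)."""
    scores = flow_scores(weight)
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = scores.flatten().kthvalue(k).values
    return weight * (scores > threshold).float()

if __name__ == "__main__":
    w = torch.randn(64, 128)  # stand-in for one linear layer of a VLM
    pruned = prune_layer(w, sparsity=0.75)
    print(f"achieved sparsity: {(pruned == 0).float().mean():.2f}")

Per-layer thresholding is used here purely for brevity; the abstract indicates that MULTIFLOW instead distributes the pruning budget according to the emergent multimodal distribution of the pretrained parameters.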
Year: 2024
Published in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Place of publication: New York City, NY, USA
Publisher: IEEE
Files in this record:

Farina_MULTIFLOW_Shifting_Towards_Task-Agnostic_Vision-Language_Pruning_CVPR_2024_paper.pdf
  Access: open access
  Type: Refereed author's manuscript (post-print)
  License: All rights reserved
  Size: 4.01 MB
  Format: Adobe PDF

MULTIFLOW_Shifting_Towards_Task-Agnostic_Vision-Language_Pruning.pdf
  Access: archive administrators only
  Type: Publisher's layout (editorial version)
  License: All rights reserved
  Size: 3.56 MB
  Format: Adobe PDF

2404.05621v1.pdf
  Access: open access
  Type: Non-refereed preprint
  License: Creative Commons
  Size: 3.71 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/437110
Citations
  • PMC: ND
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: ND