Cunegatti, E.; Custode, L. L.; Iacca, G. Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training. Transactions on Machine Learning Research, 2025. ISSN 2835-8856.
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Cunegatti E.; Custode L. L.; Iacca G.
2025-01-01
Abstract
Network pruning covers algorithms that reduce a model's computational cost by removing a subset of its parameters while having minimal impact on performance. Over the last decade, the dominant paradigm has been pruning followed by re-training, which has become impractical given the vast number of pre-trained models that are, in any case, too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, namely their input activations, to obtain sparse models whose activations are maximally aligned with those of their dense counterparts. We propose NeuronAl, a top-up algorithm that can be applied on top of any given pruning algorithm for LLMs: it adjusts the block-wise and row-wise sparsity, exploiting information from both the dense model and its sparse version to maximize the neuron alignment among activations. Unlike existing methods, our approach adaptively selects the best block-wise and row-wise sparsity ratios for the given model and target sparsity, and requires no re-training. We evaluate our method over ∼300 test cases spanning four LLM families, three sparsity ratios, and ten language tasks (three language modeling and seven zero-shot datasets), showing that it consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off. The code is available at https://github.com/eliacunegatti/NeuroAL.
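The abstract describes the method only at a high level. As a rough illustration of the core idea, selecting sparsity hyperparameters by how well the sparse model's activations align with the dense model's, using forward passes only, the sketch below uses mean cosine similarity as the alignment measure and a plain scan over candidate ratios. Both the alignment metric and the `prune_and_run` interface are assumptions made for exposition, not the authors' implementation.

import torch

def neuron_alignment(dense_acts: torch.Tensor, sparse_acts: torch.Tensor) -> float:
    """Mean cosine similarity between dense and sparse activations.

    Both tensors have shape (num_tokens, hidden_dim). Higher values mean
    the sparse model's activations stay closer to the dense model's.
    Cosine similarity is an assumed alignment measure for this sketch.
    """
    sims = torch.nn.functional.cosine_similarity(dense_acts, sparse_acts, dim=-1)
    return sims.mean().item()

def select_sparsity_ratio(dense_acts, candidate_ratios, prune_and_run):
    """Zeroth-order (forward-pass-only) hyperparameter selection.

    prune_and_run(ratio) is a hypothetical callable that prunes the model
    with the given block-/row-wise sparsity ratio and returns its input
    activations on a small calibration batch. The chosen ratio is simply
    the one whose activations align best with the dense model: no
    gradients, no re-training.
    """
    scores = {r: neuron_alignment(dense_acts, prune_and_run(r))
              for r in candidate_ratios}
    return max(scores, key=scores.get)

Because the search relies only on forward passes through the pruned candidates, it matches the "zeroth-order" framing in the title: no gradient information or re-training step is involved.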