[ACL 2025] ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
The official repository containing the introduction and code for our ACL 2025 paper: ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs.
| News | ImPart | Quick Start | Citation |
- May 2025: Our paper was accepted to the ACL 2025 main conference.
- Apr 2025: We released our paper on arXiv.
- ImPart is motivated by the observation that singular vectors with larger singular values encode more important task-specific information.
- Accordingly, ImPart assigns variable sparsity ratios to singular vectors based on their corresponding singular values (see the illustrative sketch below).
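The following is a minimal, illustrative sketch of this idea, not the repository's implementation: after an SVD of the delta weight, each singular vector is pruned with a keep ratio proportional to its singular value. The `avg_keep` parameter and the proportional allocation rule are simplifying assumptions; the paper derives the actual allocation from the singular-value spectrum.

```python
# Illustrative sketch only: importance-aware sparsification of a delta weight.
# Singular vectors with larger singular values keep more of their entries.
import torch

def _keep_topk(vec: torch.Tensor, keep_ratio: float) -> None:
    """Zero all but the largest-magnitude entries of `vec`, in place."""
    k = max(1, int(keep_ratio * vec.numel()))
    threshold = vec.abs().topk(k).values.min()
    vec[vec.abs() < threshold] = 0.0

def impart_sparsify(delta_w: torch.Tensor, avg_keep: float = 0.1) -> torch.Tensor:
    """Sparsify delta_w = W_ft - W_base with per-singular-vector sparsity ratios.

    `avg_keep` is a hypothetical knob for the average fraction of kept entries.
    """
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    importance = s / s.sum()                                   # larger sigma -> more important
    keep_ratios = (avg_keep * len(s) * importance).clamp(max=1.0)
    for i, keep in enumerate(keep_ratios.tolist()):
        _keep_topk(u[:, i], keep)                              # sparsify left singular vector
        _keep_topk(vh[i, :], keep)                             # sparsify right singular vector
    return u @ torch.diag(s) @ vh                              # reconstruct the sparse delta
```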
Install all the packages from requirements.txt:
conda create -n impart python=3.10 -y
conda activate impart
git clone https://github.com/sustech-nlp/ImPart.git
cd ImPart
pip install -r requirements.txt
Task | Fine-tuned Model | Backbone | Benchmark 1 | Benchmark 2 |
---|---|---|---|---|
Math | WizardMath-13B-V1.0 | LLaMA-2-13B | GSM8K | MATH |
Code | WizardCoder-13B | CodeLlama-13B | HumanEval | MBPP |
Chat | LLaMA-2-13B-Chat | LLaMA-2-13B | IFEval | AlpacaEval |
Chat | LLaMA-2-7B-Chat | LLaMA-2-7B | IFEval | AlpacaEval |
Chat | LLaMA-3-8B-Instruct | LLaMA-3-8B | IFEval | AlpacaEval |
- Compute the SVD of the delta weight $\Delta W$ between the base model and the fine-tuned model (a Python sketch of this step follows the command below).
python delta.py \
--svd \
--base_model "meta-llama/Llama-2-13b-hf" \
--finetuned_model "vanillaOVO/WizardMath-13B-V1.0" \
--dim 5120 \
--save_path "delta_weight_save_path.pt"
- Sparsify the singular vectors of $\Delta W$ with importance-aware sparsity ratios:
python sparsify/sparsify.py \
    --config sparsify/config_example.yaml
- ImPart outperforms baselines across most tasks and backbones, achieving the highest average score.
- ImPart achieves more than $2\times$ higher compression efficiency at compression ratios from 8 to 64.
- Evaluation follows the implementation in DARE and Delta-CoMe:
# GSM8K
bash eval/scripts/gsm8k.sh "fine-tuned model name or path" "fp16"
# MATH
bash eval/scripts/math.sh "fine-tuned model name or path" "fp16"
# HumanEval
bash eval/scripts/humaneval.sh "fine-tuned model name or path" "fp16"
# MBPP
bash eval/scripts/mbpp.sh "fine-tuned model name or path" "fp16"
# IFEval
bash eval/scripts/ifeval.sh "fine-tuned model name or path" "fp16"
# AlpacaEval
bash eval/scripts/alpacaeval.sh "fine-tuned model name or path" "model template"
- Following Delta-CoMe, ImPart-Qt applies 8-3-2-bit mixed-precision quantization to the sparse singular vectors of $\Delta W$, as detailed in Section 7.1 and Appendix B.1 of the paper (an illustrative bit-allocation sketch is shown below).
- GPTQ is extended to accommodate sparse weight matrices, following Algorithm 2 in the paper.
- The code is adapted from the Delta-CoMe implementation.
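As an illustration of the mixed-precision idea, bit widths can be assigned per singular vector according to its singular value. The group fractions `hi_frac` and `mid_frac` below are placeholders, not the values used in ImPart-Qt:

```python
# Illustration only: assign 8-/3-/2-bit precision per singular vector by importance.
import torch

def allocate_bits(s: torch.Tensor, hi_frac: float = 0.02, mid_frac: float = 0.2) -> torch.Tensor:
    """Return a bit width (8, 3, or 2) for each singular vector.

    `hi_frac` and `mid_frac` are hypothetical group sizes; ImPart-Qt's actual
    grouping follows Section 7.1 / Appendix B.1 of the paper.
    """
    r = len(s)
    bits = torch.full((r,), 2, dtype=torch.int64)       # default: 2-bit
    order = torch.argsort(s, descending=True)           # most important vectors first
    n_hi = max(1, int(hi_frac * r))
    n_mid = max(1, int(mid_frac * r))
    bits[order[:n_hi]] = 8                               # top group: 8-bit
    bits[order[n_hi:n_hi + n_mid]] = 3                   # middle group: 3-bit
    return bits                                          # remainder stays at 2-bit
```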
- Obtain the sparse singular vectors of $\Delta W$ for quantization, ensuring the total compression ratio matches the target:
python sparsify/sparsify_quant.py \
--config sparsify/config_example.yaml
- Quantize $\Delta W$:
python quantize/sparse_llama.py \
"fine-tuned model name or path" \
"c4" \
--config "quantize/13b_config_example.yaml" \
--saved_delta_path "saving path of sparse delta weight" \
--save_compressed_delta_dir "path to save the quantized delta weight"
- Reload the quantized $\Delta W$ into the pretrained model (a reconstruction sketch follows the command below):
python delta.py \
--merge \
--finetuned_model "fine-tuned model name or path" \
--delta_path "path to save the quantized delta weight" \
--save_path "path to save the reconstructed model"
- ImPart-Qt achieves nearly lossless performance at a compression ratio (CR) of 32.
- The $\Delta W$ processed by ImPart can be used to improve the performance of model merging.
- Following DARE, we apply ImPart to two classic model merging methods: Task Arithmetic and TIES-Merging (a minimal Task Arithmetic sketch follows the command below).
- Task Arithmetic: set "merge_method" to "ta_n", where n is the scaling term.
- TIES-Merging: set "merge_method" to "ties_t_n", where t is the trim ratio and n is the scaling term.
python merge/merge.py \
--merge_method "ta_0.5" \
--ptm_pth "pretrained model for math" \
--math_pth "ImPart-processed fine-tuned model for math" \
--code_pth "ImPart-processed fine-tuned model for code" \
--chat_pth "ImPart-processed fine-tuned model for chat" \
--save_pth "dir to save the merged model"
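For reference, here is a minimal conceptual sketch of Task Arithmetic with a scaling term of 0.5 (corresponding to `--merge_method "ta_0.5"`). It merges full checkpoints for clarity and is not the repository's `merge/merge.py`:

```python
# Conceptual sketch of Task Arithmetic: merged = base + scale * sum(task vectors).
import torch
from transformers import AutoModelForCausalLM

def task_arithmetic_merge(base_path: str, task_paths: list[str], scale: float = 0.5):
    """Merge several fine-tuned models into one via Task Arithmetic."""
    with torch.no_grad():
        base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
        base_state = base.state_dict()
        merged = {k: v.clone() for k, v in base_state.items()}
        for path in task_paths:
            ft_state = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16).state_dict()
            for name in merged:
                # Task vector = fine-tuned weights minus base weights, scaled and accumulated.
                merged[name] += scale * (ft_state[name] - base_state[name])
        base.load_state_dict(merged)
    return base
```

Usage would look like `task_arithmetic_merge(base, [math_model, code_model, chat_model], scale=0.5)`, where the task models are the ImPart-processed fine-tuned checkpoints.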
If you find this repo useful for your research, please cite us as:
@misc{yang2025impart,
title={ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs},
author={Yan Yang and Yixia Li and Hongru Wang and Xuetao Wei and James Jianqiao Yu and Yun Chen and Guanhua Chen},
year={2025},
eprint={2504.13237},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.13237},
}