ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs

This is the official repository containing the introduction and code for our ACL 2025 paper: ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs.

| πŸ”₯ News | πŸ”– ImPart | ⚑️ Quick Start | πŸ““ Citation |

πŸ”₯ News

  • May 2025: Our paper has been accepted to the ACL 2025 main conference.
  • Apr 2025: We released our paper on arXiv.

πŸ”– ImPart: Overview

ImPart: Importance-Aware Delta-Sparsification

  • ImPart is motivated by the observation that singular vectors with larger singular values encode more important task-specific information.
  • Accordingly, ImPart assigns variable sparsity ratios to singular vectors based on their corresponding singular values (a minimal code sketch follows the overview figure).

[Figure: ImPart method overview]
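To make the method concrete, here is a minimal PyTorch sketch of importance-aware sparsification. It assumes a simple linear keep-ratio schedule between min_keep and max_keep; the actual schedule and thresholds used in sparsify/sparsify.py may differ.

# Minimal sketch of importance-aware delta sparsification (illustrative only;
# the keep-ratio schedule here is an assumption, not the paper's exact one).
import torch

def impart_sketch(delta_w, k, min_keep=0.02, max_keep=0.5):
    # SVD of the task delta; keep the top-k singular triplets.
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    u, s, vh = u[:, :k].clone(), s[:k], vh[:k, :].clone()
    # Importance-aware schedule: larger singular value -> larger keep ratio.
    keep_ratios = min_keep + (max_keep - min_keep) * (s / s.max())
    for i in range(k):
        for vec in (u[:, i], vh[i, :]):   # sparsify both left and right singular vectors
            n_keep = max(1, int(keep_ratios[i].item() * vec.numel()))
            thresh = vec.abs().topk(n_keep).values.min()
            vec.mul_((vec.abs() >= thresh).to(vec.dtype))
    # Reconstruct the sparsified low-rank delta.
    return u @ torch.diag(s) @ vh

Because high-importance directions keep more of their entries, the sparsification error is pushed toward directions that carry less task-specific information.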

ImPart's performance across multiple compression ratios

[Figure: performance across compression ratios]


⚑️ Quick Start

Requirements

Install all the packages from requirements.txt:

conda create -n impart python=3.10 -y
conda activate impart
git clone https://github.com/sustech-nlp/ImPart.git
cd ImPart
pip install -r requirements.txt

Models & Benchmarks

| Task | Fine-tuned Model | Backbone | Benchmarks |
| --- | --- | --- | --- |
| Math | WizardMath-13B-V1.0 | LLaMA-2-13B | GSM8K, MATH |
| Code | WizardCoder-13B | CodeLlama-13B | HumanEval, MBPP |
| Chat | LLaMA-2-13B-Chat | LLaMA-2-13B | IFEval, AlpacaEval |
| Chat | LLaMA-2-7B-Chat | LLaMA-2-7B | IFEval, AlpacaEval |
| Chat | LLaMA-3-8B-Instruct | LLaMA-3-8B | IFEval, AlpacaEval |

SVD $\Delta W$

  • Compute the SVD of the delta weight $\Delta W = W_{\text{finetuned}} - W_{\text{base}}$ between the base model and the fine-tuned model (a conceptual sketch follows the command).
python delta.py \
  --svd \
  --base_model "meta-llama/Llama-2-13b-hf" \
  --finetuned_model "vanillaOVO/WizardMath-13B-V1.0" \
  --dim 5120 \
  --save_path "delta_weight_save_path.pt"
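Conceptually, this step subtracts matching 2-D weight matrices and factorizes the difference. A memory-naive sketch of that idea (not the exact delta.py logic; it assumes both checkpoints share the same parameter layout):

# Illustrative sketch of the SVD-of-delta step (not the exact delta.py implementation).
import torch
from transformers import AutoModelForCausalLM

# Memory-naive: loads both 13B checkpoints at once.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16)
ft = AutoModelForCausalLM.from_pretrained("vanillaOVO/WizardMath-13B-V1.0", torch_dtype=torch.float16)

svd_delta = {}
for (name, w_base), (_, w_ft) in zip(base.named_parameters(), ft.named_parameters()):
    if w_base.ndim != 2:          # only decompose 2-D weight matrices
        continue
    delta = (w_ft - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    svd_delta[name] = (u, s, vh)

torch.save(svd_delta, "delta_weight_save_path.pt")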

ImPart

Run Sparsification

python sparsify/sparsify.py \
  --config sparsify/config_example.yaml

Sparsification Performance

[Figure: sparsification results]

  • ImPart outperforms baselines across most tasks and backbones, achieving the highest average score.

[Figure: sparsification results across compression ratios]

  • ImPart achieves more than $2\times$ higher compression efficiency across compression ratios from 8 to 64.

Evaluation

  • The evaluation follows the implementations of DARE and Delta-CoMe.

Math Reasoning

# GSM8K
bash eval/scripts/gsm8k.sh "fine-tuned model name or path" "fp16"
# MATH
bash eval/scripts/math.sh "fine-tuned model name or path" "fp16"

Code Generation

# HumanEval
bash eval/scripts/humaneval.sh "fine-tuned model name or path" "fp16"
# MBPP
bash eval/scripts/mbpp.sh "fine-tuned model name or path" "fp16"

Instruction Following

# IFEval
bash eval/scripts/ifeval.sh "fine-tuned model name or path" "fp16"
# AlpacaEval
bash eval/scripts/alpacaeval.sh "fine-tuned model name or path" "model template"

ImPart + Quantization

  • Following Delta-CoMe, ImPart-Qt applies 8-3-2 bit mixed-precision quantization to $\Delta W$'s sparse singular vectors, as detailed in Section 7.1 and Appendix B.1 of the paper.
  • We extend GPTQ to accommodate sparse weight matrices, as shown in Algorithm 2 of the paper (a simplified sketch follows this list). [Figure: Algorithm 2, GPTQ for sparse singular vectors]
  • Our code is adapted from the implementation of Delta-CoMe.
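As a simplified illustration of the 8-3-2 scheme, the sketch below uses plain round-to-nearest quantization as a stand-in for the GPTQ-based procedure; the group boundaries (10% / 30% / 60% of singular vectors) are assumptions, not the paper's exact split.

# Illustrative mixed-precision quantization of sparse singular vectors.
import torch

def fake_quant(x, bits):
    # Symmetric round-to-nearest quantization (a stand-in for GPTQ).
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

def quantize_8_3_2(u, bit_groups=((0.1, 8), (0.4, 3), (1.0, 2))):
    # Columns of `u` are singular vectors sorted by singular value:
    # the first 10% get 8 bits, the next 30% get 3 bits, the rest get 2 bits.
    k = u.shape[1]
    out, start = u.clone(), 0
    for cum_frac, bits in bit_groups:
        end = int(round(cum_frac * k))
        if end > start:
            out[:, start:end] = fake_quant(u[:, start:end], bits)
        start = end
    return out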

Run Quantization

  • Get $\Delta W$'s sparse singular vectors for quantization, making sure the total compression ratio matches the target (a back-of-envelope check follows the command).
python sparsify/sparsify_quant.py \
  --config sparsify/config_example.yaml
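A back-of-envelope way to sanity-check the target, assuming the compression ratio is counted as the fp16 dense delta size over the stored sparse-factor size (singular values and sparse-index overhead ignored; the repository's exact accounting may differ):

# Rough compression-ratio estimate for one d x d delta matrix.
def compression_ratio(dim, rank, keep_ratio, avg_bits):
    dense_bits = 16 * dim * dim                                  # original fp16 delta
    kept_entry_bits = 2 * rank * dim * keep_ratio * avg_bits     # surviving entries of U and V^T
    return dense_bits / kept_entry_bits

# e.g. a 5120 x 5120 delta, rank 1024, 20% of entries kept, ~3 bits on average
print(compression_ratio(5120, 1024, 0.2, 3.0))                   # ~67x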
  • Quantize $\Delta W$.
python quantize/sparse_llama.py \
  "fine-tuned model name or path" \
  "c4" \
  --config "quantize/13b_config_example.yaml" \
  --saved_delta_path "saving path of sparse delta weight" \
  --save_compressed_delta_dir "path to save the quantized delta weight"
  • Reload the quantized $\Delta W$ into the pretrained model.
python delta.py \
  --merge \
  --finetuned_model "fine-tuned model name or path" \
  --delta_path "path to save the quantized delta weight" \
  --save_path "path to save the reconstructed model"
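Conceptually, reloading reconstructs each delta matrix from its (quantized) factors and adds it back onto the base weights; a sketch under that assumption (not delta.py's exact logic):

# Illustrative reload step: base weight + U @ diag(S) @ V^T per matrix.
import torch

def reload_delta(base_state, delta_factors):
    # delta_factors[name] = (U, S, Vh) for each 2-D weight matrix.
    merged = dict(base_state)
    for name, (u, s, vh) in delta_factors.items():
        delta = (u @ torch.diag(s) @ vh).to(base_state[name].dtype)
        merged[name] = base_state[name] + delta
    return merged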

Quantization Performance

[Figure: quantization results]

  • ImPart-Qt achieves nearly lossless performance at a compression ratio (CR) of 32.

[Figure: quantization results across compression ratios]


ImPart + Model Merging

  • The $\Delta W$ processed by ImPart can be used to improve model merging performance.
  • Following DARE, we apply ImPart to two classic model merging methods: Task Arithmetic and TIES-Merging (see the sketch below).
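For reference, Task Arithmetic adds the scaled task vectors (deltas) to the pretrained weights. A minimal sketch, where the scaling value plays the role of the n in "ta_n" (illustrative only, not merge/merge.py's exact logic):

# Minimal Task Arithmetic sketch: merged_W = W_pre + scaling * sum_i delta_W_i.
import torch

def task_arithmetic(pretrained, deltas, scaling=0.5):
    merged = {}
    for name, w in pretrained.items():
        task_sum = torch.zeros_like(w)
        for delta in deltas:          # one state-dict of deltas per task (math, code, chat)
            if name in delta:
                task_sum += delta[name]
        merged[name] = w + scaling * task_sum
    return merged

TIES-Merging additionally trims low-magnitude entries and resolves sign conflicts before summing, which is what the trim ratio t in "ties_t_n" controls.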

Run Model Merging

  • Task Arithmetic: use "merge_method" like "ta_n", where n is the scaling term.
  • TIES-Merging: use "merge_method" like "ties_t_n", where t is the trim ratio and n is the scaling term.
python merge/merge.py \
    --merge_method "ta_0.5" \ 
    --ptm_pth "pretrained model for math" \
    --math_pth "ImPart-processed fine-tuned model for math" \
    --code_pth "ImPart-processed fine-tuned model for code" \
    --chat_pth "ImPart-processed fine-tuned model for chat" \
    --save_pth "dir to save the merged model"

Merging Performance

[Figure: merging results]


πŸ““ Citation

If you find this repo useful for your research, please cite us as:

@misc{yang2025impart,
      title={ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs}, 
      author={Yan Yang and Yixia Li and Hongru Wang and Xuetao Wei and James Jianqiao Yu and Yun Chen and Guanhua Chen},
      year={2025},
      eprint={2504.13237},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.13237}, 
}
