ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs

This is the official repository containing the introduction and code for our ACL 2025 paper: ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs.

| πŸ”₯ News | πŸ”– ImPart | ⚑️ Quick Start | πŸ““ Citation |

πŸ”₯ News

  • May 2025: Our paper has been accepted to the ACL 2025 main conference.
  • Apr 2025: We released our paper on arXiv.

πŸ”– ImPart: Overview

ImPart: Importance-Aware Delta-Sparsification

  • ImPart is motivated by the observation that singular vectors with larger singular values encode more important task-specific information.
  • Accordingly, ImPart assigns variable sparsity ratios to singular vectors based on their corresponding singular values (a minimal code sketch follows the overview figure).

[Figure: ImPart method overview]
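To make the method concrete, here is a minimal PyTorch sketch of importance-aware sparsification. It assumes a simple linear keep-ratio schedule between min_keep and max_keep; the actual schedule and thresholds used in sparsify/sparsify.py may differ.

# Minimal sketch of importance-aware delta sparsification (illustrative only;
# the keep-ratio schedule here is an assumption, not the paper's exact one).
import torch

def impart_sketch(delta_w, k, min_keep=0.02, max_keep=0.5):
    # SVD of the task delta; keep the top-k singular triplets.
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    u, s, vh = u[:, :k].clone(), s[:k], vh[:k, :].clone()
    # Importance-aware schedule: larger singular value -> larger keep ratio.
    keep_ratios = min_keep + (max_keep - min_keep) * (s / s.max())
    for i in range(k):
        for vec in (u[:, i], vh[i, :]):   # sparsify both left and right singular vectors
            n_keep = max(1, int(keep_ratios[i].item() * vec.numel()))
            thresh = vec.abs().topk(n_keep).values.min()
            vec.mul_((vec.abs() >= thresh).to(vec.dtype))
    # Reconstruct the sparsified low-rank delta.
    return u @ torch.diag(s) @ vh

Because high-importance directions keep more of their entries, the sparsification error is pushed toward directions that carry less task-specific information.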

ImPart's performance across multiple compression ratios

[Figure: performance across compression ratios]


⚑️ Quick Start

Requirements

Install all the packages from requirements.txt:

conda create -n impart python=3.10 -y
conda activate impart
git clone https://github.com/sustech-nlp/ImPart.git
cd ImPart
pip install -r requirements.txt

Models & Benchmarks

| Task | Fine-tuned Model | Backbone | Benchmarks |
| --- | --- | --- | --- |
| Math | WizardMath-13B-V1.0 | LLaMA-2-13B | GSM8K, MATH |
| Code | WizardCoder-13B | CodeLlama-13B | HumanEval, MBPP |
| Chat | LLaMA-2-13B-Chat | LLaMA-2-13B | IFEval, AlpacaEval |
| Chat | LLaMA-2-7B-Chat | LLaMA-2-7B | IFEval, AlpacaEval |
| Chat | LLaMA-3-8B-Instruct | LLaMA-3-8B | IFEval, AlpacaEval |

SVD $\Delta W$

  • Compute the SVD of the delta weight $\Delta W = W_{\text{finetuned}} - W_{\text{base}}$ between the base model and the fine-tuned model (a conceptual sketch follows the command).
python delta.py \
  --svd \
  --base_model "meta-llama/Llama-2-13b-hf" \
  --finetuned_model "vanillaOVO/WizardMath-13B-V1.0" \
  --dim 5120 \
  --save_path "delta_weight_save_path.pt"
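Conceptually, this step subtracts matching 2-D weight matrices and factorizes the difference. A memory-naive sketch of that idea (not the exact delta.py logic; it assumes both checkpoints share the same parameter layout):

# Illustrative sketch of the SVD-of-delta step (not the exact delta.py implementation).
import torch
from transformers import AutoModelForCausalLM

# Memory-naive: loads both 13B checkpoints at once.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16)
ft = AutoModelForCausalLM.from_pretrained("vanillaOVO/WizardMath-13B-V1.0", torch_dtype=torch.float16)

svd_delta = {}
for (name, w_base), (_, w_ft) in zip(base.named_parameters(), ft.named_parameters()):
    if w_base.ndim != 2:          # only decompose 2-D weight matrices
        continue
    delta = (w_ft - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    svd_delta[name] = (u, s, vh)

torch.save(svd_delta, "delta_weight_save_path.pt")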

ImPart

Run Sparsification

python sparsify/sparsify.py \
  --config sparsify/config_example.yaml

Sparsification Performance

[Figure: sparsification results]

  • ImPart outperforms baselines across most tasks and backbones, achieving the highest average score.

[Figure: sparsification results across compression ratios]

  • ImPart achieves more than $2\times$ higher compression efficiency across compression ratios from 8 to 64.

Evaluation

  • The evaluation follows the implementations of DARE and Delta-CoMe.

Math Reasoning

# GSM8K
bash eval/scripts/gsm8k.sh "fine-tuned model name or path" "fp16"
# MATH
bash eval/scripts/math.sh "fine-tuned model name or path" "fp16"

Code Generation

# HumanEval
bash eval/scripts/humaneval.sh "fine-tuned model name or path" "fp16"
# MBPP
bash eval/scripts/mbpp.sh "fine-tuned model name or path" "fp16"

Instruction Following

# IFEval
bash eval/scripts/ifeval.sh "fine-tuned model name or path" "fp16"
# AlpacaEval
bash eval/scripts/alpacaeval.sh "fine-tuned model name or path" "model template"

ImPart + Quantization

  • Following Delta-CoMe, ImPart-Qt applies 8-3-2 bit mixed-precision quantization to $\Delta W$'s sparse singular vectors, as detailed in Section 7.1 and Appendix B.1 of the paper.
  • We extend GPTQ to accommodate sparse weight matrices, as shown in Algorithm 2 of the paper (a simplified sketch follows this list). [Figure: Algorithm 2, GPTQ for sparse singular vectors]
  • Our code is adapted from the implementation of Delta-CoMe.
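As a simplified illustration of the 8-3-2 scheme, the sketch below uses plain round-to-nearest quantization as a stand-in for the GPTQ-based procedure; the group boundaries (10% / 30% / 60% of singular vectors) are assumptions, not the paper's exact split.

# Illustrative mixed-precision quantization of sparse singular vectors.
import torch

def fake_quant(x, bits):
    # Symmetric round-to-nearest quantization (a stand-in for GPTQ).
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

def quantize_8_3_2(u, bit_groups=((0.1, 8), (0.4, 3), (1.0, 2))):
    # Columns of `u` are singular vectors sorted by singular value:
    # the first 10% get 8 bits, the next 30% get 3 bits, the rest get 2 bits.
    k = u.shape[1]
    out, start = u.clone(), 0
    for cum_frac, bits in bit_groups:
        end = int(round(cum_frac * k))
        if end > start:
            out[:, start:end] = fake_quant(u[:, start:end], bits)
        start = end
    return out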

Run Quantization

  • Get $\Delta W$'s sparse singular vectors for quantization, making sure the total compression ratio matches the target (a back-of-envelope check follows the command).
python sparsify/sparsify_quant.py \
  --config sparsify/config_example.yaml
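A back-of-envelope way to sanity-check the target, assuming the compression ratio is counted as the fp16 dense delta size over the stored sparse-factor size (singular values and sparse-index overhead ignored; the repository's exact accounting may differ):

# Rough compression-ratio estimate for one d x d delta matrix.
def compression_ratio(dim, rank, keep_ratio, avg_bits):
    dense_bits = 16 * dim * dim                                  # original fp16 delta
    kept_entry_bits = 2 * rank * dim * keep_ratio * avg_bits     # surviving entries of U and V^T
    return dense_bits / kept_entry_bits

# e.g. a 5120 x 5120 delta, rank 1024, 20% of entries kept, ~3 bits on average
print(compression_ratio(5120, 1024, 0.2, 3.0))                   # ~67x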
  • Quantize $\Delta W$.
python quantize/sparse_llama.py \
  "fine-tuned model name or path" \
  "c4" \
  --config "quantize/13b_config_example.yaml" \
  --saved_delta_path "saving path of sparse delta weight" \
  --save_compressed_delta_dir "path to save the quantized delta weight"
  • Reload the quantized $\Delta W$ into the pretrained model.
python delta.py \
  --merge \
  --finetuned_model "fine-tuned model name or path" \
  --delta_path "path to save the quantized delta weight" \
  --save_path "path to save the reconstructed model"
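Conceptually, reloading reconstructs each delta matrix from its (quantized) factors and adds it back onto the base weights; a sketch under that assumption (not delta.py's exact logic):

# Illustrative reload step: base weight + U @ diag(S) @ V^T per matrix.
import torch

def reload_delta(base_state, delta_factors):
    # delta_factors[name] = (U, S, Vh) for each 2-D weight matrix.
    merged = dict(base_state)
    for name, (u, s, vh) in delta_factors.items():
        delta = (u @ torch.diag(s) @ vh).to(base_state[name].dtype)
        merged[name] = base_state[name] + delta
    return merged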

Quantization Performance

[Figure: quantization results]

  • ImPart-Qt achieves nearly lossless performance at a compression ratio (CR) of 32.

[Figure: quantization results across compression ratios]


ImPart + Model Merging

  • The $\Delta W$ processed by ImPart can be used to improve model merging performance.
  • Following DARE, we apply ImPart to two classic model merging methods: Task Arithmetic and TIES-Merging (see the sketch below).
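For reference, Task Arithmetic adds the scaled task vectors (deltas) to the pretrained weights. A minimal sketch, where the scaling value plays the role of the n in "ta_n" (illustrative only, not merge/merge.py's exact logic):

# Minimal Task Arithmetic sketch: merged_W = W_pre + scaling * sum_i delta_W_i.
import torch

def task_arithmetic(pretrained, deltas, scaling=0.5):
    merged = {}
    for name, w in pretrained.items():
        task_sum = torch.zeros_like(w)
        for delta in deltas:          # one state-dict of deltas per task (math, code, chat)
            if name in delta:
                task_sum += delta[name]
        merged[name] = w + scaling * task_sum
    return merged

TIES-Merging additionally trims low-magnitude entries and resolves sign conflicts before summing, which is what the trim ratio t in "ties_t_n" controls.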

Run Model Merging

  • Task Arithmetic: use "merge_method" like "ta_n", where n is the scaling term.
  • TIES-Merging: use "merge_method" like "ties_t_n", where t is the trim ratio and n is the scaling term.
python merge/merge.py \
    --merge_method "ta_0.5" \ 
    --ptm_pth "pretrained model for math" \
    --math_pth "ImPart-processed fine-tuned model for math" \
    --code_pth "ImPart-processed fine-tuned model for code" \
    --chat_pth "ImPart-processed fine-tuned model for chat" \
    --save_pth "dir to save the merged model"

Merging Performance

[Figure: merging results]


πŸ““ Citation

If you find this repo useful for your research, please cite us as:

@misc{yang2025impart,
      title={ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs}, 
      author={Yan Yang and Yixia Li and Hongru Wang and Xuetao Wei and James Jianqiao Yu and Yun Chen and Guanhua Chen},
      year={2025},
      eprint={2504.13237},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.13237}, 
}
