🔄 RoLoRA

This repository contains the code for RoLoRA, introduced in our paper "RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization", published at EMNLP 2024.

🌟 Abstract

In this work, we propose RoLoRA, the first LoRA-based scheme to apply rotation for outlier elimination and then fine-tune rotated, outlier-free LLMs for effective weight-activation quantization. RoLoRA improves low-bit LoRA convergence and post-training quantization robustness in weight-activation quantization settings. RoLoRA is evaluated across various LLM series, tasks, and quantization settings, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantization of LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline.

🌿 Citation

If you find our code useful for your research, please consider citing:

@article{huang2024rolora,
  title={RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization},
  author={Huang, Xijie and Liu, Zechun and Liu, Shih-Yang and Cheng, Kwang-Ting},
  journal={arXiv preprint arXiv:2407.08044},
  year={2024}
}

🛠️ Getting Started

Hugging Face Hub Login

pip install --upgrade huggingface_hub
huggingface-cli login

Installation

pip install -r requirements.txt

If you encounter any problems installing fast_hadamard_transform with pip, please consider building it from source.
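The commands below are a minimal sketch of a source build, assuming the package's public GitHub repository (Dao-AILab/fast-hadamard-transform) and a CUDA toolchain matching your installed PyTorch:

```bash
# Hedged sketch: build fast_hadamard_transform from source.
# The repository URL is an assumption; adjust it if the dependency lives elsewhere.
git clone https://github.com/Dao-AILab/fast-hadamard-transform.git
cd fast-hadamard-transform
pip install .   # compiles the CUDA kernels against the local PyTorch/CUDA setup
```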

🚂 Finetuning

For experiments applying RoLoRA on LLaMA2-7B, please run

sh rolora.sh

Remove --rotate_down_proj and --rotate_mode 'hadamard' for the LoRA baseline without rotation.
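For reference, here is an excerpt-style sketch of how the two rotation flags would appear in the training command inside rolora.sh; the remaining arguments of the script are not reproduced, and the inline notes are only our reading of the flag names:

```bash
# Sketch only (not the actual rolora.sh): the rotation flags named in this README.
    --rotate_down_proj \        # presumably also applies the rotation to the down_proj layers
    --rotate_mode 'hadamard' \  # uses Hadamard matrices as the outlier-eliminating rotation
```

Dropping both lines reproduces the plain LoRA baseline.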

⌛ Merging

To merge the RoLoRA adapter into LLaMA2-7B, please run

sh merge_rolora.sh

Set --adapter_name_or_path and --export_dir to the path of the adapter files and the export target folder, respectively. Remove --rotate_down_proj and --rotate_mode 'hadamard' to merge a LoRA adapter without rotation.
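For concreteness, a hedged, excerpt-style sketch of those two arguments inside merge_rolora.sh; both paths are hypothetical placeholders, and the script's other arguments are left as shipped:

```bash
# Sketch only (not the actual merge_rolora.sh): the two arguments to edit.
    --adapter_name_or_path saves/llama2-7b-rolora \   # folder containing the fine-tuned adapter files
    --export_dir models/llama2-7b-rolora-merged \     # target folder for the merged model
```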

🔍 Evaluation

For evaluation on the Zero-shot Commonsense Reasoning (ZCSR) and MMLU benchmarks, please run

sh eval_rolora.sh

Specify $NAME, $WBITS, and $ABITS for the target quantization setting. Use --w_rtn for RTN quantization on weights (the default is GPTQ). If you want to evaluate the quantized models on more tasks, set --task to any task included in lm-evaluation-harness.
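As a hedged example, the W4A4 setting reported in the results table below would be configured roughly as follows; the variable names follow the description above, and the model path is a hypothetical placeholder:

```bash
# Hedged sketch: values to set where they are defined in eval_rolora.sh.
NAME=models/llama2-7b-rolora-merged   # merged model to evaluate (placeholder path)
WBITS=4                               # weight bit-width
ABITS=4                               # activation bit-width
# Then run the script; add --w_rtn to its command for RTN weight quantization instead of GPTQ.
sh eval_rolora.sh
```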

💾 Checkpoint

We provide checkpoints for the RoLoRA-finetuned LLMs in the accompanying Hugging Face repo. The evaluation logs are also included.

📚 Results

Below are the results for LLaMA2-7B, LLaMA2-13B, and LLaMA3-8B on the zero-shot commonsense reasoning (ZCSR) and MMLU benchmarks.

| #Bits | Quantizer | Method | LLaMA-2 7B ZCSR Avg. | LLaMA-2 7B MMLU Avg. | LLaMA-2 13B ZCSR Avg. | LLaMA-2 13B MMLU Avg. | LLaMA-3 8B ZCSR Avg. | LLaMA-3 8B MMLU Avg. |
|-------|-----------|--------|----------------------|----------------------|-----------------------|-----------------------|----------------------|----------------------|
| FP16 | - | LoRA | 68.4 | 43.5 | 70.5 | 52.4 | 70.0 | 62.7 |
| W4A4 | RTN | LoRA | 35.8 | 23.5 | 34.4 | 24.2 | 36.7 | 23.3 |
| W4A4 | RTN | RoLoRA | 54.1 (↑18.3) | 25.8 (↑2.3) | 58.7 (↑24.3) | 30.5 (↑6.3) | 50.0 (↑13.3) | 32.1 (↑8.8) |
| W4A4 | GPTQ | LoRA | 37.0 | 23.5 | 34.4 | 24.4 | 36.6 | 23.9 |
| W4A4 | GPTQ | RoLoRA | 62.3 (↑25.3) | 31.0 (↑7.5) | 63.9 (↑29.5) | 38.9 (↑14.5) | 56.6 (↑20.0) | 38.5 (↑14.6) |
| W6A6 | RTN | LoRA | 65.3 | 35.9 | 67.3 | 47.3 | 67.7 | 55.3 |
| W6A6 | RTN | RoLoRA | 66.8 (↑1.5) | 40.5 (↑4.6) | 68.4 (↑1.1) | 47.7 (↑0.4) | 67.8 (↑0.1) | 59.4 (↑4.1) |
| W6A6 | GPTQ | LoRA | 65.5 | 35.7 | 68.0 | 47.6 | 67.8 | 54.3 |
| W6A6 | GPTQ | RoLoRA | 67.1 (↑1.6) | 40.8 (↑5.1) | 68.8 (↑0.8) | 47.9 (↑0.3) | 68.1 (↑0.3) | 59.4 (↑5.1) |

💌 Acknowledgement

This repo benefits from SpinQuant, QuaRot, LLaMA-Factory, and fast-hadamard-transform. Thanks for their wonderful work!

If you have any questions, feel free to contact Xijie Huang (xhuangbs AT connect.ust.hk, huangxijie1108 AT gmail.com).
