PAT: Pruning-Aware Tuning for Large Language Models

Introduction

This repository is the official implementation of PAT: Pruning-Aware Tuning for Large Language Models. (arXiv)

News

  • 2024.9 - We released the merged, pruned PAT (25%) Llama 2 model, which can be loaded with transformers [with our modification]. (download)
  • 2024.8 - We released the paper and code for PAT. (arXiv)

Acknowledgment

Modified from FireFly

Installation

# Create and activate the environment
conda create -n pat python=3.10
conda activate pat
# Install PyTorch (CUDA 11.8 build)
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
# Install the modified transformers and peft shipped with this repo (editable mode)
cd transformers-4.40.1
pip install -e .
cd ../peft
pip install -e .
cd ..
# Install the remaining dependencies
pip install -r requirements.txt
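
After installation, a quick sanity check (a minimal sketch; it only uses the standard torch, transformers, and peft imports and assumes nothing repo-specific) confirms that the CUDA build and the editable installs are picked up:

# Verify the CUDA-enabled torch build and the locally installed transformers/peft versions
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import transformers, peft; print(transformers.__version__, peft.__version__)"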

Datasets

We employ LaMini-Instruction for fine-tuning, which can be found on Hugging Face (here). Additionally, we provide our 50% randomly sampled subset at this link.
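
For reference, the full dataset can also be pulled directly from the Hugging Face Hub; a minimal sketch follows, where the dataset id MBZUAI/LaMini-instruction is an assumption (the training scripts in this repo normally consume the sampled data linked above):

# Dataset id assumed from the public Hub; adjust if you use the 50% sampled file instead
python -c "from datasets import load_dataset; ds = load_dataset('MBZUAI/LaMini-instruction', split='train'); print(len(ds)); print(ds[0])"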

Evaluation

# Evaluate a PAT adapter on top of the Llama 2 7B backbone
ADAPTER=<path-to-adapter>
FT_MODE=dimdown
GPU=0
CUDA_VISIBLE_DEVICES=$GPU python chat.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--adapter_name_or_path $ADAPTER \
--template_name llama2-base-alpaca  \
--ft_mode $FT_MODE \
--trainable_mask \
--identity_loss \
--chat debug-all
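
The same entry point should work for the other backbones listed in the Model Zoo below; for example, for Llama 2 13B (assuming the llama2-base-alpaca template and the remaining flags carry over unchanged; adjust them to match how the adapter was trained):

# Assumes the same template and flags apply to the 13B backbone
CUDA_VISIBLE_DEVICES=$GPU python chat.py \
--model_name_or_path meta-llama/Llama-2-13b-hf \
--adapter_name_or_path $ADAPTER \
--template_name llama2-base-alpaca \
--ft_mode $FT_MODE \
--trainable_mask \
--identity_loss \
--chat debug-all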

Model Zoo

After PAT, the HSMs can be merged into the base model with script/merge_dimdown.py.

ADAPTER=<path-to-adapter>
python script/merge_dimdown.py \
--model_dir meta-llama/Llama-2-7b-hf \
--adaptor_path $ADAPTER
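
Once merged, the pruned checkpoint can be loaded like any other causal LM; a minimal sketch follows, where <path-to-merged-model> is a placeholder for wherever merge_dimdown.py writes its output (check the script for the actual save path), and loading the pruned weights may require the modified transformers installed above:

# <path-to-merged-model> is a placeholder; check merge_dimdown.py for the actual output location
MERGED=<path-to-merged-model>
python -c "from transformers import AutoModelForCausalLM; m = AutoModelForCausalLM.from_pretrained('$MERGED'); print(sum(p.numel() for p in m.parameters()))"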

Additionally, we provide PAT results for the following models:

  • Llama 2 7B
  • Llama 2 13B
  • Gemma 2B
  • Gemma 7B
  • Yi-1.5 34B
