Commit

Initial commit
YupengSu committed Aug 21, 2024
1 parent 151f61e commit 6f5ca4e
Showing 5 changed files with 39 additions and 29 deletions.
2 changes: 1 addition & 1 deletion INSTALL.md
@@ -3,7 +3,7 @@
**Step 1: Create a new conda environment:**

```
-conda create -n barber python=3.9
+conda create -n barber python=3.10
conda activate barber
```
58 changes: 34 additions & 24 deletions README.md
@@ -1,64 +1,74 @@
# LLM-Barber
Code for the paper "LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models".

-**LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models**
-
-Yupeng Su*, Ziyi Guan*, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu(* indicates equal contribution)
+> [LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models [arxiv]](https://arxiv.org/abs/2408.10631)
+>
+> *Yupeng Su\*, Ziyi Guan\*, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu (\* indicates equal contribution)*
+>
+> Southern University of Science and Technology, University of Hong Kong
![Figure 1a](img/figure1a.png)
-LLM-Barber integrates pruning across both Self-Attention and MLP block, mitigates error accumulation, as evidenced by the lighter orange arrows, facilitating global optimization and improved model performance.
+Transition from layer-aware to block-aware error accumulation to achieve an optimized global solution.
![Figure 1b](img/figure1b.png)
-LLM-Barber identifies weights that, although initially non-salient without a sparsity mask, gain significance in post-pruning.
+Rebuilding the sparsity mask using a novel pruning metric based on weights multiplied by gradients.
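
To make the weight-times-gradient metric concrete, here is a minimal PyTorch sketch (illustrative only: `weight_grad_score` and `rebuild_mask` are hypothetical names, not the repository's implementation) that scores each weight by the magnitude of weight times gradient and rebuilds a mask keeping the highest-scoring fraction:

```python
import torch

def weight_grad_score(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # First-order saliency: |W * dL/dW| estimates how much the loss would
    # change if this weight were zeroed out.
    return (weight * grad).abs()

def rebuild_mask(weight: torch.Tensor, grad: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Keep the (1 - sparsity) fraction of weights with the highest scores,
    # regardless of whether an earlier mask had pruned them.
    score = weight_grad_score(weight, grad)
    num_keep = int(score.numel() * (1.0 - sparsity))
    kept = torch.topk(score.flatten(), num_keep).indices
    mask = torch.zeros(score.numel(), dtype=torch.bool, device=score.device)
    mask[kept] = True
    return mask.reshape(score.shape)
```

For a linear layer pruned to 50% unstructured sparsity, this would be called roughly as `rebuild_mask(layer.weight.data, layer.weight.grad, sparsity=0.5)`.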

## Setup
To install, follow the instructions in the [INSTALL.md](INSTALL.md) file.

## Usage
The [scripts](./scripts/) directory houses all Bash commands necessary to reproduce the primary findings presented in our paper.

-The following command demonstrates pruning LLaMA-7B using LLM-Barber to achieve 50% unstructured sparsity.
+The following command demonstrates pruning LLaMA-7B with LLM-Barber to achieve 50% unstructured sparsity, using Wanda as the initialization method.

```bash
python main.py \
--model huggyllama/llama-7b \
---prune_method magnitude \
+--prune_method wanda \
--sparsity_ratio 0.5 --sparsity_type unstructured \
--prune_barber --prune_granularity output1 --threshold 0.01 \
--save_model /path/to/save/model --save_ppl /path/to/save/ppl --save_zeroshot /path/to/save/zeroshot \
--delete
```

Here's an overview of the arguments used in the command:

-* **`--model huggyllama/llama-7b`**: Specifies the LLaMA model to use from the Hugging Face model hub.
-* **`--prune_method magnitude`**: Selects the pruning method, here it's "magnitude" pruning.
-* **`--sparsity_ratio 0.5`**: Sets the sparsity ratio to 0.5, meaning 50% of the weights will be pruned.
-* **`--sparsity_type unstructured`**: Specifies the type of sparsity as "unstructured".
-* **`--save_model /path/to/save/model`**: Defines the directory where the pruned model will be saved.
-* **`--save_ppl /path/to/save/ppl`**: Defines the directory where the perplexity results will be saved.
-* **`--save_zeroshot /path/to/save/zeroshot`**: Defines the directory where the zero-shot results will be saved.
-* **`--delete`**: This flag indicates that the pruned model should be deleted after the experiment.
+* `--model`: Specifies the LLaMA model to use from the Hugging Face model hub.
+* `--prune_method`: Selects the pruning (initialization) method; options are `magnitude`, `sparsegpt`, and `wanda`.
+* `--sparsity_ratio`: Sets the sparsity ratio, i.e., the fraction of weights that will be pruned.
+* `--sparsity_type`: Specifies the type of sparsity; options are `unstructured`, `2:4`, and `4:8`.
+* `--prune_barber`: Indicates that the model will be pruned with LLM-Barber.
+* `--prune_granularity`: Specifies the pruning granularity; options are `block`, `layer`, `output1`, and `input1`.
+* `--threshold`: Sets the mask rebuilding ratio of LLM-Barber; the default is 0.01.
+* `--save_model`: Defines the directory where the pruned model will be saved.
+* `--save_ppl`: Defines the directory where the perplexity results will be saved.
+* `--save_zeroshot`: Defines the directory where the zero-shot results will be saved.
+* `--delete`: Indicates that the pruned model should be deleted after the experiment.

-This command will run the `main.py` script with the specified arguments, pruning the "huggyllama/llama-7b" model using magnitude pruning with a sparsity ratio of 0.5 and unstructured sparsity. The results will be saved to the specified directories, and the pruned model will be deleted after the experiment.
+This command will run the `main.py` script with the specified arguments, pruning the "huggyllama/llama-7b" model using Wanda as the initialization method, with a sparsity ratio of 0.5 and unstructured sparsity. The results will be saved to the specified directories, and the pruned model will be deleted after the experiment.

To implement structured N:M sparsity, set the `--sparsity_type` argument to either `2:4` or `4:8`. An example command is provided below.
```bash
python main.py \
--model huggyllama/llama-7b \
---prune_method magnitude \
+--prune_method wanda \
--sparsity_ratio 0.5 --sparsity_type 2:4 \
--prune_barber --prune_granularity output1 --threshold 0.01 \
--save_model /path/to/save/model --save_ppl /path/to/save/ppl \
--delete
```
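
For clarity, `2:4` sparsity keeps at most 2 non-zero weights in every group of 4 consecutive weights (and `4:8` keeps 4 of every 8). A minimal PyTorch sketch of building such an N:M mask (illustrative only: survivors are chosen by magnitude here rather than by LLM-Barber's metric, and `n_m_mask` is a hypothetical name):

```python
import torch

def n_m_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    # Split each row into groups of m consecutive weights and keep the n
    # largest-magnitude entries per group; everything else is pruned.
    rows, cols = weight.shape
    assert cols % m == 0, "row length must be divisible by the group size m"
    groups = weight.abs().reshape(rows, cols // m, m)
    keep = groups.topk(n, dim=-1).indices   # positions of the n survivors in each group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep, 1.0)            # 1.0 marks kept weights, 0.0 pruned ones
    return mask.reshape(rows, cols)

# Example: a 2:4 mask over a random 8x8 weight matrix keeps exactly half of the entries.
w = torch.randn(8, 8)
print(n_m_mask(w).mean().item())  # 0.5
```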

## Acknowledgement
This repository is built upon the [Wanda](https://github.com/locuslab/wanda) and [SparseGPT](https://github.com/IST-DASLab/sparsegpt) repositories.

## License
This project is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.

## Cite
If you find our work useful, please consider citing our paper:
```bibtex
@article{su2024llmbarber,
  author = {Yupeng Su and Ziyi Guan and Xiaoqun Liu and Tianlai Jin and Dongkuan Wu and Graziano Chesi and Ngai Wong and Hao Yu},
  title = {LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models},
  year = {2024},
  eprint = {arXiv:2408.10631},
}
```

2 changes: 1 addition & 1 deletion lib/llama.py
@@ -10,7 +10,7 @@
import matplotlib.pyplot as plt
import os

-DEBUG = True
+DEBUG = False

def find_layers(module, layers=[nn.Linear], name=''):
"""
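
The body of `find_layers` is truncated in this excerpt. For context, here is a sketch of how this helper is typically implemented in the Wanda and SparseGPT codebases that this repository builds on (an assumption based on those repositories, not something shown in this diff):

```python
import torch.nn as nn

def find_layers(module, layers=[nn.Linear], name=''):
    # Return a dict mapping dotted module paths to every submodule whose type
    # is in `layers` (by default, all nn.Linear layers of the model).
    if type(module) in layers:
        return {name: module}
    res = {}
    for child_name, child in module.named_children():
        res.update(find_layers(child, layers=layers,
                               name=name + '.' + child_name if name != '' else child_name))
    return res
```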
2 changes: 1 addition & 1 deletion lib/opt.py
@@ -9,7 +9,7 @@
import matplotlib.pyplot as plt
import os

-DEBUG = True
+DEBUG = False

def find_layers(module, layers=[nn.Linear], name=''):
"""
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,8 +1,8 @@
-transformers==4.44.0
datasets==2.20.0
lm_eval==0.4.2
matplotlib==3.9.2
numpy==2.0.1
pandas==2.2.2
torch==2.3.1+cu121
tqdm==4.66.4
+transformers==4.44.0
