BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

Changwoo Lee, Soo Min Kwon, Qing Qu, and Hun-Seok Kim

University of Michigan

[Paper]

Notice

This repo is being actively updated.

Blast-Llama-4B is now available on Hugging Face! 🤗
arXiv version is available!
The paper is accepted to NeurIPS 2024.

Dependencies

The packages can be installed via conda env create --file environment.yml.

Additionally, install lm-evaluation-harness with BLAST implementation via

cd lm-evaluation-harness
pip install -e .

Blast-Llama-4B Model

Blast-Llama-4B is a Llama-7B model compressed by 50% via the procedure described below. The model can be loaded using transformers library.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
model = AutoModelForCausalLM.from_pretrained("cwoolee/blast-llama-4B", trust_remote_code=True)

Llama Decompsotion

Run bash ./scripts/decompose_llama.sh 0-31.

Blast-Llama Retraining

Run bash ./scripts/train_blast.sh. The script assumes that 4 gpus are available.

We re-trained the compressed Llama model for 400 steps on a subset of SlimPajama dataset available at here.

Evaluation using `lm-evaluation-harness`

Run bash scripts/lm-eval-blast.sh.

Acklowledgment

This repo is highly inspired by huggingface/transformers and EleutherAI/lm-evaluation-harness .

Citation

Please cite our paper if you find this repo or our paper useful

@inproceedings{
    lee2024blast,
    title={{BLAST}: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference},
    author={Lee, Changwoo and Kwon, Soo Min and Qu, Qing and Kim, Hun-Seok},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

Notice

Dependencies

Blast-Llama-4B Model

Llama Decompsotion

Blast-Llama Retraining

Evaluation using `lm-evaluation-harness`

Acklowledgment

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

Notice

Dependencies

Blast-Llama-4B Model

Llama Decompsotion

Blast-Llama Retraining

Evaluation using lm-evaluation-harness

Acklowledgment

Citation

Evaluation using `lm-evaluation-harness`