SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers

Sparse version of weight gradients in MLP layers.

Using Tucker decomposition, we found a space in which the weights of the linear layers are highly sparse (about 1% of all parameters remain significant). By rewriting the Torch Autograd weight gradient computation in this space, we reduced the number of trainable parameters and, consequently, the memory usage. At the same memory consumption, our model outperforms LoRA and baselines that select the most significant parameters in the linear layer and train only those.
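The idea of "sparse in a transformed space" can be illustrated with a toy example. This is a minimal sketch, not the code from this repository: the orthogonal matrices U and V are hypothetical stand-ins (plain 2x2 rotations) for the basis that the method obtains from Tucker decomposition, and the helper names are made up for illustration. It shows a matrix that is dense in the original basis, nearly sparse after the change of basis, and well approximated after keeping only its largest entry.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def rotation(theta):
    """2x2 rotation matrix, used here as a hypothetical orthogonal basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def keep_top_k(m, k):
    """Zero every entry except the k entries with the largest magnitude."""
    ranked = sorted(
        ((abs(v), i, j) for i, row in enumerate(m) for j, v in enumerate(row)),
        reverse=True,
    )
    keep = {(i, j) for _, i, j in ranked[:k]}
    return [[v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(m)]

# Assumed bases; in the real method these would come from the decomposition.
U, V = rotation(0.3), rotation(1.1)

# A matrix built to be dense in the original basis.
core = [[5.0, 0.0], [0.0, 0.01]]
grad = matmul(matmul(U, core), transpose(V))

# Change of basis: U^T * grad * V is (almost) sparse.
grad_t = matmul(matmul(transpose(U), grad), V)

# Keep only the single most significant entry in the transformed space.
sparse_t = keep_top_k(grad_t, k=1)

# Map back and measure how much was lost by dropping the small entries.
approx = matmul(matmul(U, sparse_t), transpose(V))
max_err = max(abs(a - b) for ra, rb in zip(grad, approx) for a, b in zip(ra, rb))
print(sparse_t, max_err)
```

In this toy case one of four entries carries nearly all of the signal; the 1% figure quoted above refers to the real linear layers, not to this example.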

The modifications to Torch Autograd and to the forward and backward operations are in sparse_grads/sparse_optimizers/sparse_grad_matrix_sparse.py

The modifications to the Hugging Face Trainer class for models with semi-sparse gradients are in sparse_grads/sparse_optimizers/trainer_custom.py

Experiments with BERT and RoBERTa

To apply the SparseGrad method to fine-tune a BERT model on the CoLA dataset:

cd bert/

CUDA_VISIBLE_DEVICES=0 python3 test_bert.py --cuda 0 --sparse_grad

To do the same with RoBERTa:

cd roberta/

CUDA_VISIBLE_DEVICES=0 python3 test_roberta.py --cuda 0 --sparse_grad

To run the full fine-tuning benchmark:

Switch the branch from main to lora_benchmark

cd sparse_grad

run.py --task 'stsb' --run_type sparse --model_path 'roberta-base' --optimize False --n_params 280000

--task The GLUE task to run

--run_type One of ft (regular fine-tuning), lora (Low-Rank Adaptation), sparse (SparseGrad), meprop (MeProp)

--model_path Path to the model on HF or in local storage

--optimize Whether to include a search for optimal parameters

--n_params The number of parameters in the linear layer that remain trainable
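The flags above can be mirrored in a small argparse sketch. This is a hypothetical parser written for illustration, not the actual contents of run.py; only the flag names and the run_type values come from the list above, and the str_to_bool helper is an assumption. It is included because a value such as `--optimize False` needs explicit parsing: with `type=bool`, the non-empty string 'False' would evaluate to True.

```python
import argparse

def str_to_bool(value):
    """Parse 'True'/'False' style strings (hypothetical helper).

    bool('False') is True in Python, so the string is compared explicitly.
    """
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Hypothetical mirror of the run.py flags")
    parser.add_argument("--task", help="GLUE task name")
    parser.add_argument("--run_type", choices=["ft", "lora", "sparse", "meprop"],
                        help="fine-tuning method")
    parser.add_argument("--model_path", help="model path on HF or in local storage")
    parser.add_argument("--optimize", type=str_to_bool,
                        help="whether to search for optimal parameters")
    parser.add_argument("--n_params", type=int,
                        help="number of trainable parameters in the linear layer")
    return parser

# Parse the same arguments as the example command above.
args = build_parser().parse_args([
    "--task", "stsb",
    "--run_type", "sparse",
    "--model_path", "roberta-base",
    "--optimize", "False",
    "--n_params", "280000",
])
print(args)
```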

Experiments with LLaMa 2 7B

LLaMa 2 7B fine-tuned on the openassistant dataset is available on HF: https://huggingface.co/Sayankotor/RegularLlama

LLaMa 2 7B fine-tuned on the openassistant dataset using LoRA is available on HF: https://huggingface.co/Sayankotor/LoraLlama

The weights of LLaMa 2 7B fine-tuned on the openassistant dataset using SparseGrad can be downloaded with the notebook llama_ft/Model_Download.ipynb
