Parameters from Lightning training when the LoRA rank is 16 and LoRA weights are applied to the query, value and feedforward matrices of the transformer blocks:
```
  | Name             | Type       | Params | Mode
--------------------------------------------------------
0 | sbert            | BertModel  | 23.2 M | eval
1 | lora_module_list | ModuleList | 516 K  | train
2 | mlp              | MLP        | 58.1 K | train
--------------------------------------------------------
574 K     Trainable params
22.7 M    Non-trainable params
23.3 M    Total params
```
The trainable parameters are about 2.46% of the total parameters. The model checkpoint is 3.4 MB (LoRA weights, MLP weights and optimiser state). Training is done in bfloat16; the equivalent fp32 checkpoint is 6.7 MB.
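As a sanity check on these numbers: assuming the `sbert` backbone is a 6-layer MiniLM-style encoder with hidden size 384 and FFN size 1536 (an assumption about the architecture, not something read from the checkpoint), rank-16 adaptors on the query, value and both feedforward matrices reproduce the 516 K figure:

```python
# Back-of-the-envelope check of the LoRA parameter count above.
# Assumed backbone dimensions (MiniLM-style): 6 layers, hidden 384, FFN 1536.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # a LoRA adaptor adds A (rank x d_in) and B (d_out x rank)
    return rank * d_in + d_out * rank

hidden, ffn, rank, n_layers = 384, 1536, 16, 6
per_layer = (
    lora_params(hidden, hidden, rank)    # query
    + lora_params(hidden, hidden, rank)  # value
    + lora_params(hidden, ffn, rank)     # FFN up-projection
    + lora_params(ffn, hidden, rank)     # FFN down-projection
)
print(per_layer * n_layers)  # 516096 ~ 516 K; with the 58.1 K MLP head,
                             # 574 K trainable ~ 2.46% of 23.3 M total
```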
I finetuned the model on the IMDB sentiment analysis dataset (25,000 training examples and 25,000 validation examples).

To set up training:

- First, `cd` into the root of the repo.
- Second, run `export PYTHONPATH=.`.
- Third, run `wandb login` to log in to wandb with your API key.
- Fourth, run `export WANDB_START_METHOD="thread"`, otherwise a weird threading exception occurs. For more info see this issue.
- Fifth, set the tokenizers environment variable to avoid warnings: `export TOKENIZERS_PARALLELISM=true`.
Training command:

```bash
python src/run_experiment.py fit --config fit_config.yaml --trainer.accelerator=gpu --trainer.devices=1 --trainer.max_epochs=12 --trainer.check_val_every_n_epoch=2 --trainer.log_every_n_steps=25 --data.num_workers=4 --my_model_checkpoint.every_n_epochs=2 --model.lora_alpha=2 --model.lora_rank=16 --model.lr=3e-4 --model.do_lora=true --trainer.precision=bf16-true
```
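For context, the Lightning CLI entry point that makes the `--model.*` / `--data.*` / `--trainer.*` overrides work is presumably something like the sketch below; `LoraSbertModel`, `ImdbDataModule` and their module paths are assumed names for illustration, not the repo's actual identifiers:

```python
# src/run_experiment.py -- minimal sketch of the entry point (assumed names).
from lightning.pytorch.cli import LightningCLI

from src.model import LoraSbertModel  # assumed module path
from src.data import ImdbDataModule   # assumed module path

if __name__ == "__main__":
    # LightningCLI exposes constructor arguments (lora_rank, lora_alpha, lr,
    # num_workers, ...) as --model.* / --data.* flags and merges them with
    # the values from fit_config.yaml passed via --config.
    LightningCLI(LoraSbertModel, ImdbDataModule)
```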
After checking various configurations for the rank (8, 16, 32) and the alpha parameter of LoRA (0.25, 1, 2, 8), I found that the algorithm is not very sensitive to the rank but is very sensitive to alpha. The configuration with `lora_alpha=2` and `lora_rank=16` worked well, achieving 90.12% accuracy.
- Each experiment was run for 12 epochs with `batch_size=250` on an L4 GPU from a Lightning Studio.
- When training in `fp32`, experiments with full LoRA (query, value and ffn weight adaptors) took about 28 minutes, with validation accuracy 89.74%.
- When training in `bf16`, experiments with full LoRA (query, value and ffn weight adaptors) took 8 minutes and 38 seconds, with validation accuracy 90.12%.
- When training in `bf16` with LoRA only on the query and value matrices, experiments took 6 minutes and 17 seconds, with validation accuracy 88.93%.
- When training in `bf16` with no LoRA (only the MLP classifier), experiments took 3 minutes and 34 seconds, with validation accuracy 80.72%.
The best performing model used full LoRA training and was trained in `bf16` format. You can see the comparison of the training curves from `wandb` for no-LoRA (green), LoRA only on the query and value matrices (brown) and full LoRA on the query, value and ffn matrices (beige). The training curve for full LoRA trained in `fp32` is in cyan. Full LoRA achieves the lowest training loss in both `fp32` and `bf16` precision. GPU utilisation was about 85% for full LoRA in `bf16` and 95% for full LoRA in `fp32`, and lower for the other configurations, holding everything else fixed.
The training curves from wandb are given below:

The GPU utilisation curves are shown below:
I managed to deploy the model in `bfloat16`. The model checkpoint needs to be saved as `saved_models/latest-bf16.ckpt`.
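If you only have an fp32 checkpoint, casting its floating-point tensors should produce the bf16 file. A sketch, assuming a standard Lightning checkpoint layout with a `state_dict` key; the input path `saved_models/latest.ckpt` is a hypothetical name:

```python
import torch

# Cast an fp32 Lightning checkpoint to bfloat16 for serving.
# "saved_models/latest.ckpt" is an assumed input path.
ckpt = torch.load("saved_models/latest.ckpt", map_location="cpu")
ckpt["state_dict"] = {
    k: v.bfloat16() if torch.is_floating_point(v) else v
    for k, v in ckpt["state_dict"].items()
}
torch.save(ckpt, "saved_models/latest-bf16.ckpt")
```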
Build the image from the root of the repo:

```bash
docker build -t flask-pytorch-model -f api_server/Dockerfile .
```
Run the container; this should start the server:

```bash
docker run -p 5000:5000 --name flask-cont-1 flask-pytorch-model
```
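For reference, the server's predict endpoint presumably boils down to something like the sketch below; `run_model` is an assumed helper wrapping the bf16 model, and the Lambda-style response envelope matches the outputs shown further down:

```python
# api_server/app.py -- minimal sketch of the predict endpoint (assumed structure).
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json().get("text", "")
    sentiment = run_model([text])  # assumed helper returning e.g. ["positive"]
    # Lambda-style envelope, matching the responses shown below
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"sentiment": sentiment}),
    }

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```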
From another terminal, test the app. This should return positive sentiment:

```bash
curl -X POST http://0.0.0.0:5000/predict -H "Content-Type: application/json" -d '{"text": "The movie was wonderful!"}'
```

This should return negative sentiment:

```bash
curl -X POST http://0.0.0.0:5000/predict -H "Content-Type: application/json" -d '{"text": "The movie was awful!"}'
```
I got the following responses, respectively. First result:

```json
{"body":"{\"sentiment\": [\"positive\"]}","headers":{"Content-Type":"application/json"},"statusCode":200}
```

Second result:

```json
{"body":"{\"sentiment\": [\"negative\"]}","headers":{"Content-Type":"application/json"},"statusCode":200}
```
After `export PYTHONPATH=.`, just run:

```bash
gradio api_server/gradio_app.py
```

You should get a public link (something that ends in `gradio.live`); open that link.
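For orientation, the public link comes from Gradio's share mode; the app presumably boils down to something like this sketch, where `predict_sentiment` is an assumed helper wrapping the bf16 model:

```python
# api_server/gradio_app.py -- minimal sketch (assumed structure).
import gradio as gr

def classify(text: str) -> str:
    # guard against empty input, as in the screenshot below
    if not text.strip():
        return "Please enter some text."
    return predict_sentiment(text)  # assumed helper wrapping the bf16 model

demo = gr.Interface(fn=classify, inputs="text", outputs="text")
# share=True is what generates the public *.gradio.live URL
demo.launch(share=True)
```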
I tested the app and got the following screenshots:

And when I input an empty string:
You can also serve from inside a Docker container. After you have built the image as described above, just run:

```bash
docker run -p 5000:5000 --name flask-cont-1 flask-pytorch-model bash -c "python ./app/api_server/gradio_app.py"
```

and navigate to `localhost:5000`.