Parameter-Efficient Fine-Tuning of Whisper-Large V2 in Colab on T4 GPU using 🤗 PEFT+INT8 training #988
-
How do we get timestamps when using PEFT?
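A minimal sketch of one way to do this: timestamp decoding goes through the base Whisper `generate()` and processor, which PEFT does not change, so you can wrap the base model with your trained adapter and ask for timestamp tokens. The adapter path `./whisper-large-v2-lora` and the `audio_array` variable are placeholders for your own checkpoint and audio.

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# Load the base model and wrap it with the trained LoRA adapter
# ("./whisper-large-v2-lora" is a placeholder path for your saved adapter).
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "./whisper-large-v2-lora")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# `audio_array` is a 16 kHz mono waveform (numpy array) you have loaded elsewhere.
inputs = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
input_features = inputs.input_features.to(model.device, dtype=torch.float16)

# Ask Whisper's generate() for timestamp tokens and decode them back into text.
generated_ids = model.generate(input_features=input_features, return_timestamps=True)
result = processor.batch_decode(
    generated_ids, skip_special_tokens=True, decode_with_timestamps=True
)
print(result[0])
```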
-
When fine-tuning, I get the following training error:
-
@pacman100 I followed https://github.com/huggingface/peft/blob/main/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb to do ASR fine-tuning on about 10,000 hours of data. When I run trainer.train(), I get the errors below. The recipe may only be suited to small training sets; for large training data it seems to have problems. My training code:
-
@pacman100 One difference is that I use set_transform() to run the feature_extractor on the fly. My dataset is very big: when I use map() to run the feature_extractor, my 2 TB disk fills up.
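For reference, a minimal sketch of computing log-mel features lazily with `datasets.Dataset.set_transform` instead of `map()`, so nothing is cached to disk. The column names (`"audio"`, `"sentence"`), the language, and the `dataset` variable are assumptions based on the Common Voice layout used in the notebook.

```python
from transformers import WhisperFeatureExtractor, WhisperTokenizer

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v2")
tokenizer = WhisperTokenizer.from_pretrained(
    "openai/whisper-large-v2", language="Marathi", task="transcribe"
)

def prepare_batch(batch):
    # Runs lazily every time examples are accessed, so no features are written to the cache.
    audio = batch["audio"]
    batch["input_features"] = [
        feature_extractor(a["array"], sampling_rate=a["sampling_rate"]).input_features[0]
        for a in audio
    ]
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch

# Instead of dataset.map(prepare_batch, ...), which materializes every feature on disk:
dataset.set_transform(prepare_batch)
```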
-
It does not seem to work; I followed huggingface/peft#242 (comment).
-
@pacman100 But can fine-tuning Whisper retain its generalization ability, or will it overfit to your training data? I have observed that if I fine-tune the Hugging Face multilingual Whisper on one language and test WER on another language, its performance degrades compared to the original multilingual model. Aren't you only utilizing the language-specific tokens during fine-tuning and masking the others?
-
How do I load from a checkpoint?
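A sketch of the two common cases, assuming the adapter was saved under the Trainer output directory as in the notebook's save callback; the `output_dir/checkpoint-1000` path is a placeholder for your own checkpoint folder.

```python
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# 1) Resuming training from a Trainer checkpoint (path is a placeholder):
# trainer.train(resume_from_checkpoint="output_dir/checkpoint-1000")

# 2) Loading the saved LoRA adapter on top of the INT8 base model for evaluation/inference:
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "output_dir/checkpoint-1000/adapter_model")
model.eval()
```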
-
Inference cannot be run on Windows, because bitsandbytes does not support Windows.
-
With multiple GPUs, fine-tuning on 10,000 hours of data is very slow: only about 0.5 epochs of training per month.
-
Attention ASR developers and researchers! 🚀 Great news: with the latest update of 🤗 PEFT, you can now fine-tune your Whisper-large model faster than ever before! The new update lets you fit 5X larger batches in less than 10GB of GPU VRAM, thanks to LoRA and @Tim_Dettmers's bitsandbytes (bnb), packaged nicely in 🤗 PEFT. And the best part? You get comparable WER, just faster! ⚡️
But that's not all: you no longer have to compromise on training speed to maintain WER. In our experiments with the Marathi language, the WER was comparable to full fine-tuning runs of Whisper-large.
With 🤗 PEFT, you can now train a Whisper-large v2 model in less than 8GB of GPU VRAM! 📉 Without 🤗 PEFT, you would hit OOM on a Colab T4, but not anymore! You can easily save on storage and port tiny checkpoints: ~63 MB compared to a 6.7 GB fully fine-tuned model. 🐜
And that's not all! For low latency, you can convert the PEFT model to ONNX and run it with ONNX Runtime (ORT) via 🤗 Optimum. Start experimenting today and fine-tune Whisper using PEFT+INT8 in Colab on a language of your choice! Join our Discord community to get involved in the conversation and discuss your results and questions. 🔬
Check out the Colab notebook examples and start your ASR development journey with 🤗 PEFT today!
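A minimal sketch of the LoRA + INT8 setup described above, roughly following the linked notebook; the hyperparameters (r=32, alpha=64, targeting q_proj/v_proj) are illustrative, and older PEFT versions (and the notebook) expose this preparation step as prepare_model_for_int8_training rather than prepare_model_for_kbit_training.

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load Whisper-large v2 with 8-bit weights (bitsandbytes) so it fits in a Colab T4.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)

# Cast norms/output head to fp32 and enable gradient checkpointing for stable INT8 training.
model = prepare_model_for_kbit_training(model)

# Attach small LoRA adapters to the attention projections; only these are trained.
lora_config = LoraConfig(
    r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # roughly 1% of the parameters, hence the ~63 MB checkpoints
```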
Links: