Multi-gpu training still has issues #242
Comments
Looks like multi-GPU training with a naive pipeline using accelerate's device map fails for encoder-decoder models (#205 had T5, and this issue observes it for Whisper). @younesbelkada, any ideas on what might be happening?
For Whisper multi-GPU naive pipeline parallelism using accelerate, peft, and the Trainer, the following changes are required:
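In short, the Trainer needs to be told that the model is already split across GPUs so that it does not wrap it in `torch.nn.DataParallel`. A minimal sketch of that kind of change, assuming the int8 Whisper notebook setup; the checkpoint name and flags below are illustrative, not the exact changes from this comment:

```python
from transformers import WhisperForConditionalGeneration

# device_map="auto" shards the model layers across all visible GPUs
# (naive pipeline parallelism handled by accelerate).
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    load_in_8bit=True,
    device_map="auto",
)

# Mark the model as already parallelized so the Trainer treats the run as
# model-parallel and does not wrap it in DataParallel, which is what leads
# to the cuda:0 / cuda:1 mismatch errors.
setattr(model, "model_parallel", True)
setattr(model, "is_parallelizable", True)
```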
output logs:
hi @pacman100 @macabdul9
Hello @younes, I did mention it in the above comment. Thank you for mentioning it on the other issue too.
Awesome, thanks a lot!
It's still not working. @pacman100, can you share your working notebook if possible?
Same here, any solutions?
Not sure why, but I found that using DDP instead of DP avoids this problem, i.e.:
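(As an illustration rather than the commenter's actual command: switching to DDP means launching one process per GPU, e.g. with `torchrun --nproc_per_node=2 train.py`, where `train.py` and the process count are placeholders.)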
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
I have made PR #855 with some of the changes mentioned in this thread, but I still have issues with `lora_config`'s `rank_pattern` becoming `None`.
Here is what I did to enable distributed data parallelism:

```python
from accelerate import Accelerator
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainingArguments
…
accelerator = Accelerator(…)
…
# place the whole model on this process's GPU instead of sharding it across devices
model = WhisperForConditionalGeneration.from_pretrained(…, device_map={"": accelerator.device})
…
training_args = Seq2SeqTrainingArguments(…, accelerator_config={"split_batches": True})
…
trainer.train()
# make sure all processes are done before saving from the main process only
accelerator.wait_for_everyone()
if accelerator.is_main_process:
    trainer.save_model()
```

then launch with
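A typical command for this kind of setup would be something like `accelerate launch --num_processes 2 train.py`, where the script name and process count are placeholders rather than the commenter's exact invocation.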
With int8:

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1

Without int8:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm)
This happens even after #145.
Details: I am training whisper-large-v2 and using pretty much everything from the example notebook here: https://github.com/huggingface/peft/blob/main/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
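For reference, the core setup from that notebook looks roughly like this (a sketch; the checkpoint name and LoRA hyperparameters are taken from the notebook, not from this issue's exact script):

```python
# Rough sketch of the linked notebook's setup; hyperparameters follow the
# notebook, not necessarily this issue's exact training script.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    load_in_8bit=True,
    device_map="auto",  # with 2+ GPUs this shards the model, which is where the errors above appear
)
model = prepare_model_for_int8_training(model)

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
```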
Similar issue: #205
cc: @pacman100