Make all Transformer models compatible with model parallelism #22561
Accelerate makes it easy to load a model on multiple GPUs with `device_map="auto"`. This in turn allows users to train models with naive model parallelism if they have several GPUs.

A problem that happens in Transformers with models that have heads (so not `XxxModel` but, for instance, `XxxModelForSequenceClassification`) is that the labels can end up on a different device than the logits, producing a device mismatch error.

Thankfully, there is an easy fix! #22535 shows how to fix this for T5 by just moving the labels to the same device as the logits they are compared to. This is a no-op when the devices are the same, and fixes the issue when they are different.

We would like help from the community to extend this to all models that support model parallelism, which are:

- Llama (`LlamaForSequenceClassification` only)

If you would like to grab one of these models and apply the same fix as #22535 to all the models with heads, please leave a comment here! A sketch of the fix is shown below.
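For illustration, here is a minimal sketch of that pattern. The helper name and the cross-entropy loss are assumptions for this example; the actual loss function and surrounding code differ per model and head.

```python
import torch
from torch.nn import CrossEntropyLoss


def sequence_classification_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper illustrating the fix from #22535. Under naive
    # model parallelism (device_map="auto"), the head's logits can live on
    # a different GPU than the labels the user passed in, so we move the
    # labels first; .to() is a no-op when the devices already match.
    labels = labels.to(logits.device)
    loss_fct = CrossEntropyLoss()
    num_labels = logits.size(-1)
    return loss_fct(logits.view(-1, num_labels), labels.view(-1))
```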
Comments
I think I can help with this issue :)
I would like to work on this issue - the BART model :)
Hi, I can take this up 🙌🏻
Indeed, this fix is required for BLOOM: main...zsc:transformers:main (my fix is hacky and not PR-ready, just FYI)
Just to make sure, does
Hi, I'd like to pick up the GPT-2 model!
Hi! I am taking this up for
It does (#22329). I have started seeing errors similar to #22546, but only after updating my drivers from 525 to 530, similar to #22546 (comment). (Which is good news to me; I had no idea why that GPU started disappearing occasionally. It seems it can happen when that GPU is under any load, not just during training.) Edit: it seems the errors I was getting were actually caused by GPU sag. I haven't yet reproduced that exact error, but it has been reported elsewhere. It is certainly not consistent, though.
@younesbelkada @sgugger
I think it is supposed to work for all models listed above, as long as you are loading your model with `device_map="auto"` (see the example below).
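For reference, a minimal sketch of the loading pattern being discussed (the checkpoint name is an arbitrary placeholder, and the `accelerate` package must be installed):

```python
from transformers import AutoModelForSequenceClassification

# device_map="auto" lets Accelerate place the model's layers across all
# visible GPUs, enabling naive model parallelism.
model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",  # placeholder checkpoint
    device_map="auto",
)
```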
I tried such a fix here: #22591 (comment), but sadly it didn't work out. Any catch?
I would love to work on BridgeTower.
Hi, I would like to try Whisper.
I'd like to claim the OPT model if no one else has picked it up.
* add GPTNeoXForSequenceClassification
* move the labels to logits.device (ref: #22561)
* fix
Taking this up for the remaining GPT models.
Hello, I just completed the GPT-J code. Just filling in the PR now.
Hello! I'd like to work on the Whisper model.
Hi, is there any model I can work on, please? Thanks.
Is there any remaining model I can work on? Thanks.
@sgugger Hello, can I work on Jukebox?
Hello @sgugger, I'd like to work on
@sgugger I would love to work on CodeGen if it is unclaimed.
Hi @sgugger, I can work on
@sgugger I would like to work on SwitchTransformer, if not taken.
@sgugger I think all the transformers are covered; I have checked the others as well. For example, Switch Transformers already has parallelism implemented. I think we can close this issue. The only pending models are CLIP, Jukebox, OwlViT, and NLLB; maybe model parallelism is not applicable to some of these models.
Indeed, all models have been covered. Thanks a lot everyone!