T5-11b model parallelism #7047
Here it is:
As I don't have several GPUs at the moment, I tried to run it on CPU (see line 145 in the error stack).
@patrickvonplaten, the following should be interesting. I have been in contact with them; they were planning to release it as open source several months back but ran into some issues with Microsoft internals. I heard the author is planning to release the open source themselves. Can anyone work with them? Cheers.
That does look interesting. Thanks for sharing! I'm not sure if we are planning on working with the author - but feel free to reach out to him and maybe this can help resolve the T5 model parallelism.
Hello, guys. The point is: the transformer blocks (T5Block) are the largest parts of the network. The first step is to spread them evenly across all GPUs. In the second step we spread across the GPUs all the other blocks of the transformer, which are incomparably smaller than the main blocks. There are also some modifications to the original model code so that tensors move to the necessary GPU when an incoming tensor and a layer are on different devices - roughly as in the sketch below.
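This isn't the actual script (linked elsewhere in the thread), just a minimal sketch of the two steps under a plain PyTorch/transformers setup; the `MoveInputs` wrapper name is illustrative, and a real implementation needs more plumbing (attention masks, loss placement, generation):

```python
import torch
from torch import nn
from transformers import T5ForConditionalGeneration

class MoveInputs(nn.Module):
    """Wrap a block so incoming tensors are moved onto the block's device."""
    def __init__(self, block, device):
        super().__init__()
        self.block = block.to(device)
        self.device = device

    def forward(self, *args, **kwargs):
        args = tuple(a.to(self.device) if torch.is_tensor(a) else a for a in args)
        kwargs = {k: v.to(self.device) if torch.is_tensor(v) else v
                  for k, v in kwargs.items()}
        return self.block(*args, **kwargs)

model = T5ForConditionalGeneration.from_pretrained("t5-large")
n_gpus = torch.cuda.device_count()

# Step 1: spread the T5Blocks (the bulk of the parameters) across the GPUs
# in contiguous chunks, so activations cross devices as rarely as possible.
for stack in (model.encoder, model.decoder):
    n_blocks = len(stack.block)
    for i in range(n_blocks):
        stack.block[i] = MoveInputs(stack.block[i], f"cuda:{i * n_gpus // n_blocks}")

# Step 2: place the remaining, much smaller modules. Embeddings go with the
# first blocks; the final layer norms and LM head go with the last blocks.
model.shared.to("cuda:0")  # tied with encoder/decoder embed_tokens
model.encoder.final_layer_norm.to(f"cuda:{n_gpus - 1}")
model.decoder.final_layer_norm.to(f"cuda:{n_gpus - 1}")
model.lm_head.to(f"cuda:{n_gpus - 1}")
```

With this layout, `input_ids` should be placed on `cuda:0` and `labels` on the last device before calling the model.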
It seems at the beginning of our graph we have a large block whose size is comparable to a T5Block. The smarter way would be to split the layers according to their memory usage, but I don't know a simple way to find out how much memory every module uses. What do you think about this?
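On knowing how much memory each module uses: there is no single built-in answer, but parameter memory at least is easy to estimate by summing tensor sizes. A rough sketch (weights only - it ignores activations, gradients, and optimizer state):

```python
import torch
from transformers import T5ForConditionalGeneration

def param_bytes(module: torch.nn.Module) -> int:
    """Parameter memory of a module: number of elements times element size."""
    return sum(p.numel() * p.element_size() for p in module.parameters())

model = T5ForConditionalGeneration.from_pretrained("t5-large")
for name, child in model.named_children():
    print(f"{name}: {param_bytes(child) / 2**20:.1f} MiB")

# Per-block breakdown, useful for splitting by memory rather than block count:
for i, block in enumerate(model.encoder.block):
    print(f"encoder.block.{i}: {param_bytes(block) / 2**20:.1f} MiB")
```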
I tested this script on a machine with 8x32GB GPUs and have seen the same symptoms: the first GPU's memory gets fully loaded while the other GPUs consume around 5 gigabytes each.
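To see this imbalance from inside the script rather than via nvidia-smi, PyTorch's per-device memory counters can be printed after a forward pass - a small sketch:

```python
import torch

# Report what PyTorch has allocated and reserved on each visible GPU.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 2**30
    reserved = torch.cuda.memory_reserved(i) / 2**30
    print(f"cuda:{i}: {allocated:.1f} GiB allocated, {reserved:.1f} GiB reserved")
```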
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi @exelents, I also need model parallelism for T5 and your code should be very helpful. However, the link to your code seems invalid. Could you please share the code with me? Best,
Hello, @LostBenjamin. Also, you can try DeepSpeed: |
Hi @exelents, thanks for your help! I will try the MP in the transformers library.
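For later readers: the model parallelism referred to here shipped in transformers as an experimental `parallelize()` method on T5 (since deprecated in favor of loading with a device map). A minimal sketch, assuming 4 GPUs and the 24-block t5-3b:

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-3b")

# Map GPU ids to the block indices they should hold; calling parallelize()
# with no arguments splits the blocks evenly instead.
device_map = {
    0: list(range(0, 6)),  # blocks 0-5 on cuda:0
    1: list(range(6, 12)),
    2: list(range(12, 18)),
    3: list(range(18, 24)),
}
model.parallelize(device_map)
```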
🚀 Feature request
I would like to fine-tune the t5-11b model on my dataset, but found that it doesn't fit in TPU or GPU memory - the Colab notebook just crashes when I run it.
I tried to find a ready-made model parallelism solution. First I found this PR:
#3578
but it seems it hasn't been released. I tried to merge it into the master branch locally and use it, but it crashed.
I also found the Eisen library, which proposes "model parallelism with one line of code", but it works only for models with a single input (T5 has 2 inputs - tokens and mask).
I need to distribute the model across several GPUs, and I see somebody has tried to do this. If this development (pull request 3578) is still in progress, can you tell me whether there are any plans to release it?