Error: BFloat16 Unsupported scalar when trying to execute across multiple GPUs with BFloat16 & 8-Bits #79
Hi there! Will look into that later today (AOE) and try to reproduce. On the surface, we should not have bfloat16 at that stage, so it should be easy to fix. brb.
This PR implements bfloat16 support for `CompressionType.NONE` and `CompressionType.BLOCKWISE_8BIT`. This is important for the Petals client, see bigscience-workshop/petals#79
Hi @FTuma! Sorry for taking so long to look into this. The issue should be fixed now (don't forget to pull the latest version). For reference, here are the two PRs where we did that:
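The core difficulty those PRs address is that the serialization path has no native bfloat16 representation (NumPy, for instance, has no bfloat16 dtype), so bfloat16 tensors must be round-tripped via their raw 16-bit pattern. A minimal pure-Python sketch of the idea, not the actual hivemind code:

```python
import struct

def bfloat16_bits(x: float) -> int:
    """Truncate a float to bfloat16 by keeping the top 16 bits of its
    IEEE-754 float32 representation (truncation, no rounding)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def from_bfloat16_bits(b: int) -> float:
    """Reconstruct a float32 value from a 16-bit bfloat16 pattern
    by zero-filling the low 16 mantissa bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))
    return x

# bfloat16 keeps float32's 8-bit exponent, so the full float32 range survives,
# but only about 3 decimal digits of mantissa precision remain.
assert from_bfloat16_bits(bfloat16_bits(1.0)) == 1.0
print(from_bfloat16_bits(bfloat16_bits(3.14159265)))  # 3.140625
```

Shipping the 16-bit pattern (e.g. viewed as int16) is what makes `CompressionType.NONE` workable for bfloat16 without a native dtype on the wire.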
I will close this issue for now, but feel free to reopen it or make a new one if you run into other issues.
This PR implements bfloat16 support for `CompressionType.NONE` and `CompressionType.BLOCKWISE_8BIT`. This is important for the Petals client, see bigscience-workshop/petals#79 (cherry picked from commit 1e4af43)
I tried to run BLOOM distributed across multiple A100 GPUs with 8-bit compression and bfloat16, but ran into this error while executing a slightly adjusted version of the example script:
The code of simple_example_script:
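The script itself was not captured above. A sketch of what such a Petals client script might have looked like; the class, model id, and keyword arguments are assumptions based on Petals examples of that era, not the reporter's exact code:

```python
# Hypothetical reconstruction of the adjusted example script;
# names and arguments here are assumptions, not the reporter's code.
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # assumed model id

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,  # the dtype that triggered the reported error
)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```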
Server launched via commands:
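The launch commands were likewise not captured. A hedged sketch of a server launch of that period; the module path and flag names are assumptions based on the Petals README, so check `--help` on your checkout:

```shell
# Hypothetical reconstruction: module path and flag names are assumptions.
# One such command per GPU, pointing --device at a different cuda index.
python -m cli.run_server bigscience/bloom-petals \
    --num_blocks 8 \
    --torch_dtype bfloat16 \
    --device cuda:0
```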
Packages in the environment (installed via requirements.txt):
I just used the small version for debugging purposes; I need to distribute the model across multiple GPUs since I intend to run the 176B BLOOM version. I tried to naively convert the tensor at that line to a supported dtype, but then another error occurred somewhere further down the line.
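Whatever the downstream error was in this case, naive dtype swaps carry a general pitfall worth noting: bfloat16 shares float32's 8-bit exponent (range up to ~3.4e38), while float16 has only a 5-bit exponent and tops out around 65504, so values that are fine in bfloat16 can overflow after a cast. A stdlib-only illustration using `struct`'s half-precision format:

```python
import struct

def fits_in_float16(x: float) -> bool:
    """Check whether a value can be packed as IEEE-754 half precision."""
    try:
        struct.pack(">e", x)  # 'e' = binary16 (half precision)
        return True
    except OverflowError:
        return False

# float16 max is ~65504; bfloat16 covers roughly float32's range
print(fits_in_float16(60000.0))  # True
print(fits_in_float16(1e20))     # False: overflows half precision
```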
Since I want to do prompt tuning on 8x 40GB A100s, I think I have to use bfloat16 and 8-bit compression — or is there another solution/workaround with good performance?