Using Accelerate with TPU Pod VM like v3-32 #471
Thanks for this report @huunguyen10, I'll look into this further. As for how we run tests, we use Colab's v2 VM. Re: your Will look into the issue on
Thank you @muellerzr! Here is the error I encountered:
I used TPU VM
Were you able to get past this issue? @huunguyen10
I would also love to know what the follow-up on this is. Also see sumanthd17's issue
We're going to keep this issue and the linked issue below open regarding TPU pods. See Sylvain's and my last note on it for more information about what's currently happening and where things stand: #501 (comment)
This has now been introduced in #1049. Please follow the new
The example script I use is located here: We have also introduced a
Hi, thank you for the great library.
I have just installed Accelerate on a TPU VM v3-32, but when I set the number of TPU cores to 32 with `accelerate config` and run `accelerate test`, it throws an error: `ValueError: The number of devices must be either 1 or 8, got 32 instead`
So it seems Accelerate doesn't support training on a TPU Pod VM yet. Could you please add this feature to Accelerate?
By the way, I've run into another problem too. If I use accelerate 0.9 with TPU VM `v2-alpha`, `accelerate test` runs successfully. But if I use accelerate 0.10 with `v2-alpha`, `tpu-vm-pt-1.11`, or `tpu-vm-pt-1.10`, `accelerate test` never finishes; it just runs forever. And when I run
it throws some errors (even with accelerate 0.9 on TPU VM `v2-alpha`). Could you please tell me which TPU VM version you usually use with Accelerate?
Thank you!