not able to use push_to_hub during tpu training
#2851
Comments
ohh, sometimes it works (despite the error message)
@Wauplin cc.
Hi @yiyixuxu, I took a look at the code (both the script and About the error message, here's why it's happening:
=> So this is not actually an error, and the script works exactly as expected. If you wait long enough, the push_to_hub command will eventually complete and your script will exit gracefully.
=> I think the only problem is that we log the "waiting for..." message as an ERROR, which is misleading. Since it was implemented 18 months ago (huggingface/huggingface_hub#315) and is still widely used, I'm a bit reluctant to change it without a second opinion. @LysandreJik @sgugger is that still used a lot in
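To illustrate the behaviour described above, here is a minimal standard-library sketch of a push that runs in the background while the script waits for it at the end. It is not the huggingface_hub implementation: slow_push, wait_for_push, and the logger name push_sketch are hypothetical names made up for this example, and the sleep stands in for the real upload.

```python
import logging
import threading
import time

logger = logging.getLogger("push_sketch")  # hypothetical logger name


def slow_push(done: threading.Event, seconds: float = 0.2) -> None:
    """Stand-in for an upload that runs in the background (assumption)."""
    time.sleep(seconds)
    done.set()


def wait_for_push(done: threading.Event, poll: float = 0.05) -> int:
    """Block until the background push finishes; return the number of waits."""
    waits = 0
    while not done.is_set():
        # Purely informational: logged at ERROR level this would look like a failure.
        logger.warning("Waiting for the background push to finish...")
        done.wait(poll)
        waits += 1
    return waits


done = threading.Event()
worker = threading.Thread(target=slow_push, args=(done,))
worker.start()               # the training script keeps running meanwhile
waits = wait_for_push(done)  # at the end, wait for the push instead of aborting it
worker.join()
```

The point of the sketch is that the "waiting" message is emitted while everything is still working as intended, so its log level only affects how alarming it looks.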
Another short term solution for
I also opened a related issue (#2860) to update the training scripts. It's not about solving an issue but more about improving the UX.
We don't rely on the level of the log in Transformers, so it's completely fine for me if it's downgraded to warning.
Ok, thanks for the quick feedback @sgugger. I think I'll update the log level then. I created an issue for it: huggingface/huggingface_hub#1412.
Thanks @Wauplin for the clarification! And yeah, downgrading it to a warning will be really helpful :)
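Until the log level is changed upstream, a user-side stopgap could look like the sketch below. It uses only the standard logging module; the logger name hub_sketch and the "waiting for" marker are assumptions for illustration, so the filter would need to be attached to whichever logger actually emits the message.

```python
import logging


class DowngradeWaitingMessages(logging.Filter):
    """Relabel ERROR records that contain a marker phrase as WARNING."""

    def __init__(self, marker: str = "waiting for") -> None:
        super().__init__()
        self.marker = marker

    def filter(self, record: logging.LogRecord) -> bool:
        is_error = record.levelno == logging.ERROR
        if is_error and self.marker in record.getMessage().lower():
            record.levelno = logging.WARNING
            record.levelname = logging.getLevelName(logging.WARNING)
        return True  # never drop the record, only relabel it


# Hypothetical logger name (assumption); replace with the emitting logger.
logger = logging.getLogger("hub_sketch")
logger.addFilter(DowngradeWaitingMessages())
```

Note that a filter attached to a logger only applies to records logged directly on that logger, not to records propagated from its child loggers, so it has to be attached to the exact logger that emits the message (or to a handler instead).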
Describe the bug
Not able to use the --push_to_hub option for TPU training; getting an error.
This is not unique to the train_text_to_image_flax.py script; I'm just using it as an example. Basically, this line will always fail when called during training on a TPU: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py#L584
Reproduction
Run the train_text_to_image_flax script with the command given here:
https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-flaxjax
Logs
System Info
tpu-v4-8