4.27.1 breaks fp16 training of Flaubert #22426
Comments
Thanks for reporting and providing a clear reproducer! It let me pinpoint the regression to this PR. I think we shouldn't have touched that modeling code. Let me just consult internally and I will report back here with the next steps soon!
The PR mentioned above reverts the commit that introduced the bug. This will be released in a patch (4.27.4) later today.
Awesome, thank you!
Tested on 4.27.4, this issue is fixed; thank you again!
Thanks for letting us know! |
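(For anyone landing here later: the fix shipped in the 4.27.4 patch release, so upgrading with, e.g., pip install --upgrade transformers==4.27.4 resolves the regression.)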
System Info
transformers version: 4.27.1

Who can help?
@sgugger @ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Using transformers 4.26.1 the following script behaves properly (train and validation loss decreasing); using transformers >=4.27.1, the training loss is always 0 and the validation loss is always nan.
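A minimal sketch of such a script, assuming a Flaubert sequence-classification fine-tune via the Trainer API (the checkpoint, toy dataset, and hyperparameters below are illustrative, not necessarily the reporter's exact setup; a CUDA GPU is required for fp16):

```python
import torch
from torch.utils.data import Dataset
from transformers import (
    FlaubertForSequenceClassification,
    FlaubertTokenizer,
    Trainer,
    TrainingArguments,
)

class ToyDataset(Dataset):
    """A tiny labeled French text dataset, just enough to run a few steps."""
    def __init__(self, tokenizer):
        texts = ["Ce film est excellent.", "Ce film est horrible."] * 16
        self.labels = [1, 0] * 16
        self.encodings = tokenizer(texts, truncation=True, padding=True)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

tokenizer = FlaubertTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = FlaubertForSequenceClassification.from_pretrained(
    "flaubert/flaubert_base_cased", num_labels=2
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    evaluation_strategy="epoch",
    logging_steps=1,
    fp16=True,  # removing this line avoids the problem on >=4.27.1
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ToyDataset(tokenizer),
    eval_dataset=ToyDataset(tokenizer),
)
trainer.train()  # on 4.26.1 the losses decrease; on 4.27.1 with fp16 they do not
```

On 4.26.1 the logged losses decrease as expected; on 4.27.1 with fp16 enabled, the training loss reads 0 and the evaluation loss nan, matching the behavior described above.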
Please note that the problem doesn't occur when removing fp16=True.

Expected behavior
Upgrading to >=4.27.1 should produce training behavior similar to 4.26.1.
Thank you for your help!