XLNet Bug when training with apex 16-bit precision #6567
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #6567      +/-   ##
==========================================
- Coverage   79.18%   78.41%   -0.78%
==========================================
  Files         156      156
  Lines       28129    28129
==========================================
- Hits        22275    22056     -219
- Misses       5854     6073     +219
```
Continue to review full report at Codecov.
Thanks for your contribution!
Could you post a screenshot of the thrown error before this change?
Of course, @JetRunner, here it is
Great! Would you mind adding a one-line comment explaining the cast?
@JetRunner, done
Merging since the CI error looks unrelated.
* xlnet fp16 bug fix
* comment cast added
* Update modeling_xlnet.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
XLNet training fails when using 16-bit precision (apex) because a tensor is created with an explicit dtype=torch.float in the relative_positional_encodings function.
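Below is a minimal sketch of the failure mode and the fix. The function name, signature, and body are illustrative simplifications, not the exact transformers source: the point is that the position sequence is built with a hard-coded dtype=torch.float, so the resulting encodings stay float32 even when the model's parameters are half precision, and the fix is to cast the result to the parameter dtype.

```python
import torch

def relative_positional_encodings_sketch(klen, d_model, param_dtype):
    # Hypothetical, simplified stand-in for XLNet's relative_positional_encodings.
    freq_seq = torch.arange(0, d_model, 2.0, dtype=torch.float)
    inv_freq = 1.0 / torch.pow(10000, freq_seq / d_model)
    # Hard-coded float32 here is the fp16 pitfall described in this PR.
    pos_seq = torch.arange(klen, -klen, -1.0, dtype=torch.float)
    sinusoid = torch.einsum("i,d->id", pos_seq, inv_freq)
    pos_emb = torch.cat([torch.sin(sinusoid), torch.cos(sinusoid)], dim=-1)
    # The fix: cast to the model's parameter dtype so apex fp16 training
    # does not hit a float/half mismatch downstream.
    return pos_emb.to(param_dtype)

# Usage sketch: under apex fp16, parameters are torch.float16, so the
# returned encodings now match the rest of the forward pass.
pos_emb = relative_positional_encodings_sketch(512, 768, torch.float16)
```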