XLNet Bug when training with apex 16-bit precision #6567

Merged
merged 3 commits into from
Aug 20, 2020

johndolgov
Contributor

@johndolgov johndolgov commented Aug 18, 2020

XLNet training fails when using 16-bit precision, because the relative_positional_encodings function creates tensors with an explicit dtype=torch.float, which then no longer matches the half-precision tensors used by the rest of the model.
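
To illustrate the kind of mismatch described above, here is a minimal sketch, not the actual diff from this PR. The helper `sinusoidal_pos_emb` and its `dtype` parameter are hypothetical names made up for this example; it assumes the hidden states are already in half precision (as they might be under apex 16-bit training) and shows one possible way to cast a float32 positional encoding to match them.

```python
import torch


def sinusoidal_pos_emb(pos_seq, d_model, dtype=torch.float32):
    """Hypothetical helper: build sinusoidal encodings in float32, then cast.

    This is an illustrative assumption, not the XLNet implementation.
    """
    # Frequencies are computed in float32 for numerical stability.
    freq_seq = torch.arange(0, d_model, 2.0, dtype=torch.float32)
    inv_freq = 1.0 / torch.pow(10000.0, freq_seq / d_model)
    # Outer product of positions and inverse frequencies: (seq_len, d_model / 2).
    sinusoid = pos_seq.to(torch.float32)[:, None] * inv_freq[None, :]
    pos_emb = torch.cat([torch.sin(sinusoid), torch.cos(sinusoid)], dim=-1)
    # Cast to the caller's dtype so it matches the rest of the model.
    return pos_emb.to(dtype)


# Stand-in for half-precision hidden states (assumed for illustration).
hidden = torch.zeros(4, 8, dtype=torch.float16)
pos_seq = torch.arange(3, -1, -1.0)  # positions 3, 2, 1, 0

uncast = sinusoidal_pos_emb(pos_seq, d_model=8)                   # stays float32
cast = sinusoidal_pos_emb(pos_seq, d_model=8, dtype=hidden.dtype)  # follows hidden

print(uncast.dtype, cast.dtype, hidden.dtype)
```

With the hard-coded float32 result, any later operation that combines the encoding with half-precision tensors can hit a dtype mismatch; casting to the dtype of the surrounding tensors avoids that.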

@codecov

codecov bot commented Aug 18, 2020

Codecov Report

Merging #6567 into master will decrease coverage by 0.77%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #6567      +/-   ##
==========================================
- Coverage   79.18%   78.41%   -0.78%     
==========================================
  Files         156      156              
  Lines       28129    28129              
==========================================
- Hits        22275    22056     -219     
- Misses       5854     6073     +219     
Impacted Files Coverage Δ
src/transformers/modeling_xlnet.py 83.30% <100.00%> (ø)
src/transformers/optimization.py 25.55% <0.00%> (-70.00%) ⬇️
src/transformers/pipelines.py 25.63% <0.00%> (-54.32%) ⬇️
src/transformers/modeling_tf_gpt2.py 65.68% <0.00%> (-29.33%) ⬇️
src/transformers/optimization_tf.py 33.33% <0.00%> (-24.33%) ⬇️
src/transformers/modeling_tf_auto.py 48.79% <0.00%> (-18.08%) ⬇️
src/transformers/data/processors/squad.py 13.76% <0.00%> (-14.38%) ⬇️
src/transformers/modeling_auto.py 64.36% <0.00%> (-14.37%) ⬇️
src/transformers/modelcard.py 82.71% <0.00%> (-2.47%) ⬇️
src/transformers/modeling_distilbert.py 96.19% <0.00%> (-1.64%) ⬇️
... and 15 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12d7624...1c18ddf. Read the comment docs.

Contributor

@JetRunner JetRunner left a comment

Thanks for your contribution!

Could you post a screenshot of the thrown error before this change?

@johndolgov
Contributor Author

Of course, @JetRunner, here it is
Screenshot from 2020-08-20 11-47-50

Contributor

@JetRunner JetRunner left a comment

> Of course, @JetRunner, here it is

Great! Would you mind adding a one-line comment explaining the cast?

@johndolgov
Contributor Author

@JetRunner, done

@JetRunner
Contributor

Merging since the CI error looks unrelated.

@JetRunner JetRunner merged commit 9539583 into huggingface:master Aug 20, 2020
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
* xlnet fp16 bug fix

* comment cast added

* Update modeling_xlnet.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
* xlnet fp16 bug fix

* comment cast added

* Update modeling_xlnet.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020