Default to precision=bf16 on CPU when precision=16 is passed #10033
Conversation
Nice!
Will you add a changelog entry? This would be a backward-incompatible change since the default type changes.
Codecov Report
@@           Coverage Diff           @@
##           master   #10033   +/- ##
======================================
  Coverage      93%      93%
======================================
  Files         180      180
  Lines       15870    15890    +20
======================================
+ Hits        14689    14720    +31
+ Misses       1181     1170    -11
LGTM !
if self.precision in (16, "bf16"):
    ...

# maybe convert the precision value
if self.precision == 16 and self.use_cpu:
I think we support Trainer(precision="16"), do we?
I don't think so, because we don't convert the value to a PrecisionType (which works for int and str).
IMO we should, but that should be done in a follow-up. The previous code here had the same check.
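For illustration, a minimal sketch of what such a follow-up normalization could look like. The enum below is a stand-in, not Lightning's actual PrecisionType, and the helper name _normalize_precision is hypothetical:

```python
from enum import Enum


class PrecisionType(str, Enum):
    # Illustrative stand-in for a precision enum; values are the
    # string forms of the accepted precision settings.
    HALF = "16"
    FLOAT = "32"
    BFLOAT = "bf16"


def _normalize_precision(precision):
    # Coerce user input (16, "16", "bf16", ...) into one canonical type so
    # that later checks do not have to handle int and str separately.
    try:
        return PrecisionType(str(precision))
    except ValueError:
        raise ValueError(f"Unsupported precision value: {precision!r}")


assert _normalize_precision(16) is PrecisionType.HALF
assert _normalize_precision("16") is PrecisionType.HALF
assert _normalize_precision("bf16") is PrecisionType.BFLOAT
```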
What does this PR do?
With this change, Trainer(precision=16, accelerator='cpu') now runs with precision='bf16' automatically. This is desirable for no-code-change transitions between environments with different accelerators, for example, moving from local (CPU only) to Colab (GPU or TPU).
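Roughly, the resulting behavior can be pictured as below. This is only a sketch: the helper name _maybe_convert_precision and the warning wording are assumptions, not the exact code added by this PR:

```python
import warnings


def _maybe_convert_precision(precision, use_cpu):
    # fp16 mixed precision is not supported on CPU, so when precision=16 is
    # requested together with the CPU accelerator, fall back to bf16 and
    # tell the user about the substitution.
    if precision == 16 and use_cpu:
        warnings.warn(
            "You passed `precision=16` with `accelerator='cpu'`; "
            "using `precision='bf16'` instead."
        )
        return "bf16"
    return precision


assert _maybe_convert_precision(16, use_cpu=True) == "bf16"
assert _maybe_convert_precision(16, use_cpu=False) == 16
```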
Trainer(amp_backend="apex", precision=16, accelerator='cpu') will still raise an error, as Apex does not support bf16.
Part of #10027
Does your PR introduce any breaking changes? If yes, please list them.
None
Before submitting
PR review