[NLP] Access scaler only in FP16 case #7916
Conversation
if self.torch_dtype == torch.float16 and not self.trainer.precision.startswith("16"):
    # Make sure that trainer.precision_plugin.scaler exists for config_mapping below
    raise ValueError(
        "Creating a half-precision model requires setting trainer.precision to '16' or '16-mixed'"
    )
Should we do the same check for bf16?
I think it is needed only for model.precision=16, which corresponds to self.torch_dtype here. More specifically, the check is there to make sure the subsequent code will work: loading an FP16 model assumes that a scaler is available in the trainer, so the trainer needs to be configured accordingly. In my understanding, in the other cases the trainer precision handles the training precision even when model.precision is different, and this results in the necessary type casting. But I might not have considered all the possibilities. Which cases exactly do you mean for model.precision and trainer.precision, respectively?
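To make the discussion concrete, here is a minimal sketch of the guard in question (illustrative, not the exact NeMo code; the helper name is made up): the scaler attribute is only read when the model dtype is FP16 and the trainer actually runs a 16-bit precision plugin, so nothing is accessed for FP32 or BF16 trainers.

```python
# Illustrative sketch, not the exact NeMo implementation.
# Assumption: trainer.precision_plugin.scaler is a torch.cuda.amp.GradScaler
# only when the Trainer was created with precision='16' or '16-mixed'.
import torch
from pytorch_lightning import Trainer


def grad_scale_func_or_none(trainer: Trainer, torch_dtype: torch.dtype):
    """Return the scaler's scale function only in the FP16 case, else None."""
    if torch_dtype == torch.float16 and str(trainer.precision).startswith("16"):
        # Safe: a GradScaler is guaranteed to exist for 16-bit mixed precision.
        return trainer.precision_plugin.scaler.scale
    # FP32 and BF16 trainers have no scaler, so it is never touched here.
    return None
```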
Force-pushed from e520963 to 6d1a5c0.
@@ -103,7 +103,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer, no_lm_init=True):
        self.tokenizer = None

        with open_dict(cfg):
            if cfg.get('precision', None) is None and trainer is not None:
The latter condition (trainer is not None) is already guaranteed by the check in L93, so it is redundant here.
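Given that earlier check, the condition could be reduced as in the sketch below (illustrative; cfg and trainer are the __init__ arguments shown above, and the assignment of cfg.precision is assumed from the surrounding context, not shown in this hunk):

```python
# Illustrative simplification, assuming an earlier check already raised when trainer is None.
from omegaconf import open_dict

with open_dict(cfg):
    if cfg.get('precision', None) is None:
        # Assumed from context: mirror the trainer's precision into the model config.
        cfg.precision = trainer.precision
```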
Force-pushed from eaf20c1 to b08c123.
LGTM. Thanks!
* Remove unused 'precision' variable
* Raise informative error when trying to load FP16 model for trainer.precision != 16
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Make sure scaler is available instead of raising error
* trainer != None is assured thanks to a previous check

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do?
When trying to load an FP16 model with trainer.precision != 16 (or 16-mixed), a user gets a mysterious error message. I'm changing the code to access trainer.precision_plugin.scaler.scale only when the trainer is configured with the relevant precision: "16" or "16-mixed".

Collection: NLP