Is precision="mixed" redundant? #9956
Comments
@carmocca during the first accelerator refactor we introduced the term "mixed".
What's the status of this discussion? I wanted to train a BERT-like model with half precision, where the model itself is half precision and not only the inputs. What would be the correct way to train without using mixed precision? Thanks!
That's not yet decided. For true half-precision training, the current way to go would be to use a custom precision plugin.
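For readers looking for a concrete starting point, a rough sketch of such a custom plugin is below. It mirrors the pattern of Lightning's double-precision plugin by casting the model inside `connect`; the hook name and exact wiring are assumptions and may differ between Lightning versions.

```python
# Rough sketch only: the connect() hook and its signature are assumptions
# and may differ between Lightning versions.
from pytorch_lightning.plugins import PrecisionPlugin


class TrueHalfPrecisionPlugin(PrecisionPlugin):
    """Casts the model itself to float16 instead of relying on AMP autocasting."""

    precision = 16

    def connect(self, model, optimizers, lr_schedulers):
        # Convert all parameters and buffers to half precision up front.
        model = model.half()
        return model, optimizers, lr_schedulers


# Usage (hypothetical): Trainer(plugins=[TrueHalfPrecisionPlugin()])
```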
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
A user in Slack got confused by this.
Do we want to support true half-precision? I think we should. The problem is the API, since … The disadvantage of this is that the instinct from users will be to pass … One more option is to add one extra flag …
Thanks @carmocca, I think this is a great issue to bring up. I too was confused by the naming convention originally, and it required me to go through the PyTorch Lightning code to understand what the flag means. I believe that changing …
My suggestion for the API would be to have arguments …
In terms of naming for the native backend, I would suggest Full precision: … If there won't be half precision, 16 should be deprecated because it is a misnomer.
I agree with @SeanNaren, even though users will need to remember to set two flags (…). I also like @carmocca's suggestion for adding a flag …
With another flag, we will end up with four flags that determine precision for different configurations. Off-topic: I was thinking about whether we could deprecate …
@wisecornelius I think that is very correct and concise but prone to errors from inexperienced users who don't understand all the nuances and differences between backends.
This works now; I just wonder if there'll ever be a new precision algorithm. We would want to deprecate this bool if that ever happens, so we would have … What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me, but there's no better solution without going through deprecations.
This makes sense to me, but you should open a new issue for that.
@carmocca I agree. I do not like the sparsity. It seems that the strategy repository is currently a project to make the strategy selection use dense arguments. I just learned that …
I feel like having four different options controlling the precision and, as @carmocca pointed out, a very sparse product of these is not a good approach. How about having "only" three of them and folding the … ?
Looking at the history, newly added trainer arguments usually don't survive that long 😅 I believe we should take in the learnings from the accelerator/strategy/plugin syntax, where we introduced the concept of registries for combinations of selections under one name.

```python
# before
Trainer(precision=32|16|64)
# after
Trainer(precision="full"|"mixed"|"double")

# new: true half precision everything
Trainer(precision="half")

# same
Trainer(precision="bf16")

# before
Trainer(precision=16, amp_backend="apex", amp_level="O2")
# after
Trainer(precision="apexO2")

# for specific customization
Trainer(plugins=ThisSpecialPrecisionPlugin(backend=..., level="X", ...))
```

These are my random thoughts.
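To make the registry idea more concrete, here is a minimal sketch of what a precision registry could look like. The class and entry names are hypothetical, not Lightning's actual API; the point is that one string resolves to a fully configured plugin, as with the accelerator/strategy registries.

```python
# Hypothetical sketch of a precision registry; names and classes are
# illustrative only, not Lightning's actual implementation.
from typing import Callable, Dict


class PrecisionPluginBase:
    """Stand-in for a precision plugin base class."""


class FullPrecision(PrecisionPluginBase): ...
class MixedPrecision(PrecisionPluginBase): ...
class HalfPrecision(PrecisionPluginBase): ...
class BF16Mixed(PrecisionPluginBase): ...


PRECISION_REGISTRY: Dict[str, Callable[[], PrecisionPluginBase]] = {
    "full": FullPrecision,
    "mixed": MixedPrecision,
    "half": HalfPrecision,
    "bf16": BF16Mixed,
}


def resolve_precision(name: str) -> PrecisionPluginBase:
    # One string selects a fully configured plugin, mirroring how the
    # accelerator/strategy registries work.
    try:
        return PRECISION_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown precision {name!r}. Choose from {sorted(PRECISION_REGISTRY)}")
```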
@awaelchli I like that approach! However, this implies that when having …
Yes, I like it. In my proposal, the main motivation was to use a different name than … for the (int) 16, to be able to introduce true half precision. So yours works too.
Trying to revive this discussion, here's what I find most logical: …
The old precision values show a warning explaining the historical change.
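For the part about warning on old values, a deprecation shim could look roughly like the sketch below. The "-mixed"/"-true" names follow the suffix scheme discussed further down in this thread and are assumptions, not a confirmed API.

```python
import warnings

# Assumed legacy-to-new mapping; the "-mixed"/"-true" names follow the suffix
# scheme discussed in this thread and are not necessarily the final API.
_LEGACY_PRECISION_MAP = {
    16: "16-mixed",
    "16": "16-mixed",
    "bf16": "bf16-mixed",
    32: "32-true",
    64: "64-true",
}


def normalize_precision(value):
    """Translate a legacy precision value, warning about the rename."""
    if value in _LEGACY_PRECISION_MAP:
        new = _LEGACY_PRECISION_MAP[value]
        warnings.warn(
            f"precision={value!r} is deprecated, use precision={new!r} instead.",
            DeprecationWarning,
        )
        return new
    return value
```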
I like it... just not sure … Should we do a dict? {'precision': 16, 'mixed': False, 'amp_type': 'apex'} The key names can be improved. Also, …
I prefer the simplicity of the string options; they are consistent with our other registry-based trainer arguments. Also, passing a dict is very similar to passing separate arguments, and has the disadvantages of #9956 (comment).
Well, sort of. They are indicated by the argument name in their respective strategy class (…).
@carmocca do we really want to append "-true" to all non-mixed precision stuff? Intuitively, when I see precision="16", I would assume that this is always the case and there would be no need to append "-true".
Okay. Updated the table above. However, this is less explicit, which can be bad for those who don't notice there's a difference between "true" and "mixed".
Two notes I want to bring up: …
I agree @awaelchli. I believe I wrote the above proposal before we decided to deprecate Apex.
The comment about "native" in the name is also something I'd like to push for: removing it from the class names in the future. We already have a plugin now that has …
The removal of Apex has simplified this proposal greatly. I edited my proposed API in #9956 (comment).
Proposed refactoring or deprecation
Does precision="mixed" act differently to precision=16 in any way? I understand that "mixed" is more correct, as 16-bit precision can still run some computations in 32-bit.
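To illustrate the distinction in plain PyTorch (independent of Lightning): under torch.cuda.amp.autocast only selected ops run in float16 while others are kept in float32, whereas calling .half() on the module makes everything run in float16. A minimal sketch, assuming a CUDA device is available:

```python
import torch

# Requires a CUDA device.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Softmax(dim=-1)).cuda()
x = torch.randn(4, 8, device="cuda")

# Mixed precision: autocast picks the dtype per op. The linear layer runs in
# float16, while softmax is kept in float32 for numerical stability.
with torch.cuda.amp.autocast():
    hidden = model[0](x)
    probs = model[1](hidden)
print(hidden.dtype, probs.dtype)  # torch.float16 torch.float32

# True half precision: the parameters themselves are float16, so everything
# runs in float16 and the inputs must be cast manually.
model_half = model.half()
out = model_half(x.half())
print(out.dtype)  # torch.float16
```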
Motivation
In #9763 I noticed we did not even have a PrecisionType for "mixed".
There's a single test in the codebase passing the "mixed" value:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/plugins/test_deepspeed_plugin.py#L153
And no mention at all of this value in the docs.
Pitch
Have one value to set this, whether it is 16 or "mixed". Most likely 16, since it's the one widely used.
Otherwise, add tests for passing "mixed".
cc @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @Borda @carmocca