
Is precision="mixed" redundant? #9956

Closed
carmocca opened this issue Oct 15, 2021 · 26 comments · Fixed by #16783

Labels: design (Includes a design discussion), precision: amp (Automatic Mixed Precision), refactor

@carmocca (Contributor) commented Oct 15, 2021

Proposed refactoring or deprecation

Does precision="mixed" act differently to precision=16 in any way?

I understand that "mixed" is more correct as 16-bit precision can still run some computations in 32-bit.

Motivation

In #9763 I noticed we did not even have a PrecisionType for "mixed".

There's a single test in the codebase passing the "mixed" value:

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/plugins/test_deepspeed_plugin.py#L153

And no mention at all of this value in the docs.

Pitch

Have one value to set this, whether it is 16 or "mixed". Most likely 16, since it's the one widely used.

Otherwise, add tests for passing "mixed".

Additional context

$ grep -iIrn '"mixed"' pytorch_lightning
pytorch_lightning/plugins/training_type/sharded.py:62:                    is_fp16 = precision in ("mixed", 16)
pytorch_lightning/plugins/training_type/fully_sharded.py:135:            mixed_precision=precision == "mixed",
pytorch_lightning/plugins/training_type/ipu.py:42:        if self.precision in ("mixed", 16):
pytorch_lightning/plugins/training_type/deepspeed.py:405:            dtype = torch.float16 if self.precision in (16, "mixed") else torch.float32
pytorch_lightning/plugins/training_type/deepspeed.py:473:            dtype = torch.float16 if self.precision in (16, "mixed") else torch.float32
pytorch_lightning/plugins/training_type/deepspeed.py:602:        if precision in (16, "mixed"):
pytorch_lightning/plugins/precision/mixed.py:26:    precision: Union[str, int] = "mixed"
pytorch_lightning/plugins/precision/fully_sharded_native_amp.py:26:    precision = "mixed"
$ grep -iIrn '"mixed"' tests
tests/plugins/test_deepspeed_plugin.py:153:@pytest.mark.parametrize("precision", [16, "mixed"])
$ grep -Irn 'mixed' docs | grep 'precision='
# no mention in the docs!


cc @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @Borda @carmocca

@carmocca added the "refactor" and "design" labels on Oct 15, 2021
@carmocca mentioned this issue on Oct 22, 2021
@justusschock (Member) commented:

@carmocca during the first accelerator refactor we introduced the term "mixed" (before, it was always 16) because we were planning to deprecate 16 as an alias for AMP and instead use that flag for real half-precision training (like we have for double precision).

@jzazo commented Nov 22, 2021

What's the status of this discussion? I wanted to train a BERT-like model with half precision, where the model itself is half precision and not only the inputs. What would be the correct way to train without using mixed precision? Thanks!

@justusschock (Member) commented:

That's not yet decided.

For true half-precision training, the current way to go would be to use a custom PrecisionPlugin, similar to what we do for double precision.
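
For reference, here is a minimal sketch of what such a plugin could look like, modeled on the double-precision plugin. The connect hook and its signature are assumptions based on that plugin; treat this as an illustration, not a supported API.

from pytorch_lightning.plugins import PrecisionPlugin

class TrueHalfPrecisionPlugin(PrecisionPlugin):
    """Cast the model to float16 up front; no autocast, no GradScaler."""

    precision: int = 16

    def connect(self, model, optimizers, lr_schedulers):
        # Assumption: connect() receives the model and optimizers once before
        # training, as in the double-precision plugin, so we cast parameters
        # and buffers to half precision here.
        return super().connect(model.half(), optimizers, lr_schedulers)

# hypothetical usage:
# trainer = Trainer(plugins=[TrueHalfPrecisionPlugin()])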

@stale bot commented Dec 27, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

The stale bot added the "won't fix" label on Dec 27, 2021
@awaelchli added the "precision: amp" label and removed the "won't fix" label on Dec 27, 2021
@stale bot commented Feb 18, 2022

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

The stale bot added the "won't fix" label on Feb 18, 2022
@carmocca added this to the "future" milestone on Feb 18, 2022
The stale bot removed the "won't fix" label on Feb 18, 2022
@carmocca (Contributor, Author) commented Apr 6, 2022

A user in Slack got confused by this:

How do I do mixed precision training (CPU / GPU)? [...] Allowed precision values: ('16', '32', '64', 'bf16', 'mixed') suggest that 16 trains pure 16-bit and mixed trains amp. However, the documentation states that 16 trains with mixed precision. When calling Trainer(precision='mixed') I get an error message "No precision set". So how do I do pure half-precision, mixed and full-precision training?

Do we want to support true half-precision? I think we should.

The problem is the API. Since precision=16 is used for mixed 16-bit, we would need a deprecation phase so that precision=16 becomes true half-precision and precision="mixed_16" is used for mixed half-precision.

The disadvantage of this is that users' instinct will be to pass precision=16, because many people don't understand the difference between true precision and mixed precision. On the other hand, keeping precision=16 as mixed half-precision and requiring precision='true_16' would be inconsistent with what a bare precision=16 naturally suggests.

One more option is to add one extra flag, precision_type="full"|"mixed", and remove precision="mixed". We would automatically choose the best type given the precision value; a hypothetical example follows below.
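
For illustration only, that combination might look like this (precision_type is hypothetical and does not exist in Lightning):

# hypothetical API sketch for the extra-flag proposal above
Trainer(precision=16, precision_type="mixed")  # what precision=16 does today (AMP)
Trainer(precision=16, precision_type="full")   # true half precision
Trainer(precision=32)                          # type chosen automatically ("full" for 32)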

@SeanNaren (Contributor) commented:

Thanks @carmocca, I think this is a great issue to bring up.

I too was confused by the naming convention originally, and it required me to go through the PyTorch Lightning code to understand what the flag means.

I believe that changing precision=16 away from AMP is very dangerous and could lead to many angry users who do not heed the deprecation warning (unless it was incredibly loud :P). I love the idea of adding a precision_type of some sort to control whether we'd assume mixed or full. Maybe even just mixed_precision=True | False.

@wisecornelius commented:

My suggestion for the API would be to have arguments

  • amp_backend as one argument ("apex" vs "native")
  • precision, which is either [O0, O1, O2, O3] for apex or [16, "mixed", 32, 64] for native.

In terms of naming for the native backend, I would suggest

Full precision: "full" and 32
Mixed precision: "mixed"
Half precision: "half" and 16
Double precision: "double" and 64
Bfloat precision: "bf16" or "bfloat16"

If there won't be half precision, 16 should be deprecated because it is a misnomer.

@akihironitta (Contributor) commented:

I believe that changing precision=16 from being AMP is very dangerous, and could lead to many angry users that do not heed the deprecation warning (unless it was incredible loud :P). I love the idea of adding a precision_type of some sort to control whether we'd assume mixed or full. Maybe even just mixed_precision=True | False.

I agree with @SeanNaren. Even though users will need to remember to set two flags (precision and mixed_precision) to do one thing (true 16, ...), I don't think it's worth changing precision=16 away from AMP, for the backward-compatibility reasons SeanNaren mentioned above.

I also like @carmocca's suggestion of adding a flag mixed_precision=True | False, even though that's one more thing for users to remember. I think it may be fine if we document it properly so that the new flag is discoverable.

@rohitgr7 (Contributor) commented Apr 7, 2022

also like @carmocca's suggestion for adding a flag mixed_precision=True | False even though that's adding one more thing to remember for users. I think it may be fine if we document it properly so that the new flag is discoverable.

With another flag, we will end up with 4 flags that determine precision for different configurations.

Off-topic: I was wondering whether we could deprecate amp_level and let users choose a different amp_level by passing Trainer(..., plugins=[ApexMixedPrecisionPlugin(amp_level=...)]).

@carmocca (Contributor, Author) commented Apr 7, 2022

My suggestion for the API would be to have arguments...

@wisecornelius I think that is very correct and concise but prone to errors from inexperienced users who don't understand all the nuances and differences between backends.

Maybe even just mixed_precision=True | False.

This works now; I just wonder whether there'll ever be a new precision algorithm. We would want to deprecate this bool if that ever happens.
If we go with the bool, it should be amp_enabled=True|False for consistency.

So we would have precision, amp_backend, amp_level, and precision_type/amp_enabled.

What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me but there's no better solution without going through deprecations.

off-topic: I was thinking if we could deprecate amp_level...

This makes sense to me, but you should open a new issue for that.

@wisecornelius commented:

What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me but there's no better solution without going through deprecations.

@carmocca I agree. I do not like the sparsity. It seems that the strategy repository is currently a project to make strategy selection work with dense arguments. I just learned that amp_level="O1" only works when precision=16. I did that wrong for 6 months...

@otaj (Contributor) commented Apr 8, 2022

I feel like having four different options controlling the precision, with the product of these being very sparse (as @carmocca pointed out), is not a good approach.

How about having "only" three of them and folding the precision_type/amp_enabled suggested in the last comment into amp_level, where None would indicate something along the lines of amp_enabled=False and anything else would indicate amp_enabled=True? A hypothetical example follows below.
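
For illustration only, one way to read that proposal (the argument combination is hypothetical):

# hypothetical: fold the AMP on/off switch into amp_level
Trainer(precision=16, amp_level=None)   # AMP disabled -> true 16-bit training
Trainer(precision=16, amp_level="O2")   # AMP enabled, here at Apex level O2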

@awaelchli (Contributor) commented Apr 11, 2022

Looking at the history, newly added trainer arguments usually don't survive that long 😅

I believe we should take in the learnings from the accelerator/strategy/plugin syntax, where we introduced the concept of registries for combinations of selections under one name.
This helped us push trainer arguments to the individual plugins while still enabling an easy-to-use interface in code and the CLI. Since amp_level and amp_backend only apply to a subset of the available plugins, these could be moved to the individual implementations and registered as names for the precision argument (introducing a precision registry).
This solves the sparsity problem, as we would only expose the most common combinations as names, and it makes the Trainer arguments easier to discover and understand. It also makes it easier to validate the selection, since only the registry and the strategy are involved.

# before
Trainer(precision=32|16|64)
# after
Trainer(precision="full"|"mixed"|"double")

# new: true half precision everything
Trainer(precision="half")

# same
Trainer(precision="bf16")

# before
Trainer(precision=16, amp_backend="apex", amp_level="O2")
# after
Trainer(precision="apexO2")

# for specific customization
Trainer(plugins=ThisSpecialPrecisionPlugin(backend=..., level="X", ...))

These are my random thoughts.
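
To make the registry idea concrete, here is a minimal, self-contained sketch; the names and config fields are illustrative assumptions, not Lightning's actual implementation:

from dataclasses import dataclass

@dataclass
class PrecisionConfig:
    dtype: str   # dtype used for model parameters
    amp: bool    # whether autocast/grad scaling (AMP) is enabled

# each registered name stands for one common combination of settings
PRECISION_REGISTRY = {
    "full":   PrecisionConfig(dtype="float32", amp=False),
    "double": PrecisionConfig(dtype="float64", amp=False),
    "half":   PrecisionConfig(dtype="float16", amp=False),
    "mixed":  PrecisionConfig(dtype="float32", amp=True),   # fp16 autocast
    "bf16":   PrecisionConfig(dtype="float32", amp=True),   # bf16 autocast
}

def resolve_precision(name: str) -> PrecisionConfig:
    # the Trainer would look the string up here and build the matching plugin
    try:
        return PRECISION_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown precision {name!r}; choose from {sorted(PRECISION_REGISTRY)}")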

@justusschock (Member) commented Apr 11, 2022

@awaelchli I like that approach! However, this implies that with precision="mixed" it will always use native AMP, right? What do we do if new precisions come up? Do we always refer to them by the fraction of fp32, e.g. would 8-bit be "quarter"? I think that's not very clear to many people, and referring to them by something like precision='16bit' would be more self-explanatory.

@awaelchli (Contributor) commented:

I think that's not very clear to many people and referring to them by something like precision='16bit' would be more self-explaining

Yes, I like it. In my proposal, the main motivation was to use a different name than the (int) 16 to be able to introduce true half precision. So yours works too.

@carmocca (Contributor, Author) commented Aug 6, 2022

Trying to revive this discussion, here's what I find most logical:

Flags Before → Precision After
precision=16 → "16-mixed"
precision=64 → "64-true" (true)
precision=32 → "32-true" (true)
precision=bf16 → "bf16-mixed"
Not supported → "16-true" (true)
Not supported → "bf16-true" (true)

The old precision values would show a warning explaining the historical change; a sketch of such a mapping follows below.
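
For illustration, a minimal sketch of how the legacy values could map to the new strings and emit that warning (the names here are assumptions, not the implemented API):

import warnings

# legacy value -> proposed string value
_LEGACY_PRECISION_MAP = {16: "16-mixed", "bf16": "bf16-mixed", 32: "32-true", 64: "64-true"}

def normalize_precision(value):
    if value in _LEGACY_PRECISION_MAP:
        new = _LEGACY_PRECISION_MAP[value]
        warnings.warn(
            f"precision={value!r} is deprecated, use precision={new!r} instead. "
            "Note that 16 has always meant mixed precision (AMP), not true half precision."
        )
        return new
    return value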

@rohitgr7 (Contributor) commented Aug 6, 2022

I like it... just not sure precision-type-amp_backend is a good way to get the configuration.

Should we use a dict?

{"precision": 16, "mixed": False, "amp_type": "apex"}

The key names can be improved.

Also, amp_level is deprecated, so you can remove that; we'll need to make more changes here once it's removed.

@carmocca (Contributor, Author) commented Aug 6, 2022

I prefer the simplicity of the string options; they are consistent with our other registry-based trainer arguments.

Also, passing a dict is very similar to passing separate arguments, and it has the disadvantages of #9956 (comment).

@rohitgr7 (Contributor) commented Aug 6, 2022

they are consistent with our other registry based trainer arguments

Well, sort of. They are indicated by the argument name in their respective strategy class ("deepspeed_stage_3"), but here one needs to remember the order (precision-type-amp_backend). That's why I suggested a little more explicit configuration. But if others think it's fine, then 👍

@justusschock (Member) commented:

@carmocca do we really want to append "-true" to all non-mixed precision values? Intuitively, when I see precision="16", I would assume that it is true precision anyway, so there would be no need to append "-true".

@Borda moved this to "In Review" in Frameworks Planning on Aug 8, 2022
@Borda moved this from "In Review" to "In Progress" in Frameworks Planning on Aug 8, 2022
@carmocca (Contributor, Author) commented Aug 9, 2022

@carmocca do we really want to append "-true" to all non-mixed precision stuff? intuitively, when I see precision="16", I would assume that this is always the case and there would be no need to append "true"

Okay, I updated the table above. However, this is less explicit, which can be bad for those who don't notice that there's a difference between "true" and "mixed".

@awaelchli (Contributor) commented Sep 22, 2022

Two notes I want to bring up:

  • I would develop this in parallel with "Deprecate the nvidia/apex integration" #14416 and release them together to minimize friction.
  • I would prefer to leave the "native" postfix out of the picture. I think it is fair to assume that anything, unless explicitly mentioned, is native to what torch does. This has the benefit that the name becomes shorter, but it does not prevent us from future integrations with new backends that are not native, as we would postfix their names.

@carmocca (Contributor, Author) commented:

I agree, @awaelchli. I believe I wrote the above proposal before we decided to deprecate Apex.

@awaelchli (Contributor) commented Sep 22, 2022

The "native" part of the name is also something I'd like to push for removing from the class names in the future. We already have a plugin now that has NativeNative (twice!) in the name, which is beyond ridiculous IMO, considering that most of the precision plugins we have are actually native. Although the user impact of this is not as big as for these trainer arguments.

@carmocca (Contributor, Author) commented:

The removal of Apex has simplified this proposal greatly. I edited my proposed API in #9956 (comment).
