
Device and dtype properties #462

Merged
merged 18 commits into from
Aug 26, 2021
Conversation

SkafteNicki
Member

@SkafteNicki SkafteNicki commented Aug 18, 2021

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #455
The issue describes how the Bootstrapper metric currently does not work on GPU. Trying to fix this made me realize that we do not have an easy way of getting the device and dtype of a metric. This PR implements the logic from the DeviceDtypeMixin class, taken from lightning, in the core Metric class:
https://github.com/PyTorchLightning/pytorch-lightning/blob/38ceb8943ef9b858abead1fbba43ea9a9b4cd93b/pytorch_lightning/core/mixins/device_dtype_mixin.py

Additionally, the issue is solved using the new property :]
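For illustration, the device/dtype tracking described above can be sketched roughly like this. This is a hypothetical minimal version, not the actual torchmetrics or lightning code; it relies on torch's private `torch._C._nn._parse_to` helper to resolve the overloads of `to`, as the lightning mixin does:

```python
import torch
from torch import nn


class DeviceDtypeModule(nn.Module):
    """Hypothetical sketch of a module that tracks its own device and
    dtype, in the spirit of lightning's DeviceDtypeMixin."""

    def __init__(self) -> None:
        super().__init__()
        self._device = torch.device("cpu")
        self._dtype = torch.get_default_dtype()

    @property
    def device(self) -> torch.device:
        return self._device

    @property
    def dtype(self) -> torch.dtype:
        return self._dtype

    def to(self, *args, **kwargs):
        # torch's private helper resolves the many overloads of ``to``
        device, dtype, *_ = torch._C._nn._parse_to(*args, **kwargs)
        if device is not None:
            self._device = device
        if dtype is not None:
            self._dtype = dtype
        return super().to(*args, **kwargs)

    def half(self):
        self._dtype = torch.half
        return super().half()

    def cpu(self):
        self._device = torch.device("cpu")
        return super().cpu()
```

With this, `metric.device` and `metric.dtype` stay in sync after calls like `metric.to(torch.float16)` or `metric.half()`.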

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@SkafteNicki SkafteNicki added the bug / fix and enhancement labels Aug 18, 2021
@SkafteNicki SkafteNicki added this to the v0.6 milestone Aug 18, 2021
@mergify mergify bot removed the has conflicts label Aug 18, 2021
@codecov

codecov bot commented Aug 18, 2021

Codecov Report

Merging #462 (eede3fe) into master (94a158c) will decrease coverage by 0%.
The diff coverage is 84%.

@@          Coverage Diff          @@
##           master   #462   +/-   ##
=====================================
- Coverage      96%    96%   -0%     
=====================================
  Files         130    130           
  Lines        4301   4341   +40     
=====================================
+ Hits         4126   4159   +33     
- Misses        175    182    +7     

@mergify mergify bot added the ready label Aug 18, 2021
Contributor

@ananthsub ananthsub left a comment


Does this mean metrics used in a lightning module where mixed-precision training is used would be converted to fp16 precision as well? Is that always desirable? Do people want to compute metrics in fp32 while doing the rest of model training in fp16?

@SkafteNicki
Member Author

@ananthsub the PR does not actually introduce that kind of change; it has always been the case in TM that if you cast your metric to fp16, the metric states are also cast (since we have overridden the self._apply method):
https://github.com/PyTorchLightning/metrics/blob/689b2189c6f2aff3968d94d8e5fcfdb85dc5b98a/torchmetrics/metric.py#L414-L441
This PR just makes sure that when half(), cpu(), cuda(), to(...) is called, we have local properties that track this.

As for whether people want fp32 metrics when doing mixed-precision training, I am not sure; I doubt the extra precision matters much to most users during training. However, when it comes to testing, it is very clear to me that users should be using fp32.
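The state-casting behavior referenced above (custom state tensors following `.half()`/`.to()` because `_apply` is overridden) can be sketched as follows. This is a simplified, hypothetical illustration, not the real `Metric._apply`:

```python
import torch
from torch import nn


class StatefulMetricSketch(nn.Module):
    """Hypothetical sketch: metric states are plain tensors (neither
    parameters nor buffers), so _apply is overridden to move/cast them
    together with the rest of the module."""

    def __init__(self) -> None:
        super().__init__()
        # a "metric state" kept outside parameters and buffers
        self.total = torch.tensor(0.0)
        self._state_names = ["total"]

    def _apply(self, fn):
        # let nn.Module handle parameters/buffers first ...
        this = super()._apply(fn)
        # ... then apply the same conversion to the custom states
        for name in self._state_names:
            setattr(this, name, fn(getattr(this, name)))
        return this
```

Because `half()`, `float()`, `cpu()`, `cuda()` and `to(...)` all funnel through `_apply`, the custom state follows every cast and move.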

@Borda Borda requested a review from ananthsub August 19, 2021 08:19
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
torchmetrics/metric.py Show resolved Hide resolved
torchmetrics/metric.py Show resolved Hide resolved
torchmetrics/text/bert.py Show resolved Hide resolved
@maximsch2
Contributor

On the mixed-precision case: I agree with Ananth that this is potentially a concern (especially for metrics with accumulations, since fp16 will overflow at ~64k, so having a 100k-sample dataset and doing fp16 training will break even simple metrics like Accuracy), but it is not a new thing introduced by this diff. Let's file an issue and track it? I'm assuming people will get nan/inf as the metric result in case of fp16 overflow, so at least we are not going to silently screw them up.
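The accumulation concern can be demonstrated with plain torch tensors, independent of torchmetrics. Note that counting actually degrades well before the ~64k overflow point, because fp16 cannot represent every integer above 2048:

```python
import torch

# Counting samples in fp16 silently stalls: above 2048 the spacing between
# representable fp16 values exceeds 1, so adding 1 rounds back to the same value.
count = torch.tensor(0.0, dtype=torch.float16)
for _ in range(3000):
    count += 1.0
print(count.item())  # 2048.0, not 3000.0

# Sums overflow to inf once they pass fp16's maximum (~65504)
total = torch.tensor(60000.0, dtype=torch.float16)
total += torch.tensor(10000.0, dtype=torch.float16)
print(total.item())  # inf
```

So an fp16 accumulator can be silently wrong (stalled counts) even in cases where it does not visibly overflow to inf.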

It would be worth having an example in the docs of how to do mixed-precision metric calculation once this question comes up, though.
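A hypothetical sketch of what such a docs example might show: keep the accumulation state in fp32 and upcast incoming predictions, even when the forward pass runs under autocast (`accumulate` is an illustrative helper, not a torchmetrics API):

```python
import torch


def accumulate(state: torch.Tensor, preds: torch.Tensor) -> torch.Tensor:
    # cast incoming (possibly reduced-precision) predictions up to the
    # state's dtype before summing, so accumulation stays in fp32
    return state + preds.detach().to(state.dtype).sum()


state = torch.zeros((), dtype=torch.float32)
with torch.autocast("cpu", dtype=torch.bfloat16):
    preds = torch.ones(10, dtype=torch.bfloat16)  # stand-in for model output
    state = accumulate(state, preds)
print(state.dtype)  # torch.float32
```

The model forward can run in reduced precision while the metric state never leaves fp32, which avoids the overflow/precision issues discussed above.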

@mergify mergify bot removed the has conflicts label Aug 24, 2021
Contributor

@tchaton tchaton left a comment


Great!

@mergify mergify bot removed the has conflicts label Aug 26, 2021
@Borda
Member

Borda commented Aug 26, 2021

but it is not a new thing introduced by this diff. Let's file an issue and track it? I'm assuming people will get nan/inf as the metric result in case of fp16 overflow, so at least we are not going to silently screw them up.

yes, pls do so 🐰

@Borda Borda enabled auto-merge (squash) August 26, 2021 07:31
@Borda
Member

Borda commented Aug 26, 2021

@SkafteNicki mind checking/resolving the last comments?

@Borda Borda merged commit b10dba4 into master Aug 26, 2021
@Borda Borda deleted the device_placement branch August 26, 2021 08:17
@ananthsub
Contributor

@SkafteNicki we are having a very related discussion about device & dtype properties here: https://docs.google.com/document/d/1xHU7-iQSpp9KJTjI3As2EM0mfNHHr37WZYpDpwLkivA/edit#heading=h.cvihcwdhwas5

Given metrics are nn.Modules, what happens if metrics have parameters which live on different devices or have different dtypes? Then we're at odds with this: pytorch/pytorch#7460 (comment)

This makes metrics a restricted set of modules, which could potentially limit use cases in the future.

@leezu

leezu commented Aug 28, 2021

@ananthsub

Does this mean metrics used in a lightning module where mixed-precision training is used would be converted to fp16 precision as well? Is that always desirable? Do people want to compute metrics in fp32 while doing the rest of model training in fp16?

It's not desirable, but this behavior was already introduced by accident in cda5dbd. I opened #484 for tracking.

Borda pushed a commit that referenced this pull request Aug 30, 2021
* add gpu testing
* change super
* move to metric + simplify
* fix bert
* update docs
* add typing
* changelog

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

(cherry picked from commit b10dba4)
@SkafteNicki SkafteNicki mentioned this pull request Sep 2, 2021
4 tasks
@SkafteNicki
Member Author

@ananthsub, @maximsch2 please see PR #493, which should fix the problems with autocast. half, double and float are getting disabled for now and will not change the dtype of the metric states, which should hopefully fix the problems with mixed-precision training.
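The approach described (disabling dtype casts while leaving device movement intact) might look roughly like this minimal, hypothetical sketch:

```python
import torch
from torch import nn


class MetricSketch(nn.Module):
    """Hypothetical sketch: dtype-changing methods become no-ops so metric
    states stay in full precision, while device moves (cpu/cuda/to) keep
    working through the usual nn.Module machinery."""

    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("total", torch.zeros(()))  # fp32 state

    # dtype casts are disabled: states keep their original precision
    def half(self):
        return self

    def double(self):
        return self

    def float(self):
        return self
```

Calling `.half()` on such a metric leaves its state in fp32, while `.to("cuda")` would still move it as usual.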


Successfully merging this pull request may close these issues.

GPU support in BootStrapper
7 participants