
Recall and Specificity values are the same #1131

Closed
angadkalra opened this issue Jul 7, 2022 · 3 comments · Fixed by #1195

angadkalra commented Jul 7, 2022

🐛 Bug

I'm using the Recall and Specificity module metrics while training a binary classification model, and the two curves in MLflow are exactly the same. Please see the code below for how I'm using the module metrics. I'm fairly sure I'm using them correctly, since I'm following the torchmetrics docs as closely as I can.

Code sample

from torchmetrics import AUROC, MetricCollection, Recall, Specificity

class ModelMetrics:
    def __init__(self, out_features: int):
        train_metrics = [
            AUROC(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
        ]
        valid_metrics = [
            AUROC(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
            Specificity(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
            Recall(
                num_classes=out_features,
                average='weighted' if out_features > 2 else 'macro',
            ),
        ]
        self.train_metrics: MetricCollection = MetricCollection(train_metrics, prefix='train_')
        self.valid_metrics: MetricCollection = MetricCollection(valid_metrics, prefix='valid_')

def <phase>_step(...):  # <phase> stands for 'train' or 'valid'
    if phase == 'train':
        head.train_metrics.update(preds, targets)
    elif phase == 'valid':
        head.valid_metrics.update(preds, targets)

def <phase>_epoch_end(...):
    for head in self.model.model_heads:
        # Use Metrics API
        if phase == 'train':
            for metric_name, metric in head.train_metrics.items():
                self.log(f'{head.name}_{metric_name}', metric)
        elif phase == 'valid':
            for metric_name, metric in head.valid_metrics.items():
                self.log(f'{head.name}_{metric_name}', metric)
    return

Expected behavior

I expect Recall and Specificity to report different values.

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): pip, 0.9.2
  • Python & PyTorch Version (e.g., 1.0): python 3.8.12, torch 1.12.0+cu116, pytorch-lightning 1.6.4
  • Any other relevant information such as OS (e.g., Linux): AWS EC2 p3.8xlarge on Deep Learning AMI

Additional context

If you need any other info, please let me know. See below for the MLflow graphs.

[Screenshot: MLflow plots of the logged metrics, showing identical Recall and Specificity curves]

@angadkalra added the bug / fix and help wanted labels on Jul 7, 2022

github-actions bot commented Jul 7, 2022

Hi! Thanks for your contribution, great first issue!

@SkafteNicki added this to the v0.10 milestone on Jul 12, 2022
justusschock (Member) commented Jul 13, 2022

Can you actually provide some sample data (just raw tensor values) that you think should produce different outputs?

I'm asking because I checked both the formula and the implementation again, and they seem to be correct.

Ideally you can just generate random numbers with a fixed seed. But as I said, this is not reproducible for me.
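
For reference, a minimal reproduction along those lines could look like the sketch below (assumptions: a 2-column probability output, as from a 2-unit head, and the torchmetrics 0.9.x Recall / Specificity classes with the same num_classes=2 / macro configuration used in the report):

import torch
from torchmetrics import Recall, Specificity

torch.manual_seed(42)  # fixed seed so the numbers are reproducible
preds = torch.softmax(torch.randn(10, 2), dim=-1)  # made-up 2-class probabilities
target = torch.randint(0, 2, (10,))                # made-up binary targets

recall = Recall(num_classes=2, average='macro')
specificity = Specificity(num_classes=2, average='macro')

# Per the report, both values come out identical on torchmetrics 0.9.x,
# because each metric is averaged over the 0 and 1 classes.
print(recall(preds, target))
print(specificity(preds, target))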

SkafteNicki (Member) commented

This issue will be fixed by the classification refactor: see issue #1001 and PR #1195 for all changes.

Small recap: this issue describes that the Recall and Specificity metrics come out the same in the binary setting, which is wrong. The problem with the current implementation is that the metrics are calculated as an average over the 0 and 1 classes, which essentially makes the scores collapse into the same value: in the binary case the recall of one class is the specificity of the other, so averaging over both classes yields the same number for both metrics.
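
To make the collapse concrete, here is a minimal sketch in plain PyTorch with made-up hard predictions; it shows that the macro average of the per-class recalls necessarily equals the macro average of the per-class specificities:

import torch

# Made-up hard predictions and targets for illustration
preds = torch.tensor([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
target = torch.tensor([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])

def recall_of(cls):
    # recall of `cls` = correctly predicted `cls` / all true `cls`
    mask = target == cls
    return (preds[mask] == cls).float().mean()

def specificity_of(cls):
    # specificity of `cls` = correctly rejected non-`cls` / all true non-`cls`
    mask = target != cls
    return (preds[mask] != cls).float().mean()

# recall of class 0 is exactly the specificity of class 1 (and vice versa),
# so the macro averages over {0, 1} are identical
macro_recall = (recall_of(0) + recall_of(1)) / 2
macro_specificity = (specificity_of(0) + specificity_of(1)) / 2
assert torch.isclose(macro_recall, macro_specificity)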

Using the new binary_* versions of all the metrics:

from torchmetrics.functional import binary_recall, binary_specificity
import torch
preds = torch.rand(10)
target = torch.randint(0, 2, (10,))
binary_recall(preds, target)  # tensor(0.5000)
binary_specificity(preds, target)  # tensor(0.6250)

which also corresponds to what sklearn gives. Sorry for the confusion this has given rise to.
The issue will be closed when #1195 is merged.
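
As a rough sketch of that sklearn cross-check (assuming the default 0.5 threshold of the binary_* metrics; the exact values depend on the random tensors, so a seed is set here):

import torch
from sklearn.metrics import recall_score
from torchmetrics.functional import binary_recall, binary_specificity

torch.manual_seed(0)  # make the comparison reproducible
preds = torch.rand(10)
target = torch.randint(0, 2, (10,))

# torchmetrics: thresholds the probabilities at 0.5 by default
print(binary_recall(preds, target), binary_specificity(preds, target))

# sklearn equivalents: recall of the positive class, and recall of the
# negative class (which is the specificity)
hard_preds = (preds >= 0.5).int()
print(recall_score(target.numpy(), hard_preds.numpy()),
      recall_score(target.numpy(), hard_preds.numpy(), pos_label=0))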
