
Possible bug in binary classification calibration_error #1105

Closed
cwognum opened this issue Jun 21, 2022 · 3 comments
Comments

cwognum (Contributor) commented Jun 21, 2022

🐛 Bug

In calibration_error(), I don't think the accuracies are computed correctly in the binary classification setting: the update step just returns the targets. Shouldn't it instead return something like target == preds.round().int()? Am I missing something?

Code example

import torch
from torchmetrics.functional.classification import calibration_error

preds = torch.tensor([0.01, 0.001, 0.005])  # The raw sigmoid output
targets = torch.tensor([1, 1, 1])
calibration_error(preds, targets)
# This returns: tensor(0.9947)

The model confidently predicts the wrong class, but is rewarded with a near perfect calibration score.
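For reference, a minimal sketch (hypothetical, not the library's actual code) of the change the suggestion above would amount to: keep the sigmoid outputs as confidences, but compute per-sample correctness instead of returning the raw targets.

import torch
from torch import Tensor

def suggested_binary_update(preds: Tensor, target: Tensor):
    # Hypothetical variant of the binary update: confidences stay the
    # sigmoid outputs, accuracies become per-sample correctness.
    confidences = preds
    accuracies = (target == preds.round().int()).int()
    return confidences, accuracies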

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): 0.9.1, installed with mamba
  • Python & PyTorch Version (e.g., 1.0): Python 3.9.13, PyTorch 1.11.0.post202
  • Any other relevant information such as OS (e.g., Linux): Ubuntu (Linux)
@cwognum cwognum added bug / fix Something isn't working help wanted Extra attention is needed labels Jun 21, 2022
cwognum (Contributor, Author) commented Jun 22, 2022

Added a little example to better illustrate my point.

By the way, an all-zeros prediction vector would have been a simpler example, but it turns out the preds can't be exactly 0 because of how the binning is done. It could make sense to clamp the predictions in the binning process to prevent this, e.g.:

torch.clip(confidences, 1e-6, 1.0)
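For illustration, a minimal sketch of where such a clamp could sit relative to the binning (assuming a bucketize-based binning scheme; the actual TorchMetrics internals may differ):

import torch

confidences = torch.tensor([0.0, 0.2, 0.9])   # contains an exact 0
bin_boundaries = torch.linspace(0, 1, 15 + 1)  # 15 equal-width bins

# Without clamping, an exact 0 maps to bucket 0, so the usual "- 1"
# offset would yield bin index -1; clamping keeps it inside the first bin.
clamped = torch.clip(confidences, 1e-6, 1.0)
bin_indices = torch.bucketize(clamped, bin_boundaries) - 1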

@Borda Borda added this to the v0.10 milestone Jul 27, 2022
SkafteNicki (Member) commented:

Hi,
I checked this issue as part of a bigger refactor (see issue #1001 and PR #1195), and it seems that our calibration error is computing the right value.

First, in the example provided the metric gives a score of 0.9942. As the metric is a calibration error, the optimum is 0 and not 1, so it seems correct that the metric gives a high score, since the example is clearly not well calibrated.

Secondly, I ran the example through a third-party package, https://github.com/fabiankueppers/calibration-framework, which gives the same result as our implementation (we are actually using it for testing now).

Therefore, there does not seem to be an error in the implementation.
Closing issue.
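As a quick numerical sanity check of the score in the original example (a hand calculation, assuming all three confidences land in a single bin so the binned expected calibration error reduces to one term):

import torch

preds = torch.tensor([0.01, 0.001, 0.005])   # probability of the positive class
target = torch.tensor([1.0, 1.0, 1.0])

# With confidences = preds and accuracies = target (the binary update
# convention quoted in the next comment), the single-bin ECE is simply
# |mean(target) - mean(preds)|.
print(torch.abs(target.mean() - preds.mean()))  # tensor(0.9947)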

eyuansu62 commented:

I also have the same problem.
def _binary_calibration_error_update(preds: Tensor, target: Tensor) -> Tensor:
    confidences, accuracies = preds, target
    return confidences, accuracies

How could the target possibly be equal to the accuracy? The target is the ground-truth label, while the accuracy should be preds == target.
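For what it's worth, on the example from this issue the two conventions happen to give the same number, which may be part of the confusion. Both the quoted convention (confidence = positive-class probability, accuracy = target) and the convention the question assumes (confidence = max class probability, accuracy = preds == target) come out at roughly 0.9947 here (a hand check, not library code):

import torch

preds = torch.tensor([0.01, 0.001, 0.005])
target = torch.tensor([1.0, 1.0, 1.0])

# Convention quoted above: confidences = preds, accuracies = target.
ece_a = torch.abs(target.mean() - preds.mean())    # ≈ 0.9947

# Max-probability confidence with per-sample correctness; again all
# samples share one bin, so the binned ECE is a single term.
conf = torch.where(preds >= 0.5, preds, 1 - preds)
acc = (preds.round() == target).float()
ece_b = torch.abs(acc.mean() - conf.mean())        # ≈ 0.9947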
