
JaccardIndex mps fails also with PYTORCH_ENABLE_MPS_FALLBACK=1 #1196

Closed
riccardomusmeci opened this issue Aug 26, 2022 · 8 comments · Fixed by #1205
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed)

Comments

@riccardomusmeci

🐛 Bug

The JaccardIndex call fails on the M1 mps backend since "aten::bincount" is not currently supported. This also happens when the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable is set.

To Reproduce

import torch
from torchmetrics import JaccardIndex

IoU = JaccardIndex(num_classes=2, ignore_index=0)
IoU = IoU.to("mps")
for batch in train_dl:  # train_dl: any segmentation DataLoader
    x, mask = batch
    x = x.to("mps")
    mask = mask.to("mps")
    logits = model(x)  # model: any segmentation model on the "mps" device
    preds = torch.sigmoid(logits)
    iou = IoU(preds, mask)

UserWarning: The operator 'aten::bincount' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
return torch.bincount(x, minlength=minlength)
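Until this is handled inside TorchMetrics, a minimal sketch of a possible user-side workaround (not mentioned in this issue, and not the eventual fix) is to keep the metric on the CPU so bincount never runs on the MPS device; model and train_dl are the same placeholders as in the snippet above:

import torch
from torchmetrics import JaccardIndex

IoU = JaccardIndex(num_classes=2, ignore_index=0)  # metric stays on CPU

for batch in train_dl:
    x, mask = batch                        # keep mask on CPU
    logits = model(x.to("mps"))            # model runs on the MPS device
    preds = torch.sigmoid(logits)
    iou = IoU(preds.detach().cpu(), mask)  # move predictions back before updating the metric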

Environment

  • TorchMetrics version (pip): 0.9.3
  • PyTorch version: 1.12.1
  • OS: macOS with M1
@riccardomusmeci riccardomusmeci added bug / fix Something isn't working help wanted Extra attention is needed labels Aug 26, 2022
@github-actions

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

@justusschock should we do anything to support this explicitly, or do we need to wait for support from the PyTorch side?

@stancld
Contributor

stancld commented Sep 6, 2022

Hi @SkafteNicki, I think that as a temporary fix we may adjust the following part of the code:

def _bincount(x: Tensor, minlength: Optional[int] = None) -> Tensor:
    """``torch.bincount`` currently does not support deterministic mode on GPU.

    This implementation falls back to a for-loop counting occurrences in that case.

    Args:
        x: tensor to count
        minlength: minimum length to count

    Returns:
        Number of occurrences for each unique element in x
    """
-    if x.is_cuda and deterministic():
+    if (x.is_cuda and deterministic()) or x.is_mps:
        if minlength is None:
            minlength = len(torch.unique(x))
        output = torch.zeros(minlength, device=x.device, dtype=torch.long)
        for i in range(minlength):
            output[i] = (x == i).sum()
        return output
    else:
        return torch.bincount(x, minlength=minlength)
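For illustration only (not from the thread), a quick check that the for-loop branch matches torch.bincount for non-negative integer labels; _bincount_fallback is a hypothetical standalone copy of the loop above:

import torch

def _bincount_fallback(x, minlength=None):
    # Hypothetical standalone copy of the for-loop branch above:
    # count occurrences class by class instead of calling torch.bincount.
    if minlength is None:
        minlength = len(torch.unique(x))
    output = torch.zeros(minlength, device=x.device, dtype=torch.long)
    for i in range(minlength):
        output[i] = (x == i).sum()
    return output

labels = torch.randint(0, 4, (1000,))
assert torch.equal(_bincount_fallback(labels, minlength=4), torch.bincount(labels, minlength=4))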

@stancld
Contributor

stancld commented Sep 6, 2022

Also, there are some other operations that are not yet supported on MPS. I can send a short PR fixing those issues and open a follow-up issue to check whether those workarounds are still needed once torch is updated.

@SkafteNicki
Member

@stancld feel free to do so :)

@stancld
Contributor

stancld commented Sep 6, 2022

@Borda Do you think it'd be feasible and wise to run our test suites also on an M1 runner?
https://github.blog/changelog/2022-08-09-github-actions-self-hosted-runners-now-support-apple-m1-hardware/

stancld added a commit that referenced this issue Sep 6, 2022
Fixes #1196

* Change the `_bincount` calculation on `MPS` to run the for-loop fallback
* Use a `torch.where` implementation to apply `absent_score` instead of relying on item assignment (see the sketch below)

There is still a warning on the PyTorch side (CPU fallback for some operations); however, no action is required on the users' side anymore and the results are obtained smoothly.
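For illustration only, a rough sketch of the torch.where idea from the commit message (the tensor names are made up and this is not the actual code in the linked PR): classes with an empty union receive absent_score without any in-place item assignment.

import torch

# Illustrative per-class IoU where absent classes receive `absent_score`
# via torch.where instead of indexed item assignment (friendlier to MPS).
intersection = torch.tensor([5.0, 0.0, 3.0])
union = torch.tensor([10.0, 0.0, 6.0])
absent_score = 0.0

scores = intersection / union.clamp(min=1e-12)   # raw IoU per class
scores = torch.where(union == 0, torch.full_like(scores, absent_score), scores)
print(scores)  # tensor([0.5000, 0.0000, 0.5000])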
@stancld stancld self-assigned this Sep 6, 2022
@justusschock
Member

@stancld thanks for taking a stab!
Unfortunately, we don't have M1 hardware available for a self-hosted runner at the moment.

@SkafteNicki other than the fixes pointed out by @stancld, there is nothing we can do besides waiting.

@Borda
Member

Borda commented Sep 7, 2022

Correct, we do not have any m1 HW in our CI fleet yet... let us check what we can do here :)
