🐛 Bug
dist_sync_fn is a documented kwarg of class Metric and suggests passing an alternative to torch.distributed.all_gather. I suppose the intent is to allow the use of distributed contexts other than torch.distributed, for which the default works fine. This use case fails due to the following issues (see the sketch below):
- the passed dist_sync_fn is never called, because the hardcoded test jit_distributed_available() returns False in any other sync context
- this test is assigned as the default for the kwarg distributed_available of sync() and sync_context(), but neither can be called by the user because all metrics already wrap their compute() method at init
- sync_context can be applied only once to a function, otherwise the inner wrapper raises TorchMetricsUserError "The Metric has already been synced."
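For illustration, here is a minimal sketch (not the colab code, with a hypothetical gather function named my_all_gather) of the usage the documented kwarg suggests; it is assumed, following the default gather_all_tensors, that the function should return a list with one tensor per process:

```python
import torch
from torchmetrics import Accuracy

# Hypothetical stand-in for a backend-specific gather. The default
# gather_all_tensors returns a list with one tensor per process, so a
# drop-in replacement is assumed to do the same.
def my_all_gather(tensor: torch.Tensor, group=None):
    return [tensor]

# dist_sync_fn is documented on Metric, so this looks like the supported
# way to plug in a custom gather ...
metric = Accuracy(dist_sync_fn=my_all_gather)
metric.update(torch.tensor([0, 1, 1, 0]), torch.tensor([0, 1, 0, 0]))

# ... but outside a torch.distributed context the wrapped compute() relies
# on jit_distributed_available(), which returns False, so my_all_gather is
# never invoked and no cross-process sync happens.
print(metric.compute())
```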
To Reproduce
Code for testing Accuracy and MetricCollection in a TPU context (colab): https://colab.research.google.com/drive/1MlxWSrkKKuZ3WSb9duf1c0MoO2A8jDAE?usp=sharing
Code sample
Expected behavior
Automatically sync and compute correct metrics.
Environment
TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): 0.10dev
Python & PyTorch Version (e.g., 1.0):
Any other relevant information such as OS (e.g., Linux): e.g. torch_xla.distributed
Additional context
I found that with minimal changes in metrics.py, dist_sync_fn can actually be used to run torchmetrics on TPUs (torch_xla), like in the code sample above. I could send a PR (see fork in colab notebook).
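For context, a rough sketch of what an XLA-backed dist_sync_fn could look like, assuming (as above) that it should return one tensor per replica; xla_all_gather is a name made up here and this is not necessarily what the fork does:

```python
import torch
import torch_xla.core.xla_model as xm
from torchmetrics import Accuracy

def xla_all_gather(tensor: torch.Tensor, group=None):
    # xm.all_gather concatenates along a dimension; add a leading replica
    # dimension and split the result back into one tensor per replica,
    # mirroring what the torch.distributed-based default returns.
    gathered = xm.all_gather(tensor.unsqueeze(0), dim=0)
    return list(gathered)

# Only effective together with a change like the one in metrics.py that
# makes the distributed_available check recognise the XLA context.
metric = Accuracy(dist_sync_fn=xla_all_gather).to(xm.xla_device())
```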