Thanks for the great library, especially the metrics. I have a few questions to better understand the implementation:
During the update stage, why are the values converted to Python floats instead of being kept as torch tensors (e.g. here)? This conversion incurs a device->host transfer, so the update is blocking, right? Wouldn't it be better to keep the metric values as torch tensors on the GPU so the update stays asynchronous, and only convert them to Python floats in the compute method?
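To illustrate what I mean, here is a minimal sketch (not the library's actual metric classes, just hypothetical accumulators for comparison): calling `.item()` in `update` forces a device->host copy and a CUDA sync on every batch, whereas accumulating into a tensor on the same device defers the sync to a single `.item()` call in `compute`.

```python
import torch

class SumMetricFloat:
    """Accumulate as a Python float: .item() in update() forces a
    device->host copy, synchronizing the CUDA stream on every call."""
    def __init__(self):
        self._sum = 0.0

    def update(self, batch_values: torch.Tensor):
        self._sum += batch_values.sum().item()  # blocking sync here

    def compute(self) -> float:
        return self._sum


class SumMetricTensor:
    """Accumulate as a tensor on the metric's device: the add is queued
    asynchronously; the only sync is the single .item() in compute()."""
    def __init__(self, device="cuda"):
        self._sum = torch.zeros((), device=device)

    def update(self, batch_values: torch.Tensor):
        self._sum += batch_values.sum()  # stays on device, non-blocking

    def compute(self) -> float:
        return self._sum.item()  # one device->host transfer per compute
```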
In the distributed case, the values are put back in a tensor before the all-reduce, so why not keep them as tensors to begin with?
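Again just as a sketch (assuming an initialized process group, not the library's internal reduction code), the float accumulator has to be wrapped into a tensor only to be unwrapped right after the all-reduce, while a tensor accumulator can be reduced directly:

```python
import torch
import torch.distributed as dist

def reduce_float(partial_sum: float, device: torch.device) -> float:
    # Float accumulator: wrap into a tensor just for the collective,
    # then unwrap again with another device->host transfer.
    t = torch.tensor(partial_sum, device=device)
    dist.all_reduce(t)
    return t.item()

def reduce_tensor(partial_sum: torch.Tensor) -> torch.Tensor:
    # Tensor accumulator: already in the form all_reduce expects.
    dist.all_reduce(partial_sum)
    return partial_sum
```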
vfdev-5