Metrics Implementation Question #1082

@n2cholas

Thanks for the great library, especially the metrics. I have a few questions to better understand the implementation:

During the update stage, why are the values converted to Python floats instead of being kept as torch values (e.g. here)? This conversion incurs a device-to-host transfer, so it blocks, right? Wouldn't it be better to keep the metric values as torch tensors on the GPU so that the update is asynchronous, and only convert them to Python floats in the compute method?

In the distributed case, the values are put back into a tensor before the all-reduce anyway, so why not keep them as tensors to begin with?
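To make the suggestion concrete, here is a minimal sketch of the pattern described above: the accumulator state stays as tensors in `update` (so on a GPU the additions are enqueued asynchronously), and the blocking `.item()` call is deferred to `compute`. The class name `TensorAverage` and its structure are hypothetical, for illustration only, and are not the library's actual API.

```python
import torch


class TensorAverage:
    """Hypothetical running-mean metric that keeps its state as tensors.

    Because update() uses only tensor ops, on a CUDA device the additions
    are enqueued asynchronously; the device-to-host sync (.item()) happens
    once, in compute().
    """

    def __init__(self, device: str = "cpu") -> None:
        self._sum = torch.zeros((), device=device)
        self._num = torch.zeros((), device=device)

    def update(self, values: torch.Tensor) -> None:
        # Pure tensor arithmetic: no .item()/.float() here, so no
        # device-to-host transfer and no forced synchronization.
        self._sum += values.sum()
        self._num += values.numel()

    def compute(self) -> float:
        # In a distributed run, the tensor state could be all-reduced
        # directly here, e.g. torch.distributed.all_reduce(self._sum),
        # with no float -> tensor round trip.
        return (self._sum / self._num).item()


avg = TensorAverage()
avg.update(torch.tensor([1.0, 2.0, 3.0]))
avg.update(torch.tensor([4.0, 5.0]))
print(avg.compute())  # 3.0
```

The key trade-off this sketch illustrates: converting to float in every `update` forces a synchronization per batch, while keeping tensor state pays that cost only once per `compute` call.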
