To follow up on this bug: gpu_usage_map (in utils/nn.py) relies on nvidia-smi to get this information, and nvidia-smi may not be available. That is the case on Oak Ridge National Laboratory's Frontier supercomputer, which uses AMD GPUs, not Nvidia.
Though the equivalent of that call on AMD is rocm-smi, it may be best to rely on a third-party package that is OEM-agnostic and returns GPU usage regardless of vendor. I recommend something like Ricks-Lab GPU Utilities (https://pypi.org/project/rickslab-gpu-utils/), which can work with both AMD and Nvidia GPUs.
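As a rough illustration of the vendor check involved, here is a minimal sketch that only probes PATH for either CLI. It does not use the Ricks-Lab package, and `find_smi_tool` and `SMI_TOOLS` are hypothetical names that are not part of this repo.

```python
import shutil
from typing import Callable, Optional

# Candidate vendor CLIs, in the order they are probed (assumed order).
SMI_TOOLS = ("nvidia-smi", "rocm-smi")


def find_smi_tool(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the name of the first vendor CLI found on PATH, or None.

    `which` is injectable so the lookup can be exercised without
    either tool being installed.
    """
    for tool in SMI_TOOLS:
        if which(tool) is not None:
            return tool
    return None
```

A caller could skip GPU usage reporting when this returns None instead of shelling out to a tool that is not installed.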
Line 357 in trainer.py
gpu_usage = gpu_usage_map(torch.cuda.current_device())
may raise a FileNotFoundError when nvidia-smi is not installed.
A workaround is to wrap the call in a try/except block to bypass the error.
Windows may also flag the command as unsafe.
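A minimal sketch of the try/except workaround, assuming the query function takes a device index and that returning None is acceptable when the tool is missing. `safe_gpu_usage` is a hypothetical helper, not existing code in this repo.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def safe_gpu_usage(usage_fn: Callable[[int], T], device: int) -> Optional[T]:
    """Call `usage_fn(device)`, returning None if the vendor CLI is missing.

    Only FileNotFoundError is caught, so other failures still surface.
    """
    try:
        return usage_fn(device)
    except FileNotFoundError:
        # Raised when the underlying executable (e.g. nvidia-smi) is absent.
        return None
```

In trainer.py the call would then look something like `gpu_usage = safe_gpu_usage(gpu_usage_map, torch.cuda.current_device())`, with any later code that reads `gpu_usage` prepared to handle None.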