AMD GPU Support Plan #4127
To show GPU metrics in Grafana, we need to add these metrics for AMD:
For Alerts, we need to add these metrics:
Will treat …
For the Grafana panel: rocm-smi provides all the metrics we need.
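Grafana panels read from Prometheus, so per-GPU readings collected from rocm-smi would have to be exposed in the Prometheus text exposition format. Below is a minimal sketch of that step. It assumes the readings have already been parsed into a dict keyed by GPU index; the function name `to_prometheus`, the `amd_gpu_` metric prefix and the `minor_number` label are hypothetical choices for illustration, not the job exporter's actual names.

```python
from typing import Dict


def to_prometheus(metrics: Dict[int, Dict[str, float]]) -> str:
    """Render per-GPU readings as Prometheus text exposition lines.

    `metrics` maps a GPU index to {metric_suffix: value}. The metric names
    produced here (amd_gpu_<suffix>) and the minor_number label are
    hypothetical, chosen only for this sketch.
    """
    lines = []
    # Collect every metric suffix seen on any GPU, in a stable order.
    for suffix in sorted({key for reading in metrics.values() for key in reading}):
        name = f"amd_gpu_{suffix}"
        lines.append(f"# TYPE {name} gauge")
        for gpu in sorted(metrics):
            if suffix in metrics[gpu]:
                # One sample per GPU, labelled with the GPU index.
                lines.append(f'{name}{{minor_number="{gpu}"}} {metrics[gpu][suffix]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_prometheus({0: {"utilization": 37.0, "temperature_celsius": 55.0},
                         1: {"utilization": 0.0}}), end="")
```

Gauges fit here because utilization and temperature go up and down between scrapes; each GPU becomes one labelled series, which is what a per-GPU Grafana panel groups by.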
The following alerts are not supported due to issue ROCm/ROC-smi#60. To collect the following metrics, we need to know which process is running on each GPU, but rocm-smi only provides the IDs of the processes using GPUs, not which process uses which GPU. For the following metrics, we can only get the GPU is used by
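To make the limitation concrete: per-job GPU alerts need a chain from process ID to GPU and from process ID to job. The sketch below assumes hypothetical inputs (per-GPU utilization, a PID-to-job map and an optional PID-to-GPU map) and a hypothetical helper `job_gpu_usage`; it is not the job exporter's code. When only a bare list of GPU-using PIDs is available, as with rocm-smi here, the PID-to-GPU map is missing and no per-job figure can be computed.

```python
from typing import Dict, List, Optional, Set


def job_gpu_usage(
    gpu_util: Dict[int, float],
    pid_to_job: Dict[int, str],
    pid_to_gpus: Optional[Dict[int, List[int]]],
) -> Optional[Dict[str, float]]:
    """Average GPU utilization per job (hypothetical helper).

    Returns None when no PID-to-GPU mapping is available, because a list of
    PIDs that use *some* GPU cannot tell which GPUs each job occupies.
    """
    if pid_to_gpus is None:
        return None
    job_gpus: Dict[str, Set[int]] = {}
    for pid, gpus in pid_to_gpus.items():
        job = pid_to_job.get(pid)
        if job is None:
            continue  # process does not belong to any known job
        # A set counts each GPU once per job, even if several processes share it.
        job_gpus.setdefault(job, set()).update(gpus)
    return {
        job: sum(gpu_util[g] for g in gpus) / len(gpus)
        for job, gpus in job_gpus.items()
    }


if __name__ == "__main__":
    util = {0: 90.0, 1: 10.0, 2: 0.0}
    jobs = {100: "jobA", 200: "jobB"}
    print(job_gpu_usage(util, jobs, {100: [0, 1], 200: [2], 300: [2]}))
    print(job_gpu_usage(util, jobs, None))  # no mapping: nothing per job
```

In this example, jobB averages 0% utilization on GPU 2, which is the kind of signal an idle-GPU alert would fire on. Without `pid_to_gpus`, the same data only shows that PIDs 100 and 200 use some GPU.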
The following alert is not supported due to lack of
The feature requirement for AMD
Currently, we don't support AMD metrics when using the default scheduler. Maybe we can use …
Support AMD GPUs in PAI:
Currently, we support AMD GPUs in hivedscheduler only.
- [ ] Specify AMD GPU in the protocol when using the default scheduler
- [ ] Support AMD metrics (job exporter, VC in REST server) when using the default scheduler