Description
When an out-of-band interface sets the GPU power limit, Variorum does not report the updated power limit and continues to report the power limit previously set through the in-band interface (either through NVML or through nvidia-smi). This issue was uncovered while testing the node power limit interface on an IBM P9 system.
How to reproduce the issue
Apply a lower node power limit (500 W) and check the output of nvidia-smi and the example Variorum code that prints the GPU power limit.
$ ### Set node power to 500W
$ /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current
$ ### Check output of nvidia-smi
$ nvidia-smi
Mon Jul 10 16:21:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000004:04:00.0 Off | 0 |
| N/A 28C P0 35W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000004:05:00.0 Off | 0 |
| N/A 28C P0 49W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000035:03:00.0 Off | 0 |
| N/A 25C P0 35W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000035:04:00.0 Off | 0 |
| N/A 28C P0 36W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ ### Power caps for all GPUs were set to 100W
$ ~/myworkspace/varioum-intel-gpu/build-lassen/examples/variorum-print-power-limit-example
_NVIDIA_GPU_POWER_LIMIT Host Socket DeviceID PowerLimit_W
_NVIDIA_GPU_POWER_LIMIT lassen31 0 0 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 0 1 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 1 2 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 1 3 300.000
As shown above, Variorum does not report the out-of-band power cap set by the node power limit interface: nvidia-smi shows 100 W caps on each GPU, while Variorum still reports the previous 300 W limit.
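For context, the example binary above is essentially a thin wrapper around Variorum's print call. A minimal sketch of such an example (assuming a GPU-enabled build and the standard variorum.h header; this is an illustration, not the exact example source) looks like:

#include <stdio.h>
#include <variorum.h>

int main(void)
{
    /* In a GPU-enabled build this prints the per-GPU power limits,
     * which is where the stale 300 W values above come from. */
    int ret = variorum_print_power_limit();
    if (ret != 0)
    {
        printf("Print power limit failed!\n");
    }
    return ret;
}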
@amarathe84 Interesting catch, let me know what you find when you debug.
One thing to note here is that /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current sets the node power cap, not the GPU power cap; the IBM firmware will auto-distribute GPU power caps based on the node power cap and the active job. If you are building for CPU-only (not a GPU build), then we are reading the IBM limits, which will indeed report 300 unless the firmware detects the node is consuming more than 500 W in this example.
Between the IBM power cap and the NVIDIA power cap, the expected behavior is that the IBM power cap takes precedence, as it is node-level.
When you explicitly set the GPU power cap with su nv_powercap -p 100, what happens?
A couple of things to check here: which Lassen build is this, CPU-only, GPU-only, or multi-arch? Can you check all three builds and see the output of (1) node power cap setting (no GPU power cap set directly) and (2) direct GPU power cap setting?
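For reference, a minimal sketch of what an explicit in-band GPU power cap looks like through raw NVML (this is only an illustration, not the nv_powercap tool; nvmlDeviceSetPowerManagementLimit() takes milliwatts and requires root):

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count = 0;
    if (nvmlInit() != NVML_SUCCESS)
    {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; i++)
    {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        /* Cap this GPU at 100 W; the limit is passed in milliwatts. */
        nvmlReturn_t ret = nvmlDeviceSetPowerManagementLimit(dev, 100000);
        if (ret != NVML_SUCCESS)
        {
            fprintf(stderr, "GPU %u: set limit failed: %s\n", i, nvmlErrorString(ret));
        }
    }
    nvmlShutdown();
    return 0;
}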
Sorry, I somehow didn't see this comment before submitting the PR for the fix. Here's an explanation of the issue based on my debugging:
It's correct that the node-level power limiting interface (OPAL) sets the GPU power limit out-of-band. We can see its effect on the GPU power limit in the nvidia-smi output above, but we don't see the new GPU power limit in the GPU-enabled Variorum output. This behavior isn't related to the type of Variorum build (CPU/GPU/CPU+GPU); it comes down to whether we're using the correct NVML API, since nvidia-smi (which uses NVML internally) is able to capture the out-of-band GPU power limit set by OPAL. It turns out NVML has another API for querying the enforced power limit, nvmlDeviceGetEnforcedPowerLimit(), which was added some time ago. This API returns the GPU power limit actually being enforced, whether it was set in-band (e.g., through nvidia-smi or NVML) or out-of-band (e.g., via OPAL).
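To illustrate the difference, here is a minimal standalone NVML sketch (not Variorum code) that queries both the management limit, which appears to be what Variorum was reporting, and the enforced limit; both calls return milliwatts:

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count = 0;
    if (nvmlInit() != NVML_SUCCESS)
    {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; i++)
    {
        nvmlDevice_t dev;
        unsigned int mgmt_mw = 0;
        unsigned int enforced_mw = 0;
        nvmlDeviceGetHandleByIndex(i, &dev);
        /* In-band management limit (misses out-of-band changes). */
        nvmlDeviceGetPowerManagementLimit(dev, &mgmt_mw);
        /* Limit actually enforced, including out-of-band caps such as OPAL. */
        nvmlDeviceGetEnforcedPowerLimit(dev, &enforced_mw);
        printf("GPU %u: management limit = %.1f W, enforced limit = %.1f W\n",
               i, mgmt_mw / 1000.0, enforced_mw / 1000.0);
    }
    nvmlShutdown();
    return 0;
}

On the node above, the enforced limit would show the 100 W cap applied via OPAL, while the management limit would still show 300 W.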
I'm checking with NVIDIA whether the NVML API that Variorum relies on to query the GPU power limit is deprecated. The NVML documentation doesn't indicate whether it will be deprecated or whether it has been superseded by nvmlDeviceGetEnforcedPowerLimit().