Description
When an out-of-band interface sets the GPU power limit, Variorum does not report the updated power limit and continues to report the power limit previously set through the in-band interface (either through NVML or through nvidia-smi). This issue was uncovered while testing the node power limit interface on an IBM P9 system.
How to reproduce the issue
Apply a lower node power limit (500 W) and check the output of nvidia-smi and the example Variorum code that prints the GPU power limit.
$ ### Set node power to 500W
$ /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current
$ ### Check output of nvidia-smi
$ nvidia-smi
Mon Jul 10 16:21:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000004:04:00.0 Off | 0 |
| N/A 28C P0 35W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000004:05:00.0 Off | 0 |
| N/A 28C P0 49W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000035:03:00.0 Off | 0 |
| N/A 25C P0 35W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000035:04:00.0 Off | 0 |
| N/A 28C P0 36W / 100W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ ### Power caps for all GPUs were set to 100W
$ ~/myworkspace/varioum-intel-gpu/build-lassen/examples/variorum-print-power-limit-example
_NVIDIA_GPU_POWER_LIMIT Host Socket DeviceID PowerLimit_W
_NVIDIA_GPU_POWER_LIMIT lassen31 0 0 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 0 1 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 1 2 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 1 3 300.000
As shown above, Variorum does not report the out-of-band power cap set by the node power limit interface: nvidia-smi shows 100 W caps on each GPU, while Variorum still reports the previous 300 W limit.
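For context, the example binary above is essentially a thin wrapper around Variorum's print call. A minimal sketch of such an example (assuming a GPU-enabled build and the standard variorum.h header; this is an illustration, not the exact example source) looks like:

#include <stdio.h>
#include <variorum.h>

int main(void)
{
    /* In a GPU-enabled build this prints the per-GPU power limits,
     * which is where the stale 300 W values above come from. */
    int ret = variorum_print_power_limit();
    if (ret != 0)
    {
        printf("Print power limit failed!\n");
    }
    return ret;
}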
@amarathe84 Interesting catch, let me know what you find when you debug.
One thing to note here is that /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current sets the node power cap, not the GPU power cap; the IBM firmware will auto-distribute GPU power caps based on the node power cap and the active job. If you are building for CPU-only (not a GPU build), then we are reading the IBM limits, which will indeed report 300 unless the firmware detects the node is consuming more than 500 W in this example.
Between the IBM power cap and the NVIDIA power cap, the expected behavior is that the IBM power cap takes precedence, as it is node-level.
When you explicitly set the GPU power cap with su nv_powercap -p 100, what happens?
A couple of things to check here: which Lassen build is this, CPU-only, GPU-only, or multi-arch? Can you check all three builds and see the output of (1) node power cap setting (no GPU power cap set directly) and (2) direct GPU power cap setting?
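For reference, a minimal sketch of what an explicit in-band GPU power cap looks like through raw NVML (this is only an illustration, not the nv_powercap tool; nvmlDeviceSetPowerManagementLimit() takes milliwatts and requires root):

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count = 0;
    if (nvmlInit() != NVML_SUCCESS)
    {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; i++)
    {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        /* Cap this GPU at 100 W; the limit is passed in milliwatts. */
        nvmlReturn_t ret = nvmlDeviceSetPowerManagementLimit(dev, 100000);
        if (ret != NVML_SUCCESS)
        {
            fprintf(stderr, "GPU %u: set limit failed: %s\n", i, nvmlErrorString(ret));
        }
    }
    nvmlShutdown();
    return 0;
}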
Sorry, I somehow didn't see this comment before submitting the PR for the fix. Here's an explanation of the issue based on my debugging:
It's correct that the node-level power limiting interface (OPAL) sets the GPU power limit out-of-band. We can see its effect on the GPU power limit in the nvidia-smi output above, but we don't see the new GPU power limit in the GPU-enabled Variorum output. This behavior isn't related to the type of Variorum build (CPU/GPU/CPU+GPU); it comes down to whether we're using the correct NVML API, since nvidia-smi (which uses NVML internally) is able to capture the out-of-band GPU power limit set by OPAL. It turns out NVML has another API for querying the enforced power limit, nvmlDeviceGetEnforcedPowerLimit(), which was added some time ago. This API returns the GPU power limit actually being enforced, whether it was set in-band (e.g., through nvidia-smi or NVML) or out-of-band (e.g., via OPAL).
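To illustrate the difference, here is a minimal standalone NVML sketch (not Variorum code) that queries both the management limit, which appears to be what Variorum was reporting, and the enforced limit; both calls return milliwatts:

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count = 0;
    if (nvmlInit() != NVML_SUCCESS)
    {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; i++)
    {
        nvmlDevice_t dev;
        unsigned int mgmt_mw = 0;
        unsigned int enforced_mw = 0;
        nvmlDeviceGetHandleByIndex(i, &dev);
        /* In-band management limit (misses out-of-band changes). */
        nvmlDeviceGetPowerManagementLimit(dev, &mgmt_mw);
        /* Limit actually enforced, including out-of-band caps such as OPAL. */
        nvmlDeviceGetEnforcedPowerLimit(dev, &enforced_mw);
        printf("GPU %u: management limit = %.1f W, enforced limit = %.1f W\n",
               i, mgmt_mw / 1000.0, enforced_mw / 1000.0);
    }
    nvmlShutdown();
    return 0;
}

On the node above, the enforced limit would show the 100 W cap applied via OPAL, while the management limit would still show 300 W.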
I'm checking with NVIDIA whether the NVML API that Variorum relies on to query the GPU power limit is deprecated. The NVML documentation doesn't indicate whether it will be deprecated or whether it has been superseded by nvmlDeviceGetEnforcedPowerLimit().