
GPU power limit not reported correctly when the limit is enforced out-of-band #445

Closed
amarathe84 opened this issue Jul 11, 2023 · 2 comments · Fixed by #446
amarathe84 commented Jul 11, 2023

Description

When an out-of-band interface sets the GPU power limit, Variorum does not report the updated power limit; it continues to report the power limit set previously through the in-band interface (either through NVML or through nvidia-smi). This issue was uncovered when testing the node power limit interface on an IBM Power9 (P9) system.

How to reproduce the issue

Apply a lower node power limit (500 W) and compare the output of nvidia-smi against the Variorum example that prints the GPU power limit.

$ ### Set node power to 500W
$ /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current

$ ### Check output of nvidia-smi
$ nvidia-smi
Mon Jul 10 16:21:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   28C    P0    35W / 100W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000004:05:00.0 Off |                    0 |
| N/A   28C    P0    49W / 100W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   25C    P0    35W / 100W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000035:04:00.0 Off |                    0 |
| N/A   28C    P0    36W / 100W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ ### Power caps for all GPUs were set to 100W
$ ~/myworkspace/varioum-intel-gpu/build-lassen/examples/variorum-print-power-limit-example
_NVIDIA_GPU_POWER_LIMIT Host Socket DeviceID PowerLimit_W
_NVIDIA_GPU_POWER_LIMIT lassen31 0 0 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 0 1 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 1 2 300.000
_NVIDIA_GPU_POWER_LIMIT lassen31 1 3 300.000 

As shown above, Variorum does not report the out-of-band power cap (100 W) set by the node power limit interface; it still reports the previously set in-band limit of 300 W.

amarathe84 self-assigned this Jul 11, 2023
tpatki commented Jul 11, 2023

@amarathe84 Interesting catch, let me know what you find when you debug.

One thing to note here is that /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current sets the node power cap, not the GPU power cap directly; the IBM firmware auto-distributes GPU power caps based on the node power cap and the active job. If you are building for CPU-only (not a GPU build), then we are reading the IBM limits, which will indeed be 300 W unless the firmware detects the node is consuming more than 500 W in this example.
Between the IBM power cap and the NVIDIA power cap, the expected behavior is that the IBM power cap takes precedence, as it is node-level.

When you explicitly set the GPU power cap with su nv_powercap -p 100, what happens?

A couple of things to check here: which Lassen build is this, CPU-only, GPU-only, or multi-arch? Can you check all three builds and report the output for (1) setting the node power cap (with no GPU power cap set directly) and (2) setting the GPU power cap directly?
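
For context, a minimal sketch of the two paths discussed above: the OPAL sysfs file is the one from the reproducer, and nvidia-smi --power-limit is the stock in-band path (the nv_powercap wrapper mentioned above is a site-specific tool, so the flag shown here is the generic equivalent, not necessarily what it does internally).

$ ### Out-of-band: node-level cap via OPAL; firmware redistributes to GPUs
$ /bin/echo 500 > /sys/firmware/opal/powercap/system-powercap/powercap-current
$ cat /sys/firmware/opal/powercap/system-powercap/powercap-current
500

$ ### In-band: per-GPU cap through NVML, e.g., GPU 0 (requires root)
$ sudo nvidia-smi -i 0 --power-limit=100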

amarathe84 commented
Sorry, I somehow didn't see this comment before submitting the PR for the fix. Here's an explanation of the issue based on my debugging:

It's correct that the node-level power limiting interface (OPAL) sets the GPU power limit out-of-band; we can see its effect on the GPU power limit in the nvidia-smi output above. But we don't see the new GPU power limit in the GPU-enabled Variorum output. This behavior isn't related to the type of Variorum build (CPU/GPU/CPU+GPU); it comes down to which NVML API we call, since nvidia-smi (which uses NVML internally) is able to capture the out-of-band GPU power limit set by OPAL. It turns out that NVML quietly added another API for querying the enforced power limit some time ago: nvmlDeviceGetEnforcedPowerLimit(). This API returns the GPU power limit actually being enforced, whether it was set in-band (e.g., through nvidia-smi or NVML) or out-of-band (e.g., via OPAL).
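
A minimal sketch of the difference between the two queries, assuming Variorum's existing query is nvmlDeviceGetPowerManagementLimit() (which only reflects in-band changes); both calls report milliwatts:

/* Sketch: contrast the in-band management limit with the enforced limit.
 * Build (illustrative): gcc compare_limits.c -lnvidia-ml */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count, mgmt_mw, enforced_mw;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; i++) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);

        /* In-band management limit; misses out-of-band (OPAL) changes. */
        nvmlDeviceGetPowerManagementLimit(dev, &mgmt_mw);

        /* Limit actually being enforced, in-band or out-of-band. */
        nvmlDeviceGetEnforcedPowerLimit(dev, &enforced_mw);

        printf("GPU %u: management %.3f W, enforced %.3f W\n",
               i, mgmt_mw / 1000.0, enforced_mw / 1000.0);
    }

    nvmlShutdown();
    return 0;
}

On the node above, the expectation would be a management limit still reading 300 W while the enforced limit reads 100 W.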

I'm checking with NVIDIA on whether the NVML API that Variorum currently relies on to query the GPU power limit is deprecated. The NVML documentation doesn't indicate whether it will be deprecated or whether it has been superseded by nvmlDeviceGetEnforcedPowerLimit().

tpatki added this to the Production: v0.8.0 Release milestone Jul 11, 2023