Add pynvml compatibility by monkey-patching #143

wookayin · 2022-11-27T05:25:36Z

The purpose of this PR is to relax the strict pynvml requirement (<= 11.495.46) introduced in #107.

pynvml 11.510.69 broke backward compatibility by removing `nvml.nvmlDeviceGetComputeRunningProcesses_v2` in favor of the v3 API (`nvml.nvmlDeviceGetComputeRunningProcesses_v3`). However, the v3 function does not exist in NVIDIA drivers older than 510.39.01.

For that reason we pinned pynvml at 11.495.46 in gpustat v1.0 (#107), but recent pynvml versions are actually required for the latest NVIDIA drivers. To make compute/graphics process information work correctly when old NVIDIA drivers (`< 510.39`) are combined with `pynvml >= 11.510.69`, we monkey-patch the pynvml functions so that, for instance, when pynvml exposes only the v3 API but the driver does not provide it, we fall back to the v2 API to retrieve the process information.
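
The fallback idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `FunctionNotFound` exception, the `make_fake_binding` helper, and the `get_processes` attribute are all hypothetical stand-ins, so that the example runs without pynvml or a GPU.

```python
import types


class FunctionNotFound(Exception):
    """Hypothetical stand-in for the error raised when the driver lacks a symbol."""


def make_fake_binding(driver_has_v3):
    """Build a toy 'binding' that only exposes a v3-style entry point (assumption)."""

    def get_processes_v3(handle):
        if not driver_has_v3:
            # Simulates an old driver that does not ship the v3 symbol.
            raise FunctionNotFound("v3 symbol missing in driver")
        return [("v3", handle)]

    return types.SimpleNamespace(get_processes=get_processes_v3)


def get_processes_v2(handle):
    """Hypothetical legacy implementation that old drivers still support."""
    return [("v2", handle)]


def patch_with_fallback(binding, legacy_impl):
    """Monkey-patch binding.get_processes to fall back to legacy_impl."""
    original = binding.get_processes

    def patched(handle):
        try:
            return original(handle)
        except FunctionNotFound:
            # The newer API is unavailable on this driver: use the older one.
            return legacy_impl(handle)

    binding.get_processes = patched
    return binding
```

With this shape, callers keep using a single entry point, and the choice between the newer and the older API is made at call time based on what the driver actually provides.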

This commit adds unit tests for #107, where legacy and supported NVIDIA drivers behave differently on process-related APIs (e.g., nvmlDeviceGetComputeRunningProcesses_v2). Note: as already pointed out in #107, this test (and gpustat's process information) fails with nvidia-ml-py > 11.495.46, which breaks backward compatibility.


`nvmlDeviceGetMemoryInfo_v2` was added in driver 510.39.01, which broke the v1 API with no backward compatibility. A matching version of pynvml (11.510.69+) is needed to call the v2 API and get correct memory usage information on NVIDIA drivers 510.39 or higher. Fixes #141.

wookayin · 2022-12-01T22:44:50Z

To fix #141, pynvml.nvmlDeviceGetMemoryInfo is also monkey-patched in efa355a.

| | NVIDIA >= 510.39 (nvmlMemory_v2) | NVIDIA < 510.39 (nvmlMemory_v1) |
|---|---|---|
| `nvidia-ml-py>=11.510` | ✅ call `version=nvmlMemory_v2` | ⚠️ call v2 API -> Function not found. ✅ Fallback to v1 API -> correct result |
| `nvidia-ml-py<11.510` | ❌ call v1 API, incorrect result (#141). ⚠️ UserWarning printed | ✅ call v1 API, correct result |
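
The four cases in the table can be written out as a small decision function. This is only a sketch of the logic as described here, not gpustat's actual implementation: `choose_memory_api`, `parse_version`, and the warning text are hypothetical names introduced for illustration, and the availability of the v2 API in the binding is passed in as a plain boolean.

```python
import warnings

# Driver version from which the v2 memory API exists, per the description above.
DRIVER_V2_MIN = (510, 39)


def parse_version(text):
    """Turn '510.39.01' into (510, 39) for comparison (major and minor only)."""
    return tuple(int(part) for part in text.split(".")[:2])


def choose_memory_api(binding_has_v2, driver_version):
    """Return which memory API ('v1' or 'v2') would be used for this combination."""
    driver_has_v2 = parse_version(driver_version) >= DRIVER_V2_MIN
    if binding_has_v2:
        # New binding: try v2 and fall back to v1 when the driver lacks it.
        return "v2" if driver_has_v2 else "v1"
    if driver_has_v2:
        # Old binding on a new driver: only v1 is callable and it may report
        # incorrect memory usage, so the user is warned.
        warnings.warn(
            "memory usage may be incorrect; consider upgrading the binding",
            UserWarning,
        )
    return "v1"
```

For example, `choose_memory_api(True, "470.82")` returns `"v1"` without a warning, while `choose_memory_api(False, "515.65")` returns `"v1"` and emits a `UserWarning`.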


wookayin self-assigned this Nov 27, 2022

wookayin added the pynvml label Nov 27, 2022

wookayin added this to the 1.1 milestone Nov 27, 2022

wookayin force-pushed the pynvml branch from 5e486c7 to 9aca004 Compare November 27, 2022 05:49

wookayin force-pushed the pynvml branch from 9aca004 to 56a9dcf Compare November 27, 2022 05:53

wookayin mentioned this pull request Nov 27, 2022

Incorrect memory usage for nvidia driver higher than R510 #141

Closed

Minor tweaks.

3cb7200

wookayin merged commit 647bd34 into master Dec 1, 2022

wookayin deleted the pynvml branch December 1, 2022 22:53

wookayin added a commit that referenced this pull request Dec 1, 2022

Fix the wrong warning condition for #143.

ae42bf5

wookayin added a commit that referenced this pull request Dec 1, 2022

Update Changelog for #141 and #143.

9771159

XuehaiPan reviewed Dec 2, 2022


wookayin mentioned this pull request Feb 8, 2023

Support anaconda's legacy pynvml package #149

Closed

wookayin mentioned this pull request Mar 2, 2023

Use NVIDIA's official pynvml binding #107

Merged

XuehaiPan mentioned this pull request Apr 16, 2023

Add GPU stats features giampaolo/psutil#526

Open

wookayin mentioned this pull request May 17, 2023

module 'pynvml' has no attribute '_nvmlGetFunctionPointer' #153

Closed

jjyao mentioned this pull request Nov 16, 2023

[Dashboard] Remove gpustats dependencies from Ray[default] ray-project/ray#41044

Merged

