use nvml_wrapper::Nvml;

fn main() {
    let nvml = Nvml::init().unwrap();
    let device = nvml.device_by_index(0).unwrap();

    let st = device.process_utilization_stats(None).unwrap();
}
cargo run with error:
thread 'main' panicked at src/main.rs:7:53:
called `Result::unwrap()` on an `Err` value: NotFound
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
My device:
Fri Mar 15 07:01:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8             24W /  350W |       1MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
It's quite strange: the first call to nvmlDeviceGetProcessUtilization, which retrieves the process count, returned 79 in my case, when it should have been 0.
tubzby changed the title from "process_utilization_stats failed with NOT_FOUND error, Ubuntu 22.04, cuda" to "process_utilization_stats failed with NOT_FOUND error, Ubuntu 22.04" on Mar 15, 2024
The problem persists between restarts of the NVML-using program.
If there is only a single compute process running, and I restart it, then the problem temporarily disappears.
Passing in a timestamp makes it far more likely to break. If I always pass in None, then it'll usually keep working for half a minute or so, polling every 2s.
nvtop doesn't have the issue. What do they do differently?
Feels like a driver bug. Does this happen on every GPU?
Another observation: Processes appear to only be returned if they are running. An idle process doesn't end up in the array, unless it was non-idle very recently. This accounts for what happens if I set the timestamp -- it reduces the horizon.
It also means that swallowing the error (and returning []) should be a valid workaround.
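A minimal sketch of that workaround, in case it helps. `QueryError` and `empty_on_not_found` are hypothetical names: the enum is only a stand-in so the snippet compiles without the crate or a GPU. With nvml-wrapper you would presumably match on the `NotFound` error variant shown in the panic above and apply the same idea to the result of `process_utilization_stats`.

```rust
/// Hypothetical stand-in for the library's error type. Only the `NotFound`
/// variant from the panic message is modelled; everything else is `Other`.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    NotFound,
    Other(String),
}

/// Treats `NotFound` as "no process samples right now" and returns an empty
/// list. Successful results and all other errors are passed through unchanged.
pub fn empty_on_not_found<T>(
    result: Result<Vec<T>, QueryError>,
) -> Result<Vec<T>, QueryError> {
    match result {
        Err(QueryError::NotFound) => Ok(Vec::new()),
        other => other,
    }
}
```

Only `NotFound` is swallowed here, so a genuinely different failure (a lost device, for example) would still surface to the caller instead of being reported as an idle GPU.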