GPU utilization plot reports repeated GPUs #4117

mrocklin · 2020-09-19T19:57:20Z

@quasiben and I were playing around with his 2-gpu system, and were surprised to see four GPUs in his dashboard plots. I suspect that each worker was separately reporting the metrics for both GPUs on the system.

The relevant code for this is here:

distributed/distributed/diagnostics/nvml.py

Lines 6 to 12 in ecaf140

    
    def _pynvml_handles():
        global handles
        if handles is None:
            pynvml.nvmlInit()
            count = pynvml.nvmlDeviceGetCount()
            handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(count)]
        return handles

Perhaps pynvml does not respect CUDA_VISIBLE_DEVICES? Should we filter this on our own?
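
For illustration, filtering on our own could look roughly like the sketch below. It is not the project's actual fix, and `visible_device_indices` and `visible_handles` are hypothetical names. It assumes CUDA_VISIBLE_DEVICES holds comma-separated integer indices (GPU UUIDs are not handled) and that CUDA and NVML enumerate devices in the same order, which may require CUDA_DEVICE_ORDER=PCI_BUS_ID.

```python
import os


def visible_device_indices(total_count, env=None):
    """Return the device indices listed in CUDA_VISIBLE_DEVICES.

    Assumption: entries are integer indices. UUID entries would raise
    ValueError in this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable unset: treat every device as visible.
        return list(range(total_count))
    indices = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            # Empty entry: stop here, as CUDA ignores everything after
            # an invalid entry.
            break
        index = int(item)
        if not 0 <= index < total_count:
            # Out-of-range index: stop for the same reason.
            break
        indices.append(index)
    return indices


def visible_handles():
    """Return NVML handles only for the devices this process can see."""
    import pynvml  # imported lazily so the parser above works without it

    pynvml.nvmlInit()
    count = pynvml.nvmlDeviceGetCount()
    return [
        pynvml.nvmlDeviceGetHandleByIndex(i)
        for i in visible_device_indices(count)
    ]
```

With this approach, a worker started with CUDA_VISIBLE_DEVICES=1 on a 2-GPU machine would build a handle list for device 1 only, instead of for both devices.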

cc @rjzamora @jacobtomlinson

trivialfis · 2020-09-27T13:06:06Z

Hi @mrocklin, I'm using LocalCUDACluster from dask-cuda and can't find the utilization information on the dashboard. Is there any documentation on how to enable it?

quasiben · 2020-09-27T13:23:23Z

@trivialfis you may need to upgrade the Dask JupyterLab extension (dask-labextension) if you are using JupyterLab (recommended).

You can also find it directly on the dashboard at :8787/individual-gpu-utilization

trivialfis · 2020-09-27T13:34:26Z

@quasiben Thanks for the reply. The direct way works perfectly for me.

@mrocklin

Perhaps pynvml does not respect CUDA_VISIBLE_DEVICES?

If memory serves, you are right: NVML does not respect CUDA_VISIBLE_DEVICES, so pynvml is not affected by this environment variable.
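
To make the consequence concrete, here is a toy model of the plot's input. It uses no real NVML calls, and `reported_gpus` and the worker names are hypothetical. It assumes two workers on a 2-GPU machine, each restricted to one device through CUDA_VISIBLE_DEVICES.

```python
def reported_gpus(visible_per_worker, total_gpus, respect_visibility):
    """Return the (worker, gpu_index) pairs that would reach the plot."""
    rows = []
    for worker, visible in visible_per_worker.items():
        if respect_visibility:
            # Each worker reports only the devices it can see.
            gpus = visible
        else:
            # NVML-style enumeration: every worker sees every device.
            gpus = list(range(total_gpus))
        rows.extend((worker, gpu) for gpu in gpus)
    return rows


# Hypothetical setup: one visible GPU per worker.
workers = {"worker-0": [0], "worker-1": [1]}

unfiltered = reported_gpus(workers, 2, respect_visibility=False)
filtered = reported_gpus(workers, 2, respect_visibility=True)

print(len(unfiltered), len(filtered))  # 4 2
```

The unfiltered case yields four entries for two physical GPUs, which matches the symptom described at the top of this issue.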

trivialfis · 2020-09-27T15:10:19Z

Related: #3808.

jacobtomlinson · 2020-10-19T10:47:33Z

I think this has been resolved by #3810

trivialfis mentioned this issue Sep 29, 2020

Limit GPU metrics to visible devices only #3810

Merged

jacobtomlinson closed this as completed Oct 19, 2020
