Add range_query tests to NVML test suite #4879

charlesbluca · 2021-06-03T18:27:38Z

Adds a test similar to test_get_worker_monitor_info() to test_nvml.py for use in Dask-CUDA's gpuCI tests; the idea here is that we can hopefully catch issues like rapidsai/dask-cuda#634 preemptively.

Closes #xxxx
Tests added / passed
Passes black distributed / flake8 distributed / isort distributed

pentschev · 2021-06-03T18:40:36Z

We probably need #4866 for this, but Distributed CI likely won't catch this because it doesn't have GPU support yet.

pentschev · 2021-06-03T22:37:41Z

As discussed with @charlesbluca earlier, it's probably best to wait until #4873 is merged and then add the following check to this test:

if nvml.device_get_count() < 1:
    pytest.skip("No GPUs available")

pentschev · 2021-06-04T14:34:44Z

@charlesbluca #4873 is in now, so you can test with that.

charlesbluca · 2021-06-04T14:41:22Z

Looks like that fixes things - thanks @pentschev! I'll merge upstream and add the check.

jrbourbeau

Thanks @charlesbluca! I noticed all six tests in this module require nvml.device_get_count() >= 1. To minimize code duplication, could we just skip this entire module if nvml.device_get_count() == 0?
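
One way to express the module-wide skip suggested here is a module-level `pytestmark`. The sketch below is illustrative only and is not necessarily what was merged: `device_get_count` is a local stand-in for `nvml.device_get_count()` (an assumption, so the snippet runs without Distributed installed), and the test name is hypothetical.

```python
import pytest


def device_get_count() -> int:
    """Stand-in for nvml.device_get_count() (assumption, not Distributed's code).

    Returns 0 when pynvml is missing or NVML cannot be initialized.
    """
    try:
        import pynvml

        pynvml.nvmlInit()
        return pynvml.nvmlDeviceGetCount()
    except Exception:
        # ImportError (no pynvml) or an NVML error (no driver / no GPU)
        return 0


# Evaluated once at import time; applies the skip to every test in the module.
pytestmark = pytest.mark.skipif(
    device_get_count() == 0, reason="No GPUs available"
)


def test_at_least_one_device():  # hypothetical test name
    # Only runs when the module-level condition above is False.
    assert device_get_count() >= 1
```

Because the condition is evaluated once when the module is imported, the per-test `if ... pytest.skip(...)` checks would no longer be needed, provided the count function can be called safely before any test has run.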

charlesbluca · 2021-06-08T02:13:00Z

I believe so - I discussed this briefly with @pentschev but wasn't 100% sure if we needed to do a check per-test to ensure that the tests have proper NVML initialization.

jrbourbeau · 2021-06-08T02:27:13Z

Fair point. FWIW my impression was that device_get_count handled initialization properly:

distributed/distributed/diagnostics/nvml.py

Lines 26 to 31 in 2bdec05

    
def device_get_count():
    init_once()
    if nvmlLibraryNotFound or not nvmlInitialized:
        return 0
    else:
        return pynvml.nvmlDeviceGetCount()

But I'll let @pentschev confirm whether or not this is the case
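
To make the initialization question concrete, here is a minimal sketch of the "initialize at most once, then report a count" pattern that the quoted function relies on. The body of `init_once()` is not shown in this thread, so its shape here is an assumption; `FakeNVML` and `NVMLState` are hypothetical names that stand in for `pynvml` and the module-level flags so the example runs without a GPU.

```python
class FakeNVML:
    """Hypothetical stand-in for pynvml; counts how often init is attempted."""

    def __init__(self, device_count, library_present=True):
        self.device_count = device_count
        self.library_present = library_present
        self.init_calls = 0

    def nvmlInit(self):
        self.init_calls += 1
        if not self.library_present:
            raise OSError("NVML shared library not found")

    def nvmlDeviceGetCount(self):
        return self.device_count


class NVMLState:
    """Holds the flags that the quoted code keeps as module globals."""

    def __init__(self, backend):
        self.backend = backend
        self.initialized = False
        self.library_not_found = False

    def init_once(self):
        # Assumed behavior: try to initialize once and remember the outcome.
        if self.initialized or self.library_not_found:
            return
        try:
            self.backend.nvmlInit()
            self.initialized = True
        except OSError:
            self.library_not_found = True

    def device_get_count(self):
        # Mirrors the quoted function: initialize first, then fall back to 0.
        self.init_once()
        if self.library_not_found or not self.initialized:
            return 0
        return self.backend.nvmlDeviceGetCount()


with_gpus = NVMLState(FakeNVML(device_count=2))
without_gpus = NVMLState(FakeNVML(device_count=0, library_present=False))
```

Under these assumptions, calling `device_get_count()` before anything else is safe: it triggers initialization itself and returns 0 rather than raising when the library is unavailable, which is the property a module-level skip would depend on.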

pentschev · 2021-06-08T11:57:14Z

I just didn't want to risk introducing another issue, but if it works with that change I'm fine with doing that.

jrbourbeau

Thanks @charlesbluca, this is in

Add range_query tests to NVML test suite

cd5355c

charlesbluca mentioned this pull request Jun 3, 2021

Add scheduler tests to gpuCI rapidsai/dask-cuda#635

Closed

Merge remote-tracking branch 'upstream/main' into add-nvml-range-test

e5f7c5a

charlesbluca and others added 2 commits June 4, 2021 10:41

Add device count check

f59a67a

Merge branch 'main' of https://github.com/dask/distributed into add-nvml-range-test

c6f1aef

jrbourbeau reviewed Jun 8, 2021

jrbourbeau approved these changes Jun 8, 2021

jrbourbeau merged commit 8d89016 into dask:main Jun 8, 2021

charlesbluca deleted the add-nvml-range-test branch July 20, 2022 03:00
