Add range_query tests to NVML test suite #4879
Conversation
We probably need #4866 for this, but Distributed CI likely won't catch this because it doesn't have GPU support yet.
As discussed with @charlesbluca earlier, it's probably best to wait until #4873 is merged and then add the following check to this test:

```python
if nvml.device_get_count() < 1:
    pytest.skip("No GPUs available")
```
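For context, a minimal sketch of how that per-test guard would sit inside a single test in `test_nvml.py` (the test name and the `distributed.diagnostics` import path below are assumptions, not taken from this PR):

```python
import pytest

pynvml = pytest.importorskip("pynvml")  # skip the module entirely if pynvml is missing

from distributed.diagnostics import nvml


def test_gpu_metrics_available():  # hypothetical test name
    # Guard each GPU-dependent test so machines without GPUs skip it
    # instead of failing on NVML calls.
    if nvml.device_get_count() < 1:
        pytest.skip("No GPUs available")

    # ...assertions against NVML-backed metrics would go here...
```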
@charlesbluca #4873 is in now, so you can test with that.
Looks like that fixes things - thanks @pentschev! I'll merge upstream and add the check.
Thanks @charlesbluca! I noticed all six tests in this module require `nvml.device_get_count() >= 1`. In order to minimize code duplication, could we just skip this entire module if `nvml.device_get_count() == 0`?
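A sketch of that module-level alternative: `pytest.skip(..., allow_module_level=True)` lets the check live once at the top of the file and skips every test in it (import path again assumed):

```python
import pytest

pynvml = pytest.importorskip("pynvml")

from distributed.diagnostics import nvml

# One guard for the whole module instead of repeating it in all six tests.
if nvml.device_get_count() < 1:
    pytest.skip("No GPUs available", allow_module_level=True)
```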
I believe so - I discussed this briefly with @pentschev but wasn't 100% sure whether we needed a per-test check to ensure that the tests have proper NVML initialization.
Fair point. FWIW, my impression was that the initialization logic in distributed/distributed/diagnostics/nvml.py (lines 26 to 31 at 2bdec05) already takes care of this, but I'll let @pentschev confirm whether or not this is the case.
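The embedded snippet is not reproduced here; the lines referenced implement lazy, one-time NVML initialization. The sketch below shows only the general pattern under discussion, with assumed names, not the actual contents of `nvml.py`:

```python
import pynvml

_initialized = False  # hypothetical module-level flag


def init_once():
    # Initialize NVML at most once per process; later calls are no-ops.
    global _initialized
    if _initialized:
        return
    _initialized = True
    pynvml.nvmlInit()


def device_get_count():
    # Safe to call from any test: initialization happens on first use,
    # which is why a per-test init check may be unnecessary.
    init_once()
    return pynvml.nvmlDeviceGetCount()
```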
I just didn't want to risk introducing another issue, but if it works with that change I'm fine with doing that.
Thanks @charlesbluca, this is in.
Adds a test similar to `test_get_worker_monitor_info()` to `test_nvml.py` for use in Dask-CUDA's gpuCI tests; the idea here is that we can hopefully catch issues like rapidsai/dask-cuda#634 preemptively.

- Passes `black distributed` / `flake8 distributed` / `isort distributed`
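A rough sketch of what such a test could look like, modeled on `test_get_worker_monitor_info()`; the metric keys (`gpu_utilization`, `gpu_memory_used`) and the shape of the monitor-info result are assumptions here, not confirmed by this thread:

```python
import pytest

pynvml = pytest.importorskip("pynvml")

from distributed.diagnostics import nvml
from distributed.utils_test import gen_cluster


@gen_cluster()
async def test_gpu_monitoring_range_query(s, a, b):
    # Same guard as discussed above: skip on machines without GPUs.
    if nvml.device_get_count() < 1:
        pytest.skip("No GPUs available")

    res = await s.get_worker_monitor_info()

    for w in (a, b):
        # Assumed metric names reported by the GPU-aware system monitor.
        assert "gpu_utilization" in res[w.address]["range_query"]
        assert "gpu_memory_used" in res[w.address]["range_query"]
```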