Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range_query tests to NVML test suite #4879

Merged
merged 4 commits into from
Jun 8, 2021

Conversation

charlesbluca
Copy link
Member

Adds a test similar to test_get_worker_monitor_info() to test_nvml.py for use in Dask-CUDA's gpuCI tests; the idea here is that we can hopefully catch issues like rapidsai/dask-cuda#634 preemptively.

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

@pentschev
Copy link
Member

We probably need #4866 for this, but Distributed CI likely won't catch this because it doesn't have GPU support yet.

@pentschev
Copy link
Member

As discussed with @charlesbluca earlier, it's probably best to wait until #4873 is merged and then add the following check to this test:

if nvml.device_get_count() < 1:
    pytest.skip("No GPUs available")

@pentschev
Copy link
Member

@charlesbluca #4873 is in now, you can now test with that.

@charlesbluca
Copy link
Member Author

Looks like that fixes things - thanks @pentschev! I'll merge upstream and add the check.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @charlesbluca! I noticed all six tests in this module require nvml.device_get_count() >= 1. In order to minimize code duplication could we just skip this entire module if nvml.device_get_count() == 0?

@charlesbluca
Copy link
Member Author

I believe so - I discussed this briefly with @pentschev but wasn't 100% sure if we needed to do a check per-test to ensure that the tests have proper NVML initialization.

@jrbourbeau
Copy link
Member

Fair point. FWIW my impression was that device_get_count handled initialization properly:

def device_get_count():
init_once()
if nvmlLibraryNotFound or not nvmlInitialized:
return 0
else:
return pynvml.nvmlDeviceGetCount()

But I'll let @pentschev confirm whether or not this is the case

@pentschev
Copy link
Member

I just didn't want to risk introducing another issue, but if it works with that change I'm fine with doing that.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @charlesbluca, this is in

@jrbourbeau jrbourbeau merged commit 8d89016 into dask:main Jun 8, 2021
@charlesbluca charlesbluca deleted the add-nvml-range-test branch July 20, 2022 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants