Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Ray auto detect nvidia Gpu with pynvml #41020

Merged
merged 7 commits into from
Nov 13, 2023

Conversation

jonathan-anyscale
Copy link
Contributor

@jonathan-anyscale jonathan-anyscale commented Nov 8, 2023

Why are these changes needed?

We want to vendor nvidia-ml-py to Ray in order to serve several purposes:

  • nvidia_gpu auto detection current implementation didn't work when Ray doesn't have root access (Ray cannot access GPUs under a non-root user (failed access of ray.init() to root-owned /proc/driver/nvidia/gpus) #28064). From several alternatives mentioned in the issue, all of them requires libnvi-ml.so which is nvidia driver C api. So, we use pynvml which is a python wrapper of this libnvi-ml.so without any other dependencies. Additionally, we can remove GPUUtil from auto detection as well.
  • Removing gpustat dependencies on Ray[default] by replacing it with pynvml which gpustat depends on as well (later PR)

Related issue number

#28064

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mattip
Copy link
Contributor

mattip commented Nov 8, 2023

This seems like the right direction, and will close #35581 and a number of issues around GPU detection.

Additionally, we can remove GPUUtil from auto detection as well.

Will this be in a follow-up PR?

python/requirements.txt Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
@mattip mattip mentioned this pull request Nov 8, 2023
8 tasks
@jonathan-anyscale
Copy link
Contributor Author

This seems like the right direction, and will close #35581 and a number of issues around GPU detection.

Additionally, we can remove GPUUtil from auto detection as well.

Will this be in a follow-up PR?

The auto detection will be in this PR as well as the change is not too big + part of using nvml library. Will update once I'm done with the changes.

@jonathan-anyscale jonathan-anyscale changed the title [Core][Dependencies] pynvml to Ray Minimal [Core][Dependencies] Ray auto detect nvidia Gpu with pynvml Nov 8, 2023
@jonathan-anyscale jonathan-anyscale changed the title [Core][Dependencies] Ray auto detect nvidia Gpu with pynvml [Core] Ray auto detect nvidia Gpu with pynvml Nov 8, 2023
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
@jonathan-anyscale jonathan-anyscale marked this pull request as ready for review November 9, 2023 01:30
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
@jonathan-anyscale
Copy link
Contributor Author

As per #41020 (comment)
We will move mig detection support as follow up PR

Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wookayin @XuehaiPan it'd be great if you can also review our usage of pynvml since you are experts here :)

python/ray/_private/accelerators/nvidia_gpu.py Outdated Show resolved Hide resolved
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
@jjyao jjyao merged commit 5443a12 into ray-project:master Nov 13, 2023
2 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants