Use cupy to measure memory leak #777

Merged

Conversation

@bdice bdice (Contributor) commented Sep 6, 2024

While adding support for Python 3.12 in #773, we found a problem where GPUtil does not support Python 3.12 (#775).

This PR removes the dependency on GPUtil and replaces it with cupy, which is already a dependency.

Closes #775. Closes #776.

@bdice bdice requested review from a team as code owners September 6, 2024 18:20
@bdice bdice added the labels bug (Something isn't working) and non-breaking (Introduces a non-breaking change) Sep 6, 2024
@bdice bdice self-assigned this Sep 6, 2024
@jameslamb (Member) left a comment

I'm much happier with this change than #776. Thank you!

@grlee77 (Contributor) left a comment

Agree that using cuda-python is preferable to GPUtil. We can also get this info from CuPy without adding the dependency, though (see suggestion). Let me know what you think.

      packages:
        - cuda-python>=12.0,<13.0a0
    - matrix: {cuda: "11.*"}
      packages: &test_cuda_python_cu11
Contributor

out of curiosity, what does this &test_cuda_python_cu11 notation do?

Contributor Author

This is a YAML anchor. We use this to define a list that we re-use in *test_cuda_python_cu11. We need a "fallback" list when the cuda selector is not defined.
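
As a quick illustration (the package names and matrix entries below are made up, not taken from this repo's dependency files), an anchor `&name` labels a node and an alias `*name` re-uses it elsewhere in the same document:

```yaml
- matrix: {cuda: "12.*"}
  packages: &example_packages  # anchor: names this list
    - cupy
    - numpy
- matrix: null                 # fallback entry when no cuda selector matches
  packages: *example_packages  # alias: re-uses the anchored list
```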

Comment on lines 26 to 35
    def get_used_gpu_memory_mib():
        """Get the used GPU memory in MiB."""
        status, free, total = cuda.cudart.cudaMemGetInfo()
        if status != cuda.cudart.cudaError_t.cudaSuccess:
            raise RuntimeError("Failed to get GPU memory info.")
        memory_used = (total - free) / (2**20)
        return memory_used

    status, num_gpus = cuda.cudart.cudaGetDeviceCount()
    if status != cuda.cudart.cudaError_t.cudaSuccess or num_gpus == 0:
Contributor

We can also potentially just use CuPy for this and avoid having to add the cuda-python dependency:

add up top:

import cupy as cp

then can use here:

    def get_used_gpu_memory_mib():
        """Get the used GPU memory in MiB."""
        dev = cp.cuda.Device()
        free, total = dev.mem_info
        memory_used = (total - free) / (2**20)
        return memory_used

    num_gpus = cp.cuda.runtime.getDeviceCount()
    if num_gpus == 0:
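
A small aside: both versions of the helper convert bytes to MiB by dividing by `2**20`. A self-contained check of that arithmetic, using hypothetical byte counts rather than real device queries:

```python
# Hypothetical device memory values, in bytes: 7 GiB free out of 8 GiB total.
free, total = 7 * 2**30, 8 * 2**30

# Same conversion as in the helper: bytes -> MiB (1 MiB = 2**20 bytes).
memory_used_mib = (total - free) / (2**20)
print(memory_used_mib)  # 1024.0 (1 GiB used is 1024 MiB)
```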

Contributor Author

Great. I'll implement that and remove the cuda-python dependency.

@bdice bdice changed the title Use cuda-python to measure memory leak Use cupy to measure memory leak Sep 6, 2024
@bdice bdice requested a review from grlee77 September 6, 2024 19:41
@jameslamb (Member) left a comment

re-approving, awesome that the net result here is -1 dependencies 😁

@grlee77 (Contributor) left a comment

Looks great, thanks!

@gigony (Contributor) commented Sep 6, 2024

Thanks @bdice for fixing the issue!

@jameslamb (Member) commented
/merge

@rapids-bot rapids-bot bot merged commit 73e6d93 into rapidsai:branch-24.10 Sep 6, 2024
45 checks passed
@jakirkham jakirkham added this to the v24.10.00 milestone Oct 1, 2024
Labels
- bug: Something isn't working
- non-breaking: Introduces a non-breaking change
Development

Successfully merging this pull request may close these issues.

[BUG] test_read_region_cuda_memleak failing (GPUtil does not support Python 3.12)
5 participants