[REVIEW] Out-of-memory callback resource adaptor #892
Conversation
Cython looks fantastic!
Wow, that was fast.
Can we also get a test written in C++?
Yep @rongou we do it here: https://github.com/rapidsai/cudf/blob/branch-21.12/java/src/main/native/src/RmmJni.cpp#L139. We hook in at the mr level and call a JNI function in our case. Here's where we handle an OOM: https://github.com/rapidsai/cudf/blob/branch-21.12/java/src/main/native/src/RmmJni.cpp#L212. Our resource also had handling for threshold-based OOM (i.e. not a real OOM from RMM, but a low/high watermark for some preemptive spilling). We are not using the low/high watermark at the moment; I am sure part of that could be refactored into its own memory resource.
@jrhemstad thanks for the review. I have added a C++ test and renamed the adaptor to `failure_callback_resource_adaptor`.
I think the naming should be more general and not use an acronym. Other than that and a few doc improvements, looks like a great contribution. Thanks!
Co-authored-by: Mark Harris <mharris@nvidia.com>
Thanks for the review @harrism, I think I have addressed all of your suggestions.
Nearly there.
@gpucibot merge
Use rapidsai/rmm#892 to implement spilling on demand. Requires [RMM](https://github.com/rapidsai/rmm) and JIT-unspill to be enabled. The `device_memory_limit` still works as usual: when known allocations reach `device_memory_limit`, Dask-CUDA starts spilling preemptively. However, with this PR it should be possible to increase `device_memory_limit` significantly, since memory spikes will be handled by spilling on demand. Closes #755 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #756
…or (#898) #892 added `failure_callback_resource_adaptor`, which provides the ability to respond to memory allocation failures. However, it was hard-coded to catch (and rethrow) `std::bad_alloc` exceptions. This PR makes the type of exception the adaptor catches a template parameter, to provide greater flexibility. The default exception type is now `rmm::out_of_memory`, since we expect this to be the common use case. It also includes a few changes to fix clang-tidy warnings. Authors: - Mark Harris (https://github.com/harrism) Approvers: - Rong Ou (https://github.com/rongou) - Mads R. B. Kristensen (https://github.com/madsbk) - Jake Hemstad (https://github.com/jrhemstad) URL: #898
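To illustrate the templating idea described above, here is a simplified, self-contained sketch (a mimic, not RMM's actual header; the class, member, and exception names are illustrative stand-ins): the exception type the adaptor catches becomes a template parameter, so only that type triggers the callback, while any other exception propagates to the caller untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <stdexcept>

// Illustrative stand-in for an out-of-memory exception type such as
// rmm::out_of_memory (hypothetical simplification).
struct out_of_memory : std::bad_alloc {};

// Sketch: ExceptionType is a template parameter with an OOM default.
// Only exceptions of that type invoke the failure callback; everything
// else propagates unchanged.
template <typename Upstream, typename ExceptionType = out_of_memory>
class failure_callback_adaptor {
 public:
  // Callback receives the failed request size and a user argument; a
  // `true` return asks the adaptor to retry the allocation.
  using callback_t = std::function<bool(std::size_t, void*)>;

  failure_callback_adaptor(Upstream* upstream, callback_t cb, void* arg)
    : upstream_{upstream}, cb_{std::move(cb)}, arg_{arg} {}

  void* allocate(std::size_t bytes) {
    while (true) {
      try {
        return upstream_->allocate(bytes);
      } catch (ExceptionType const&) {  // only this type triggers the callback
        if (!cb_(bytes, arg_)) { throw; }  // callback declined: rethrow
        // callback returned true: retry the allocation
      }
    }
  }

 private:
  Upstream* upstream_;
  callback_t cb_;
  void* arg_;
};
```

With this shape, a caller who wants to react to `std::bad_alloc` (the old hard-coded behavior) can still opt in by instantiating `failure_callback_adaptor<Upstream, std::bad_alloc>`.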
This PR implements a new resource adaptor that calls a callback function when an allocation fails. The idea is that the callback can free up memory (e.g. by spilling) and ask RMM to retry the allocation.
This is motivated by the fairly primitive spilling in Dask. Currently, Dask and Dask-CUDA have no way of handling OOM errors other than restarting tasks or workers. Instead, they spill preemptively based on some very conservative memory thresholds. For instance, most workflows in Dask-CUDA start spilling when half the GPU memory is in use.
This PR makes it possible for projects like Dask and Dask-CUDA to trigger spilling on demand instead of preemptively.
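To make the catch-callback-retry mechanism concrete, here is a minimal, self-contained sketch. This is a mimic, not RMM's actual code: the class, member, and upstream names are illustrative, and the callback shape `bool(std::size_t, void*)` (return `true` to retry after freeing memory, `false` to give up and rethrow) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// Callback invoked on allocation failure: receives the failed request
// size and a user-supplied argument; returns whether to retry.
using failure_callback_t = std::function<bool(std::size_t, void*)>;

// Simplified mimic of an OOM-callback resource adaptor: wraps an
// upstream allocator and, when allocation throws, gives the callback a
// chance to free memory (e.g. by spilling) before retrying.
template <typename Upstream>
class callback_adaptor {
 public:
  callback_adaptor(Upstream* upstream, failure_callback_t cb, void* arg)
    : upstream_{upstream}, cb_{std::move(cb)}, arg_{arg} {}

  void* allocate(std::size_t bytes) {
    while (true) {
      try {
        return upstream_->allocate(bytes);
      } catch (std::bad_alloc const&) {
        if (!cb_(bytes, arg_)) { throw; }  // callback declined: rethrow
        // callback returned true: it presumably freed memory, so retry
      }
    }
  }

 private:
  Upstream* upstream_;
  failure_callback_t cb_;
  void* arg_;
};
```

In the Dask-CUDA use case sketched above, the callback would trigger JIT-unspill to move device buffers to host memory and return `true`, so the retry can succeed without restarting the task or worker.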
cc @jrhemstad @shwina