Skip to content

Commit

Permalink
Stream synchronize before deallocating SAM (#1655)
Browse files Browse the repository at this point in the history
While investigating cuml benchmarks, I found an issue with the current `system_memory_resource` that causes segfault. Roughly it's in code like this:
```cuda
void foo(...) {
  rmm::device_uvector<T> tmp(bufferSize, stream);
  // launch cuda kernels making use of tmp
}
```
When the function returns, the `device_uvector` would go out of scope and get deleted, while the cuda kernel might still be in flight. With `cudaFree`, the CUDA runtime would perform implicit synchronization to make sure the kernel finishes before actually freeing the memory, but with SAM we don't have that guarantee, thus causing use-after-free errors.

This is a rather simple fix. In the future we may want to use CUDA events to make this less blocking.

Authors:
  - Rong Ou (https://github.com/rongou)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Lawrence Mitchell (https://github.com/wence-)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1655
  • Loading branch information
rongou authored Aug 26, 2024
1 parent 8adedd0 commit 0a8ae04
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions include/rmm/mr/device/system_memory_resource.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -107,17 +107,23 @@ class system_memory_resource final : public device_memory_resource {
/**
* @brief Deallocate memory pointed to by \p p.
*
* The stream argument is ignored.
* This function synchronizes the stream before deallocating the memory.
*
* @param ptr Pointer to be deallocated
* @param bytes The size in bytes of the allocation. This must be equal to the value of `bytes`
* that was passed to the `allocate` call that returned `ptr`.
* @param stream This argument is ignored
* @param stream The stream in which to order this deallocation
*/
void do_deallocate(void* ptr,
[[maybe_unused]] std::size_t bytes,
[[maybe_unused]] cuda_stream_view stream) override
cuda_stream_view stream) override
{
// With `cudaFree`, the CUDA runtime keeps track of dependent operations and does implicit
// synchronization. However, with SAM, since `free` is immediate, we need to wait for in-flight
// CUDA operations to finish before freeing the memory, to avoid potential use-after-free errors
// or race conditions.
stream.synchronize();

rmm::detail::aligned_host_deallocate(
ptr, bytes, CUDA_ALLOCATION_ALIGNMENT, [](void* ptr) { ::operator delete(ptr); });
}
Expand Down

0 comments on commit 0a8ae04

Please sign in to comment.