Commit
Stream synchronize before deallocating SAM (#1655)
While investigating cuml benchmarks, I found an issue with the current `system_memory_resource` that causes a segfault. Roughly, it's in code like this:

```cuda
void foo(...) {
  rmm::device_uvector<T> tmp(bufferSize, stream);
  // launch cuda kernels making use of tmp
}
```

When the function returns, the `device_uvector` goes out of scope and is destroyed while the CUDA kernel might still be in flight. With `cudaFree`, the CUDA runtime performs an implicit synchronization to make sure the kernel finishes before the memory is actually freed, but with SAM we don't have that guarantee, which causes use-after-free errors.

This is a rather simple fix. In the future we may want to use CUDA events to make this less blocking.

Authors:
- Rong Ou (https://github.com/rongou)

Approvers:
- Mark Harris (https://github.com/harrism)
- Lawrence Mitchell (https://github.com/wence-)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #1655
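The pattern named in the commit title, synchronizing the stream before deallocating, can be illustrated with a small host-only sketch. This is not RMM's implementation: `mock_stream`, `sync_before_free_resource`, and `run_example` are hypothetical names, the "stream" is a plain queue of deferred callbacks standing in for asynchronous kernels, and `std::malloc`/`std::free` stand in for system allocated memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-only stand-in for a CUDA stream. Launched work is only
// queued; it runs, in order, when synchronize() is called. This mimics a
// kernel that is still "in flight" after the launch call has returned.
class mock_stream {
 public:
  void launch(std::function<void()> work) { pending_.push_back(std::move(work)); }

  void synchronize() {
    for (auto& work : pending_) { work(); }
    pending_.clear();
  }

  std::size_t in_flight() const { return pending_.size(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical resource illustrating the idea in this commit: plain free()
// does not wait for pending work, so deallocate() synchronizes first.
struct sync_before_free_resource {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }

  void deallocate(void* ptr, mock_stream& stream) {
    stream.synchronize();            // wait for all work that may use ptr
    assert(stream.in_flight() == 0); // nothing can touch ptr any more
    std::free(ptr);
  }
};

// Mirrors foo() from the commit message: allocate a temporary, queue work
// that uses it, then release it while that work has not yet run.
int run_example() {
  mock_stream stream;
  sync_before_free_resource mr;
  int result = 0;

  int* tmp = static_cast<int*>(mr.allocate(sizeof(int)));
  stream.launch([tmp, &result] {
    *tmp   = 42;    // the "kernel" writes to the temporary buffer
    result = *tmp;  // and reads it back
  });

  // Without the synchronize inside deallocate(), tmp would be freed here
  // while the queued work still holds a pointer to it.
  mr.deallocate(tmp, stream);
  return result;
}
```

Calling `run_example()` returns the value written by the queued work, because `deallocate` drains the stream before freeing. As the commit message notes, blocking on the whole stream is the simple option; an event-based approach would wait only for the work that actually uses the buffer.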