Stream synchronize before deallocating SAM #1655
Conversation
This is similar to what we are trying to do in cuPy on the Python side: cupy/cupy#8442
// synchronization. However, with SAM, since `free` is immediate, we need to wait for in-flight
// CUDA operations to finish before freeing the memory, to avoid potential use-after-free errors
// or race conditions.
stream.synchronize();
This is now a synchronous deallocation, as in the `cuda::mr::memory_resource` concept (but not the `cuda::mr::async_memory_resource` concept). As we continue refactoring toward those concepts, there will be two functions: `deallocate_async`, which takes a stream, and `deallocate`, which does not.
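For reference, a rough sketch of what those two entry points could look like under the `cuda::mr` concepts (signatures approximate, not taken from the final API):

```cpp
#include <cuda/stream_ref>
#include <cstddef>

// Synchronous deallocation: the caller guarantees no in-flight work still uses `ptr`.
void deallocate(void* ptr, std::size_t bytes, std::size_t alignment);

// Stream-ordered deallocation: also takes the stream the work was submitted on.
void deallocate_async(void* ptr, std::size_t bytes, std::size_t alignment, cuda::stream_ref stream);
```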
Hmm, I think it's the other way around. The async version assumes everything is stream-ordered, but since we are dealing with `malloc`/`free`, which don't understand CUDA streams, we have to synchronize here. The synchronous version would leave the synchronization to the caller, so we don't need to sync there.
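To make that concrete, a minimal sketch of the idea (illustrative only, not the actual rmm implementation):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <cstddef>
#include <cstdlib>

// Stream-ordered path: free() is not stream-ordered, so drain the stream first.
void deallocate_async(void* ptr, std::size_t bytes, rmm::cuda_stream_view stream)
{
  stream.synchronize();  // wait for in-flight kernels that may still be using `ptr`
  std::free(ptr);        // SAM memory is released with a plain free
}

// Synchronous path: the caller has already ensured no in-flight work uses `ptr`.
void deallocate(void* ptr, std::size_t bytes) { std::free(ptr); }
```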
I get what you are saying. I'm talking about the fact that `deallocate_async` is named "async" but will have to synchronize. So that should be documented for users.
I updated the doc; is there anything else you want me to do here?
Waiting on second approval.
Reminder: we ask that every PR be associated with an issue. The PR template says this.
Done.
{
// With `cudaFree`, the CUDA runtime keeps track of dependent operations and does implicit
Please mention in the docstring of this function that it synchronizes `stream`.
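One possible form of that note (wording illustrative, not the final docstring):

```cpp
/**
 * @brief Deallocate memory pointed to by `ptr`.
 *
 * @note This call synchronizes `stream` before freeing, because system-allocated
 *       memory is released with `free()`, which is not stream-ordered.
 */
```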
Done.
I spoke to Vivek, and he advised that we not rely on cudaHostFree being synchronised either (though it is). So we should duplicate this PR for some other host memory MRs.
This makes sense to me, thanks.
The test failure seems to be unrelated; I can't reproduce it on my machine.
Can this be merged? Thanks!
Yeah, I reran the failed test last night. It must have been a flaky GPU?
/merge
Description
While investigating cuml benchmarks, I found an issue with the current `system_memory_resource` that causes a segfault. Roughly, it's in code like this (the snippet below is an illustrative sketch; `foo` and `some_kernel` are placeholder names, not the actual benchmark code):
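```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <cstddef>

// Placeholder kernel standing in for whatever the benchmark actually launches.
__global__ void some_kernel(int* data, std::size_t n)
{
  if (threadIdx.x == 0 && n > 0) { data[0] = 1; }
}

void foo(rmm::cuda_stream_view stream)
{
  // Allocated from the current device resource, here assumed to be system_memory_resource (SAM).
  rmm::device_uvector<int> vec(1000, stream);
  some_kernel<<<256, 256, 0, stream.value()>>>(vec.data(), vec.size());
  // No synchronization before returning: `vec` is destroyed when `foo` returns, and with SAM
  // its memory is freed immediately, even though `some_kernel` may still be running.
}
```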
When the function returns, the `device_uvector` goes out of scope and gets deleted while the CUDA kernel might still be in flight. With `cudaFree`, the CUDA runtime would perform implicit synchronization to make sure the kernel finishes before actually freeing the memory, but with SAM we don't have that guarantee, which causes use-after-free errors.

This is a rather simple fix. In the future we may want to use CUDA events to make this less blocking.
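A rough sketch of what that event-based approach could look like (purely illustrative; `deferred_free`, `pending`, and `reclaim_completed` are made-up names, and a real implementation would need thread safety and a policy for when to scan the pending list):

```cpp
#include <cuda_runtime.h>
#include <cstdlib>
#include <vector>

// Record an event at deallocation time and defer the free until that event has completed,
// instead of blocking on stream synchronization.
struct deferred_free {
  void* ptr;
  cudaEvent_t event;
};

std::vector<deferred_free> pending;

void deallocate_async(void* ptr, cudaStream_t stream)
{
  cudaEvent_t ev;
  cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);
  cudaEventRecord(ev, stream);
  pending.push_back({ptr, ev});  // defer the free rather than synchronizing here
}

void reclaim_completed()
{
  for (auto it = pending.begin(); it != pending.end();) {
    if (cudaEventQuery(it->event) == cudaSuccess) {  // all work that may use it->ptr has finished
      cudaEventDestroy(it->event);
      std::free(it->ptr);
      it = pending.erase(it);
    } else {
      ++it;
    }
  }
}
```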
Checklist
Closes #1656