
Make IPC handle export optional in cuda_async_memory_resource #1030

Merged
merged 27 commits into from
May 26, 2022

Conversation

harrism
Member

@harrism harrism commented May 4, 2022

POSIX handle export is not currently supported by cudaMemPoolCreate on WSL2. This change makes not exporting IPC handles (cudaMemHandleTypeNone) the default, and allows setting a different handle type.

In Python, cuda_async_memory_resource takes a new enable_ipc parameter that currently defaults to False, which is a breaking change. Defaulting to False was necessary because supported handle types can only be queried on CUDA 11.3 and above, not on 11.2. In addition, the True path is currently written to support only POSIX handles, so WSL2 is not supported, and IPC adds some overhead to cudaMallocAsync pools that we may want to avoid.
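As a sketch of the mapping the new parameter drives (names here are illustrative, not the actual RMM Cython code; the constants mirror CUDA's cudaMemAllocationHandleType values):

```python
# Hypothetical sketch: how the Python-level enable_ipc flag could map to the
# CUDA handle type passed to cudaMemPoolCreate. Constant values mirror
# CUDA's cudaMemAllocationHandleType enum.
CUDA_MEM_HANDLE_TYPE_NONE = 0x0      # cudaMemHandleTypeNone
CUDA_MEM_HANDLE_TYPE_POSIX_FD = 0x1  # cudaMemHandleTypePosixFileDescriptor

def choose_export_handle_type(enable_ipc: bool) -> int:
    """With enable_ipc=False (the new default), no IPC handle export is
    requested, which also works on WSL2. With enable_ipc=True, only the
    POSIX file-descriptor path is supported, so WSL2 is excluded."""
    if enable_ipc:
        return CUDA_MEM_HANDLE_TYPE_POSIX_FD
    return CUDA_MEM_HANDLE_TYPE_NONE
```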

Fixes #1029

@harrism harrism added breaking Breaking change bug Something isn't working labels May 4, 2022
@github-actions github-actions bot added cpp Pertains to C++ code Python Related to RMM Python API labels May 4, 2022
@harrism harrism marked this pull request as ready for review May 11, 2022 23:48
@harrism harrism requested review from a team as code owners May 11, 2022 23:48
@harrism harrism requested review from vyasr and bdice May 11, 2022 23:48
@harrism
Member Author

harrism commented May 12, 2022

I ran into a problem with the Pytests on CUDA 11.2 because of this C++ code:

static bool is_export_handle_type_supported(cudaMemAllocationHandleType handle_type)
  {
    int supported_handle_types_bitmask{};
#if CUDART_VERSION >= 11030  // 11.3 introduced cudaDevAttrMemoryPoolSupportedHandleTypes
    cudaDeviceGetAttribute(&supported_handle_types_bitmask,
                           cudaDevAttrMemoryPoolSupportedHandleTypes,
                           rmm::detail::current_device().value());
#endif
    return (supported_handle_types_bitmask & handle_type) == handle_type;
  }

On CUDA 11.2 there is no way to check the supported handle types. So we don't support anything other than cudaMemHandleTypeNone... But in the Python code we default enable_ipc to True which causes it to pass cudaMemHandleTypePosixFileDescriptor. While this handle type is supported on non-WSL2 Linux, this function returns false in this situation, which causes an exception to be thrown.

There are two options: change the default to enable_ipc = False, or make this function return true for all handle types on CUDA 11.2. I think the latter is undesirable. Will the former cause problems?
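The supported-handle-type test above is a bitmask check, and a pure-Python model of it (a sketch, no CUDA calls) makes the CUDA 11.2 failure mode concrete: the attribute query is compiled out, so the mask stays zero and every nonzero handle type is rejected.

```python
# Pure-Python model of the C++ bitmask check; no CUDA calls involved.
CUDA_MEM_HANDLE_TYPE_NONE = 0x0      # cudaMemHandleTypeNone
CUDA_MEM_HANDLE_TYPE_POSIX_FD = 0x1  # cudaMemHandleTypePosixFileDescriptor

def is_export_handle_type_supported(handle_type: int,
                                    supported_mask: int) -> bool:
    """A handle type counts as supported only if all of its bits appear in
    the device's supported-types bitmask. On CUDA 11.2 the attribute query
    is compiled out, so the mask is always 0 there and every nonzero
    handle type is reported unsupported (NONE, being 0, always passes)."""
    return (supported_mask & handle_type) == handle_type
```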

@harrism
Member Author

harrism commented May 17, 2022

OK @shwina to avoid having to do the hackery on the Cython side, I added an enum class allocation_handle_type as @robertmaynard suggested, inside of cuda_async_memory_resource. This way we only expose the RMM enum on the public interface and we can have the same public interface on all CUDA versions. But I couldn't figure out how to use this enum in Cython (didn't spend long because I figure you will know instantly how to do it). I checked in my attempt -- can you show me how to fix it? Thanks!

@harrism
Member Author

harrism commented May 23, 2022

@vyasr I think I've addressed all your comments. Thanks everyone for the reviews.

Before we merge, I would like someone to comment on whether changing the default to disable IPC export is going to cause problems (e.g. for UCX or Dask).

Thanks!

Contributor

@vyasr vyasr left a comment


LGTM! There's a minor typo fix that I'll just commit.

@vyasr
Contributor

vyasr commented May 23, 2022

@vyasr I think I've addressed all your comments. Thanks everyone for the reviews.

Before we merge, I would like someone to comment on whether changing the default to disable IPC export is going to cause problems (e.g. for UCX or Dask).

Thanks!

I'm guessing that this would be a good question for @quasiben or @jakirkham.

@jakirkham jakirkham requested a review from pentschev May 24, 2022 06:10
@jakirkham
Member

Think @pentschev would know better. So will defer to him

{
  int supported_handle_types_bitmask{};
#if CUDART_VERSION >= 11030  // 11.3 introduced cudaDevAttrMemoryPoolSupportedHandleTypes
  cudaDeviceGetAttribute(&supported_handle_types_bitmask,
Contributor

Need to error-check the return value.

Member Author

Done, using RMM_CUDA_TRY. At least this way we will throw a rmm::cuda_error exception.
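For illustration only (RMM_CUDA_TRY is a C++ macro; this is a hedged Python analogue, with hypothetical names), the pattern is simply check-the-status-and-throw:

```python
# Hypothetical Python analogue of the RMM_CUDA_TRY pattern: wrap a CUDA
# runtime status code and raise a dedicated error type on failure.
class CudaError(RuntimeError):
    """Stand-in for rmm::cuda_error."""

def cuda_try(status: int, msg: str = "CUDA call failed") -> None:
    """Raise CudaError if a CUDA runtime call did not return
    cudaSuccess (0); otherwise do nothing."""
    CUDA_SUCCESS = 0
    if status != CUDA_SUCCESS:
        raise CudaError(f"{msg} (error code {status})")
```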

Member

@jakirkham jakirkham left a comment

After discussing offline, sounds like we are good to go on the Python side

@harrism
Member Author

harrism commented May 25, 2022

I benchmarked with and without IPC enabled (None vs. PosixFileDescriptor) and see no appreciable difference. So I will try setting the default in Python back to True.

Comparing async_mr to async_mr_ipc (from gbenchmarks/RANDOM_ALLOCATIONS_BENCH)
Benchmark                                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/1000/1                         +0.0059         +0.0059             2             2             2             2
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/1000/4                         -0.0096         -0.0096             2             2             2             2
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/1000/64                        -0.0167         -0.0167             2             2             2             2
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/1000/256                       +0.0124         +0.0149             2             2             2             2
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/1000/1024                      -0.0045         -0.0115             2             2             2             2
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/1000/4096                      -0.0313         -0.0008             2             2             2             2
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/10000/1                        +0.0062         +0.0062            22            22            22            22
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/10000/4                        -0.0024         -0.0024            23            23            23            23
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/10000/64                       -0.0034         -0.0034            24            24            24            24
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/10000/256                      +0.0032         +0.0054            24            24            23            23
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/10000/1024                     -0.0179         -0.0079            23            23            20            20
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/10000/4096                     +0.0133         +0.0160            24            24            18            18
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/100000/1                       -0.0005         -0.0005           270           270           270           270
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/100000/4                       +0.0119         +0.0116           256           259           256           259
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/100000/64                      +0.0093         +0.0093           242           245           242           245
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/100000/256                     -0.0096         -0.0084           244           242           236           234
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/100000/1024                    +0.0082         -0.0163           236           237           212           208
BM_RandomAllocations/cuda_[async_mr vs. async_mr_ipc]/100000/4096                    +0.0143         +0.0075           274           277           193           194

@harrism
Member Author

harrism commented May 25, 2022

I changed the Python enable_ipc parameter default back to True. If this passes CI, then we can leave it that way so this is not a breaking change for Python.

@harrism
Member Author

harrism commented May 25, 2022

OK, I ran into the same problem with enable_ipc=True: on CUDA 11.2, this causes it to report false for whether the handle type is supported, causing the constructor to fail. The only fix that I can see is either to default to False, or to hack the Cython to use a different value of the handle type for CUDA 11.2 than all other versions. My Cython-foo is not strong enough to do that myself.

@jakirkham
Member

We could default to None in Python and then use False if it's an option and True if not

@harrism
Member Author

harrism commented May 25, 2022

We could default to None in Python and then use False if it's an option and True if not

I don't understand. True and False are both options. But how do we detect the CUDA runtime version at runtime in Python?

That would still be a breaking change. If we are going to break the API (which I am increasingly thinking is fine to do) then why not just default to False.

@pentschev
Member

I don't understand. True and False are both options. But how do we detect the CUDA runtime version at runtime in Python?

That would still be a breaking change. If we are going to break the API (which I am increasingly thinking is fine to do) then why not just default to False.

Defaulting to None would mean RMM internally figures out what to do based on the CUDA runtime version when the user hasn't specified that option. True and False would mean an option explicitly specified by the user, and RMM should respect that, regardless of any internal checks that are in place.

To find the driver/runtime versions you can use RMM! For example:

https://github.com/rapidsai/dask-cuda/blob/63529e891b52aee6c1bfeeecd0c8ff272628d4d0/dask_cuda/tests/test_dask_cuda_worker.py#L112-L115
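A minimal sketch of that tri-state resolution (names and the 11030 cutoff are assumptions based on this thread, not the actual RMM code; 11030 is the runtime version at which the support query was introduced):

```python
# Hedged sketch of the tri-state enable_ipc default discussed above.
CUDA_MEM_HANDLE_TYPE_NONE = 0x0      # cudaMemHandleTypeNone
CUDA_MEM_HANDLE_TYPE_POSIX_FD = 0x1  # cudaMemHandleTypePosixFileDescriptor

def resolve_handle_type(enable_ipc, runtime_version: int) -> int:
    """enable_ipc=None lets the library decide from the CUDA runtime
    version (the support query exists only on >= 11.3, i.e. 11030, so
    IPC is left off below that); True/False are explicit user choices
    and are respected as-is."""
    if enable_ipc is None:
        enable_ipc = runtime_version >= 11030
    return (CUDA_MEM_HANDLE_TYPE_POSIX_FD if enable_ipc
            else CUDA_MEM_HANDLE_TYPE_NONE)
```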

@shwina
Contributor

shwina commented May 25, 2022

That would still be a breaking change. If we are going to break the API (which I am increasingly thinking is fine to do) then why not just default to False.

Sync'd offline with Mark and we agreed this is best to do. We don't believe this is widely used enough that it will impact many users.

@harrism harrism added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for review Ready for review by team labels May 26, 2022
Contributor

@msadang msadang left a comment

LGTM

Successfully merging this pull request may close these issues.

[BUG] cuda_async_memory_resource.hpp:77: cudaErrorInvalidValue on WSL2 in Windows10