Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CUDA 11.2+ features via dlopen #990

Conversation

robertmaynard
Copy link
Contributor

By binding to the cudart 11.2 functions at runtime we remove the requirement that these symbols exist, therefore allowing
RMM to be compiled with 11.2+ and used with 11.0 or 11.1.

@robertmaynard robertmaynard added bug Something isn't working 3 - Ready for review Ready for review by team non-breaking Non-breaking change labels Mar 9, 2022
@robertmaynard robertmaynard requested review from a team as code owners March 9, 2022 21:26
@robertmaynard robertmaynard requested review from rongou and cwharris March 9, 2022 21:26
@github-actions github-actions bot added CMake cpp Pertains to C++ code labels Mar 9, 2022
By binding to the cudart 11.2 functions at runtime we remove
the requirement that these symbols exist, therefore allowing
RMM to be compiled with 11.2+ and used with 11.0 or 11.1.
@robertmaynard robertmaynard force-pushed the cuda_async_memory_resource-dlopen-cudart branch from 74372fa to 8944f92 Compare March 9, 2022 21:27
@robertmaynard robertmaynard requested a review from jrhemstad March 10, 2022 21:01
@leofang
Copy link
Member

leofang commented Mar 14, 2022

Just saying, cudaGetDriverEntryPoint is a runtime API that serves the similar purpose, available since CUDA 11.0. The only downside is it does not look up and return the runtime APIs; instead, it returns driver APIs. But the stream-ordered memory allocator is available at the driver level, so I suppose it's also a solution.

@jrhemstad
Copy link
Contributor

Just saying, cudaGetDriverEntryPoint is a runtime API that serves the similar purpose, available since CUDA 11.0. The only downside is it does not look up and return the runtime APIs; instead, it returns driver APIs. But the stream-ordered memory allocator is available at the driver level, so I suppose it's also a solution.

That was added in cudart 11.3, so it would have the same problem we already have with cudaMallocAsync :)

Plus, using the driver API adds an annoying element of needing to ensure a context is created.

@ajschmidt8
Copy link
Member

This is the patch that's necessary to revert both has_cma/no_cma rmm packages from being built. Not sure if this should be included in this PR or a separate one.

Patch
From e0d812392e682c455652a6347656eca516338edd Mon Sep 17 00:00:00 2001
From: AJ Schmidt <aschmidt@nvidia.com>
Date: Wed, 16 Mar 2022 09:53:50 -0400
Subject: [PATCH] Remove `no_cma`/`has_cma` variants

Depends on #990.

Once #990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This change removes the two variants.
---
 conda/recipes/rmm/build.sh                | 7 +------
 conda/recipes/rmm/conda_build_config.yaml | 3 ---
 conda/recipes/rmm/meta.yaml               | 8 ++------
 3 files changed, 3 insertions(+), 15 deletions(-)
 delete mode 100644 conda/recipes/rmm/conda_build_config.yaml

diff --git a/conda/recipes/rmm/build.sh b/conda/recipes/rmm/build.sh
index 08990c36..d2c672e6 100644
--- a/conda/recipes/rmm/build.sh
+++ b/conda/recipes/rmm/build.sh
@@ -1,9 +1,4 @@
 # Copyright (c) 2018-2019, NVIDIA CORPORATION.
 
 # Script assumes the script is executed from the root of the repo directory
-BUILD_FLAGS=""
-if [ "${cudaMallocAsync}" == "no_cma" ]; then
-    BUILD_FLAGS="${BUILD_FLAGS} --no-cudamallocasync"
-fi
-
-./build.sh -v clean rmm ${BUILD_FLAGS}
+./build.sh -v clean rmm
diff --git a/conda/recipes/rmm/conda_build_config.yaml b/conda/recipes/rmm/conda_build_config.yaml
deleted file mode 100644
index fcbbfef8..00000000
--- a/conda/recipes/rmm/conda_build_config.yaml
+++ /dev/null
@@ -1,3 +0,0 @@
-cudaMallocAsync:
-  - has_cma
-  - no_cma
diff --git a/conda/recipes/rmm/meta.yaml b/conda/recipes/rmm/meta.yaml
index 4baa0908..72582846 100644
--- a/conda/recipes/rmm/meta.yaml
+++ b/conda/recipes/rmm/meta.yaml
@@ -14,7 +14,7 @@ source:
 
 build:
   number: {{ GIT_DESCRIBE_NUMBER }}
-  string: cuda{{ cuda_major }}_py{{ py_version }}_{{ GIT_DESCRIBE_HASH }}_{{ GIT_DESCRIBE_NUMBER }}_{{ cudaMallocAsync }}
+  string: cuda{{ cuda_major }}_py{{ py_version }}_{{ GIT_DESCRIBE_HASH }}_{{ GIT_DESCRIBE_NUMBER }}
   script_env:
     - RMM_BUILD_NO_GPU_TEST
     - VERSION_SUFFIX
@@ -37,11 +37,7 @@ requirements:
     - python
     - numba >=0.49
     - numpy
-{% if cudaMallocAsync == "has_cma" %}
-    - {{ pin_compatible('cudatoolkit', max_pin='x', lower_bound='11.2') }} # cudatoolkit >=11.2,<12.0.0
-{% else %}
-    - {{ pin_compatible('cudatoolkit', upper_bound='11.2', lower_bound='11.0') }} # cudatoolkit >=11.0,<11.2
-{% endif %}
+    - {{ pin_compatible('cudatoolkit', max_pin='x', min_pin='x') }}
     - cuda-python >=11.5,<12.0
 
 test:
-- 
2.35.1

ajschmidt8 added a commit to ajschmidt8/rmm that referenced this pull request Mar 16, 2022
Depends on rapidsai#990.

Once rapidsai#990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This PR removes the two variants.
@ajschmidt8
Copy link
Member

This is the patch that's necessary to revert both has_cma/no_cma rmm packages from being built. Not sure if this should be included in this PR or a separate one.

Draft PR that includes these changes is here: #996

@robertmaynard
Copy link
Contributor Author

With #993 being merged I need to update this PR to handle the new logic.

@robertmaynard robertmaynard added 5 - DO NOT MERGE Hold off on merging; see PR for details and removed 3 - Ready for review Ready for review by team labels Mar 16, 2022
@robertmaynard
Copy link
Contributor Author

This is the patch that's necessary to revert both has_cma/no_cma rmm packages from being built. Not sure if this should be included in this PR or a separate one.

Draft PR that includes these changes is here: #996

Personally I am okay with the removal of the cma variants happening in a second PR. This won't break the cma variants it just makes the output identical, and therefore doesn't block this from being merged.

@robertmaynard robertmaynard added 3 - Ready for review Ready for review by team and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Mar 17, 2022
ajschmidt8 added a commit to ajschmidt8/rmm that referenced this pull request Mar 17, 2022
Depends on rapidsai#990.

Once rapidsai#990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This PR removes the two variants.
@jrhemstad
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7434bd6 into rapidsai:branch-22.04 Mar 18, 2022
@robertmaynard robertmaynard deleted the cuda_async_memory_resource-dlopen-cudart branch March 18, 2022 13:14
rapids-bot bot pushed a commit that referenced this pull request Mar 18, 2022
Depends on #990.

Once #990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This PR removes the two variants.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #996
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for review Ready for review by team bug Something isn't working CMake cpp Pertains to C++ code non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants