Move external allocators into rmm.allocators module to defer imports #1221
Conversation
Preparation for removing by-default import of non-essential packages.
This removes an indirection through rmm.rmm, to simplify removal of the top-level imports by default.
This will make importing RMM faster if you don't want the hooks for cupy, numba, or pytorch. Before, a sampling trace of `import rmm` with pyinstrument shows:

```
$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:19:56  Samples:  67
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.839     CPU time: 0.837
/   _/                      v4.4.0

Program: importrmm.py

0.839 <module>  importrmm.py:1
└─ 0.839 <module>  rmm/__init__.py:1
   ├─ 0.315 <module>  rmm/allocators/torch.py:1
   │  └─ 0.315 <module>  torch/__init__.py:1
   │        [96 frames hidden]  torch, <built-in>, enum, inspect, tok...
   ├─ 0.297 <module>  rmm/mr.py:1
   │  └─ 0.297 <module>  rmm/_lib/__init__.py:1
   │     ├─ 0.216 <module>  numba/__init__.py:1
   │     │     [140 frames hidden]  numba, abc, <built-in>, importlib, em...
   │     ├─ 0.040 <module>  numba/cuda/__init__.py:1
   │     │     [34 frames hidden]  numba, asyncio, ssl, <built-in>, re, ...
   │     ├─ 0.030 __new__  enum.py:180
   │     │     [5 frames hidden]  enum, <built-in>
   │     └─ 0.011 [self]  None
   └─ 0.227 <module>  rmm/allocators/cupy.py:1
      └─ 0.227 <module>  cupy/__init__.py:1
            [123 frames hidden]  cupy, pytest, _pytest, attr, <built-i...
```

After:

```
$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:20:10  Samples:  28
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.297     CPU time: 0.297
/   _/                      v4.4.0

Program: importrmm.py

0.296 <module>  importrmm.py:1
└─ 0.296 <module>  rmm/__init__.py:1
   └─ 0.296 <module>  rmm/mr.py:1
      └─ 0.296 <module>  rmm/_lib/__init__.py:1
         ├─ 0.216 <module>  numba/__init__.py:1
         │     [141 frames hidden]  numba, <built-in>, importlib, email, ...
         ├─ 0.040 <module>  numba/cuda/__init__.py:1
         │     [19 frames hidden]  numba, asyncio, ssl, <built-in>, unit...
         ├─ 0.031 [self]  None
         └─ 0.010 __new__  enum.py:180
               [4 frames hidden]  enum, <built-in>
```
After this change, a sampling trace of `import rmm` with pyinstrument shows:

```
$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:37:40  Samples:  9
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.099     CPU time: 0.099
/   _/                      v4.4.0

Program: importrmm.py

0.099 <module>  importrmm.py:1
└─ 0.099 <module>  rmm/__init__.py:1
   └─ 0.099 <module>  rmm/mr.py:1
      └─ 0.099 <module>  rmm/_lib/__init__.py:1
         ├─ 0.059 <module>  numpy/__init__.py:1
         │     [31 frames hidden]  numpy, re, sre_compile, <built-in>, s...
         ├─ 0.020 __new__  enum.py:180
         │     [2 frames hidden]  enum
         ├─ 0.010 <module>  ctypes/__init__.py:1
         │     [3 frames hidden]  ctypes, <built-in>
         └─ 0.010 _EnumDict.__setitem__  enum.py:89
               [3 frames hidden]  enum
```

Prior to this, when we had expunged cupy and pytorch imports from the top level, but not yet numba, an import would take 0.3s rather than 0.1s.
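For anyone wanting to reproduce these numbers, here is a minimal sketch of the profiling setup. The contents of `importrmm.py` are an assumption (the traces above suggest it does nothing but import `rmm`); the pyinstrument command is the one shown in the traces.

```python
# importrmm.py -- assumed contents: just import rmm, so that pyinstrument
# attributes the whole recorded time to the import itself.
import rmm  # noqa: F401

# Then sample it from the shell with a 10 ms sampling interval:
#   $ pyinstrument -i 0.01 importrmm.py
```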
This is best reviewed commit-by-commit: 172bbb5 just moves code but doesn't change the public API, e20d071 is preparation for removal of the "by-default" imports, and 6f204e0 is cleanup.
One thing I noticed when updating the docs is that there's significant duplication between …
At a high level, should we consider solving the problem more directly, using (1) exclusively function-local imports, or (2) some kind of lazy import solution, e.g., 1 or 2? To be clear, my concern isn't about the API change/breakage (although flat is better than nested*). Rather, it's that the subpackage solution solves the problem somewhat implicitly. The user (or developers) don't know that importing `rmm.allocators.cupy` will itself import cupy:

```python
from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)  # I want to comment this in or out depending on whether I use cupy
```

Please feel free to push back here; I would also agree that the above is really just a minor inconvenience.

*but also namespaces are a honking good idea.
I don't really like these solutions because they break type-checkers which can't see through this kind of dynamic loading. I agree that some kind of automagic approach that meant one doesn't have to think too hard about where to import things would be preferable, but I'm not fully sure that it's necessary.
The cupy import only happens if you import the `rmm.allocators.cupy` submodule, so this example will work fine, I think. In the same way that

```python
from rmm import rmm_cupy_allocator

import cupy

cupy.cuda.set_allocator(rmm_cupy_allocator)
```

works today.

Moving the imports to be inside the function is doable, but makes everything marginally slower. A trivial test:

```python
import numba.cuda


def outside():
    return numba.cuda.runtime.get_version()


def inside():
    import numba.cuda

    return numba.cuda.runtime.get_version()
```

suggests that the latter costs an extra 200ns or so.
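For completeness, a hedged sketch of how that ~200 ns figure might be measured with `timeit`, reusing the two functions above. This assumes a working numba CUDA installation, and the exact overhead will vary by machine and Python version.

```python
import timeit

import numba.cuda


def outside():
    return numba.cuda.runtime.get_version()


def inside():
    # numba.cuda is already in sys.modules here, so this import is just a
    # cache lookup plus a local name binding -- cheap, but not free.
    import numba.cuda

    return numba.cuda.runtime.get_version()


n = 100_000


def per_call(fn):
    # Average seconds per call over n repetitions.
    return timeit.timeit(fn, number=n) / n


print(f"outside: {per_call(outside) * 1e9:.0f} ns/call")
print(f"inside:  {per_call(inside) * 1e9:.0f} ns/call")
```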
In some senses, perhaps a better way to solve this problem would be to move the definition of the allocators into the respective libraries and invert the dependency, so that if you have cupy and rmm installed, you can ask cupy to use rmm as an allocator.

Although that has the disadvantage that you might then want all of the RMM configuration of memory resources to come through cupy, which sounds like a bad idea. So perhaps scratch that.
Note that this will need matching PRs in (at least) cudf, but I'll wait on those until we have decided on an approach here.
Thanks for illustrating your points! I'm convinced that we should go ahead with your approach here :)
I have a couple of fairly cosmetic comments, but I love the overall direction of this change and approve the approach.
Minor comments, but I agree that the overall approach is great. It also makes it clearer in the docs that RMM is a fundamental tool for many parts of the Python CUDA ecosystem!
Opened #1225 to track deprecation removal (can someone assign it to me and add it to the 23.06 milestone?).
No longer a breaking change, since it just introduces deprecation warnings if you use the allocator callbacks from the top-level package.
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.
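To make the deprecation concrete, a small sketch of the two import styles, assuming cupy and rmm are installed. The new path is taken from this PR's discussion; that the old path warns comes from the comment above (the exact warning class isn't specified here).

```python
# New, explicit style: cupy is only imported because we ask for the cupy hook.
import cupy
from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)

# Deprecated style: still works during the deprecation period, but emits a
# warning because it pulls the callback from the top-level rmm namespace.
# from rmm import rmm_cupy_allocator
```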
Done
/merge
I don't think I have the power to invoke merging here.
One small suggestion. @wence-, if you are okay with it, I can commit this change and merge.
/merge
Following rapidsai/rmm#1221, RMM allocators now live under the `rmm.allocators` submodule and using old paths causes warnings to be raised. This change updates to the new submodule and prevents such warnings.
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported. Authors: Lawrence Mitchell (https://github.com/wence-), Peter Andreas Entschev (https://github.com/pentschev). Approvers: Peter Andreas Entschev (https://github.com/pentschev). URL: #932
Adapt to github.com/rapidsai/rmm/pull/1221, which moves allocator callbacks. See merge request pentschev/ucxx!5
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported. Authors: Lawrence Mitchell (https://github.com/wence-), Peter Andreas Entschev (https://github.com/pentschev). Approvers: Peter Andreas Entschev (https://github.com/pentschev). URL: #1129
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported. Authors: Lawrence Mitchell (https://github.com/wence-). Approvers: Carl Simon Adorf (https://github.com/csadorf), William Hicks (https://github.com/wphicks). URL: #5249
Follow-up to PR #1129 and PR rapidsai/rmm#1221. Uses `rmm_cupy_allocator` from `rmm.allocators.cupy`, where it has recently been moved. cc @wence- Authors: https://github.com/jakirkham. Approvers: Peter Andreas Entschev (https://github.com/pentschev). URL: #1138
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported. Authors: Lawrence Mitchell (https://github.com/wence-). Approvers: Alex Barghi (https://github.com/alexbarghi-nv), Rick Ratzel (https://github.com/rlratzel). URL: #3300
Description
RMM provides callbacks to configure third-party libraries to use RMM
for memory allocation.
Previously, these were defined in the top-level package, but that requires (potentially expensive) import of the package we're providing a hook for, since typically we must import that package to define the callback. This makes importing RMM expensive. To avoid this, move the callbacks into (not imported by default) sub-modules in `rmm.allocators`. So, if we want to configure the CuPy allocator, we now import `rmm_cupy_allocator` from `rmm.allocators.cupy`, and don't pay the price of importing pytorch.
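For illustration, here is a sketch of wiring up all three hooks from their sub-modules. Only the CuPy names appear verbatim in this PR; the numba and torch symbol names, and the third-party APIs used to install them, are assumptions and depend on the installed library versions.

```python
# CuPy: route cupy allocations through RMM (names taken from this PR).
import cupy
from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)

# Numba: assumed symbol name for RMM's external-memory-manager plugin.
from numba import cuda
from rmm.allocators.numba import RMMNumbaManager

cuda.set_memory_manager(RMMNumbaManager)

# PyTorch: assumed symbol name; requires a torch version that exposes the
# pluggable-allocator API.
import torch
from rmm.allocators.torch import rmm_torch_allocator

torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```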
This change deprecates the use of the allocator callbacks in the top-level `rmm` module in favour of explicit imports from the relevant `rmm.allocators.XXX` sub-module.

Before these changes, a sampling trace of `import rmm` with pyinstrument shows almost a full second spent importing things, most of which goes to importing pytorch and cupy. These modules are not needed in normal usage of RMM, so we can defer the imports. Numba is a little bit trickier, but we can also defer its up-front imports, with the final result that after these changes the same `import rmm` call takes just a tenth of a second.
Closes #1211.