Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move external allocators into rmm.allocators module to defer imports #1221

Merged
merged 14 commits into from
Feb 27, 2023

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Feb 23, 2023

Description

RMM provides callbacks to configure third-party libraries to use RMM
for memory allocation.

Previously, these were defined in the top-level package, but that
requires (potentially expensive) import of the package we're providing
a hook for, since typically we must import that package to define the
callback. This makes importing RMM expensive. To avoid this, move the
callbacks into (not imported by default) sub-modules in
rmm.allocators. So, if we want to configure the CuPy allocator, we
now import rmm_cupy_allocator from rmm.allocators.cupy, and don't
pay the price of importing pytorch.

This change deprecates the use of the allocator callbacks in the
top-level rmm module in favour of explicit imports from the relevant
rmm.allocators.XXX sub-module.

Before these changes, a sampling trace of import rmm with
pyinstrument shows:

$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:19:56  Samples:  67
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.839     CPU time: 0.837
/   _/                      v4.4.0

Program: importrmm.py

0.839 <module>  importrmm.py:1
└─ 0.839 <module>  rmm/__init__.py:1
   ├─ 0.315 <module>  rmm/allocators/torch.py:1
   │  └─ 0.315 <module>  torch/__init__.py:1
   │        [96 frames hidden]  torch, <built-in>, enum, inspect, tok...
   ├─ 0.297 <module>  rmm/mr.py:1
   │  └─ 0.297 <module>  rmm/_lib/__init__.py:1
   │     ├─ 0.216 <module>  numba/__init__.py:1
   │     │     [140 frames hidden]  numba, abc, <built-in>, importlib, em...
   │     ├─ 0.040 <module>  numba/cuda/__init__.py:1
   │     │     [34 frames hidden]  numba, asyncio, ssl, <built-in>, re, ...
   │     ├─ 0.030 __new__  enum.py:180
   │     │     [5 frames hidden]  enum, <built-in>
   │     └─ 0.011 [self]  None
   └─ 0.227 <module>  rmm/allocators/cupy.py:1
      └─ 0.227 <module>  cupy/__init__.py:1
            [123 frames hidden]  cupy, pytest, _pytest, attr, <built-i...

That is, almost a full second to import things, most of which is spent
importing pytorch and cupy. These modules are not needed in normal
usage of RMM, so we can defer the imports. Numba is a little bit
trickier, but we can also defer up-front imports, with a final result
that after these changes the same import rmm call takes just a tenth
of a second:

$ pyinstrument -i 0.01 importrmm.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:37:40  Samples:  9
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.099     CPU time: 0.099
/   _/                      v4.4.0

Program: importrmm.py

0.099 <module>  importrmm.py:1
└─ 0.099 <module>  rmm/__init__.py:1
   └─ 0.099 <module>  rmm/mr.py:1
      └─ 0.099 <module>  rmm/_lib/__init__.py:1
         ├─ 0.059 <module>  numpy/__init__.py:1
         │     [31 frames hidden]  numpy, re, sre_compile, <built-in>, s...
         ├─ 0.020 __new__  enum.py:180
         │     [2 frames hidden]  enum
         ├─ 0.010 <module>  ctypes/__init__.py:1
         │     [3 frames hidden]  ctypes, <built-in>
         └─ 0.010 _EnumDict.__setitem__  enum.py:89
               [3 frames hidden]  enum

Closes #1211.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Preparation for removing by-default import of non-essential packages.
This removes an indirection through rmm.rmm, to simplify removal of
the top-level imports by default.
This will make importing RMM faster if you don't want the hooks for
cupy, numba, or pytorch.

Before, a sampling trace of `import rmm` with pyinstrument shows:

    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:19:56  Samples:  67
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.839     CPU time: 0.837
    /   _/                      v4.4.0

    Program: importrmm.py

    0.839 <module>  importrmm.py:1
    └─ 0.839 <module>  rmm/__init__.py:1
       ├─ 0.315 <module>  rmm/allocators/torch.py:1
       │  └─ 0.315 <module>  torch/__init__.py:1
       │        [96 frames hidden]  torch, <built-in>, enum, inspect, tok...
       ├─ 0.297 <module>  rmm/mr.py:1
       │  └─ 0.297 <module>  rmm/_lib/__init__.py:1
       │     ├─ 0.216 <module>  numba/__init__.py:1
       │     │     [140 frames hidden]  numba, abc, <built-in>, importlib, em...
       │     ├─ 0.040 <module>  numba/cuda/__init__.py:1
       │     │     [34 frames hidden]  numba, asyncio, ssl, <built-in>, re, ...
       │     ├─ 0.030 __new__  enum.py:180
       │     │     [5 frames hidden]  enum, <built-in>
       │     └─ 0.011 [self]  None
       └─ 0.227 <module>  rmm/allocators/cupy.py:1
          └─ 0.227 <module>  cupy/__init__.py:1
                [123 frames hidden]  cupy, pytest, _pytest, attr, <built-i...

After:

    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:20:10  Samples:  28
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.297     CPU time: 0.297
    /   _/                      v4.4.0

    Program: importrmm.py

    0.296 <module>  importrmm.py:1
    └─ 0.296 <module>  rmm/__init__.py:1
       └─ 0.296 <module>  rmm/mr.py:1
          └─ 0.296 <module>  rmm/_lib/__init__.py:1
             ├─ 0.216 <module>  numba/__init__.py:1
             │     [141 frames hidden]  numba, <built-in>, importlib, email, ...
             ├─ 0.040 <module>  numba/cuda/__init__.py:1
             │     [19 frames hidden]  numba, asyncio, ssl, <built-in>, unit...
             ├─ 0.031 [self]  None
             └─ 0.010 __new__  enum.py:180
                   [4 frames hidden]  enum, <built-in>
After this change, a sampling trace of `import rmm` with pyinstrument
shows:

    $ pyinstrument -i 0.01 importrmm.py

      _     ._   __/__   _ _  _  _ _/_   Recorded: 10:37:40  Samples:  9
     /_//_/// /_\ / //_// / //_'/ //     Duration: 0.099     CPU time: 0.099
    /   _/                      v4.4.0

    Program: importrmm.py

    0.099 <module>  importrmm.py:1
    └─ 0.099 <module>  rmm/__init__.py:1
       └─ 0.099 <module>  rmm/mr.py:1
          └─ 0.099 <module>  rmm/_lib/__init__.py:1
             ├─ 0.059 <module>  numpy/__init__.py:1
             │     [31 frames hidden]  numpy, re, sre_compile, <built-in>, s...
             ├─ 0.020 __new__  enum.py:180
             │     [2 frames hidden]  enum
             ├─ 0.010 <module>  ctypes/__init__.py:1
             │     [3 frames hidden]  ctypes, <built-in>
             └─ 0.010 _EnumDict.__setitem__  enum.py:89
                   [3 frames hidden]  enum

Prior to this, when we had expunged cupy and pytorch imports from the
top-level, but not yet numba, an import would take 0.3, rather than
0.1s.
@wence- wence- requested a review from a team as a code owner February 23, 2023 12:38
@github-actions github-actions bot added the Python Related to RMM Python API label Feb 23, 2023
@wence-
Copy link
Contributor Author

wence- commented Feb 23, 2023

This is best reviewed commit-by-commit, 172bbb5 just moves code, but doesn't change the public API, e20d071 is preparation for removal of "by-default" imports, 6f204e0 is cleanup (the full rmm package is no longer required in rmm.rmm). 359fcc0 switches off by-default import of allocators, and 7fd885b expunges final top-level imports of numba.

@wence-
Copy link
Contributor Author

wence- commented Feb 23, 2023

On thing I noticed when updating the docs is that there's significant duplication between README.md and the python docs page, such that I had to repeat the changes. This has the potential to get out of date. Is the duplication a historical legacy (README first then docs page)?

@shwina
Copy link
Contributor

shwina commented Feb 23, 2023

At a high level, should we consider solving the problem more directly, using (1) exclusively function-local imports, or (2) using some kind of lazy import solution, e.g., 1 or 2?

To be clear, my concern isn't about the API change/breakage (although flat is better than nested*). Rather, it's that the subpackage solution solves the problem somewhat implicitly. The user (or developers) don't know that importing rmm.allocators.cupy needs CuPy installed. As a user I might (perhaps unreasonably) expect to do this:

from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)  # I want to comment this in or out depending on whether I use cupy

Please feel free to push back here; I would also agree that the above is really just a minor inconvenience.

*but also namespaces are a honking good idea.

@wence-
Copy link
Contributor Author

wence- commented Feb 23, 2023

At a high level, should we consider solving the problem more directly, using (1) exclusively function-local imports, or (2) using some kind of lazy import solution, e.g., 1 or 2?

I don't really like these solutions because they break type-checkers which can't see through this kind of dynamic loading. I agree that some kind of automagic approach that meant one doesn't have to think too hard about where to import things would be preferable, but I'm not fully sure that it's necessary.

To be clear, my concern isn't about the API change/breakage (although flat is better than nested*). Rather, it's that the subpackage solution solves the problem somewhat implicitly. The user (or developers) don't know that importing rmm.allocators.cupy needs CuPy installed. As a user I might (perhaps unreasonably) expect to do this:

The import if cupy is not installed will still work, except that rmm_cupy_allocator will fail when called.

from rmm.allocators.cupy import rmm_cupy_allocator

cupy.cuda.set_allocator(rmm_cupy_allocator)  # I want to comment this in or out depending on whether I use cupy

So this example will work fine, I think. In the same way that

from rmm import rmm_cupy_allocator
import cupy
cupy.cuda.set_allocator(rmm_cupy_allocator)

Works today.

Moving the imports to be inside the function is doable, but makes everything marginally slower. A trivial test:

import numba.cuda
def outside():
    return numba.cuda.runtime.get_version()

def inside():
    import numba.cuda
    return numba.cuda.runtime.get_version()

Suggests that the latter costs an extra 200ns or so.

@wence-
Copy link
Contributor Author

wence- commented Feb 23, 2023

In some senses, perhaps a better way to solve this problem would be to move the definition of the allocators into the respective libraries and invert the dependency. So that if you have cupy and rmm installed, you can ask cupy to use rmm as an allocator with

import cupy
cupy.cuda.set_allocator(cupy.cuda.rmm_allocator)

Although that has the disadvantage that you might then want to have all the RMM configuration of memory resources coming through cupy, which sounds like a bad idea. So perhaps scratch that.

@wence-
Copy link
Contributor Author

wence- commented Feb 23, 2023

Note that this will need matching PRs in (at least) cudf, but I'll wait on those until we have decided on an approach here.

@shwina
Copy link
Contributor

shwina commented Feb 23, 2023

Thanks for illustrating your points! I'm convinced that we should go ahead with your approach here :)

python/rmm/rmm.py Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor

vyasr commented Feb 23, 2023

I have a couple of fairly cosmetic comments but I love the overall direction of this change and approve the approach.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but I agree that the overall approach is great. It also makes it clearer in the docs that RMM is a fundamental tool for many parts of the Python CUDA ecosystem!

README.md Outdated Show resolved Hide resolved
python/docs/basics.md Outdated Show resolved Hide resolved
python/rmm/__init__.py Show resolved Hide resolved
python/rmm/allocators/numba.py Outdated Show resolved Hide resolved
@wence- wence- changed the title Defer importing packages for third-party allocation hooks Move external allocators into rmm.allocators module to defer imports Feb 24, 2023
@wence-
Copy link
Contributor Author

wence- commented Feb 24, 2023

Opened #1225 to track deprecation removal (can someone assign to me and add to 23.06 milestone?).

@wence-
Copy link
Contributor Author

wence- commented Feb 24, 2023

No longer a breaking change, since it just introduces deprecation warnings if you use the allocator callbacks from the top-level package.

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Feb 24, 2023
The allocator callbacks now live in their own submodules (so that RMM
does not, for example, import pytorch unless required) and so must be
explicitly imported.
@vyasr
Copy link
Contributor

vyasr commented Feb 24, 2023

Opened #1225 to track deprecation removal (can someone assign to me and add to 23.06 milestone?).

Done

@vyasr vyasr added 3 - Ready for review Ready for review by team non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Feb 24, 2023
@wence-
Copy link
Contributor Author

wence- commented Feb 27, 2023

/merge

@wence-
Copy link
Contributor Author

wence- commented Feb 27, 2023

I don't think I have powers to invoke merging here.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion. @wence- if you are okay with it, I can commit this change and merge.

python/docs/basics.md Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor

bdice commented Feb 27, 2023

/merge

@rapids-bot rapids-bot bot merged commit 1a75350 into rapidsai:branch-23.04 Feb 27, 2023
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Feb 28, 2023
Following rapidsai/rmm#1221, RMM allocators now
live under the `rmm.allocators` submodule and using old paths causes
warnings to be raised. This change updates to the new submodule and
prevents such warnings.
wence- added a commit to rapidsai/dask-cuda-benchmarks that referenced this pull request Feb 28, 2023
The allocator callbacks now live in their own submodules (so that RMM
does not, for example, import pytorch unless required) and so must be
explicitly imported.
rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this pull request Feb 28, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #932
pentschev pushed a commit to rapidsai/ucx-py that referenced this pull request Feb 28, 2023
…backs

The allocator callbacks now live in their own submodules (so that RMM
does not, for example, import pytorch unless required) and so must be
explicitly imported.
pentschev added a commit to rapidsai/ucx-py that referenced this pull request Feb 28, 2023
Adapt to github.com/rapidsai/rmm/pull/1221 which moves allocator callbacks

See merge request pentschev/ucxx!5
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Feb 28, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1129
@wence- wence- deleted the wence/fix/lazy-imports branch February 28, 2023 17:19
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Mar 6, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Carl Simon Adorf (https://github.com/csadorf)
  - William Hicks (https://github.com/wphicks)

URL: #5249
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Mar 9, 2023
Follow up to PR ( #1129 ) and PR ( rapidsai/rmm#1221 )

Uses `rmm_cupy_allocator` from `rmm.allocators.cupy` where it has been moved to recently.

cc @wence-

Authors:
  - https://github.com/jakirkham

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1138
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Apr 5, 2023
The allocator callbacks now live in their own submodules (so that RMM does not, for example, import pytorch unless required) and so must be explicitly imported.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #3300
wence- added a commit to wence-/rmm that referenced this pull request May 18, 2023
wence- added a commit to wence-/rmm that referenced this pull request May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Related to RMM Python API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] rmm unconditionally imports both torch and cupy if they are in the environment
4 participants