[FEA] Use RMM memory pool by default #12235
Some thoughts:
I don't know what (if anything) this does for us, and perhaps my concerns about the
We can. I'm not opposed to it, but what can the user do with that information? When the GPU is e.g., being shared by multiple processes, it may even be misleading to say that their code is allocating memory outside of RMM's knowledge.
Given that other libraries like CuPy and PyTorch also use a pool by default, I don't know if this is very controversial. I want to say that 99% of users won't ever know or care.
I don't believe so (especially if we choose an initial pool size of 0). You are right that we keep a reference to the previous memory resource around, so there's no danger/UB.
This question came up again today. Personally I tend to be against this sort of implicit behavior and prefer that it always be opt-in, but it seems like many feel that the benefit to users outweighs potential risks (but not all, CC @wence- and @pentschev). That said, increasing cudf import times is a concrete cost that I probably wouldn't be willing to stomach for pytorch, whereas for cupy and numba we are importing them already, so the question is more of a philosophical one of whether or not we should have side effects to importing cudf.
In 23.04, importing RMM will no longer hook up cupy, numba, and pytorch to use the RMM allocators (that was rapidsai/rmm#1221). That was mostly done for import time reduction, but it also has the consequence that importing rmm doesn't modify the stateful allocation environment. Within cudf, since we use cupy and numba allocated data in some places, it makes sense to configure those libraries (when imported) to use the same allocator that cudf uses. For now at least, pytorch is not imported by cudf, so it's not as critical that we configure things. That said, if we do go with pool-based allocation by default, then the likelihood of early (not actually) OOM does go up. My feeling is that we should solve this with a "best practices for interop" guide rather than trying to pin down all the possible libraries that might interoperate on device memory with cudf. In general I tend not to like libraries that configure themselves into too many third-party systems (it always feels more "framework" than "library").
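For concreteness, a minimal sketch of the explicit hookup described above, using the allocator adaptors RMM exposes as of 23.04 (the module paths below are my assumption, not quoted from this thread):

```python
import cupy
from numba import cuda

from rmm.allocators.cupy import rmm_cupy_allocator
from rmm.allocators.numba import RMMNumbaManager

# Route CuPy's device allocations through the current RMM memory resource.
cupy.cuda.set_allocator(rmm_cupy_allocator)

# Route Numba's device allocations through RMM via its EMM plugin interface.
cuda.set_memory_manager(RMMNumbaManager)
```

This is roughly the configuration the comment above describes cuDF doing for CuPy and Numba when they are imported; PyTorch would need the analogous step done by the user.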
There are two things we are discussing here.
This comes down to choosing perf over potential memory contention from other libraries. In all my interactions with customers when they are just using
Regarding 2, my fear is that most users won't know this is configurable, and I think there is also no clean way for us to inform the user when they run into OOMs. That said, we probably can't use the RMM pool cleanly right now with PyTorch, as that can lead to fragmentation. We will probably have to use the
An example of what a user will have to configure currently is below:
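A sketch of the kind of configuration meant here (the original snippet is not preserved above; the resource choice and API names are my assumptions based on RMM's and PyTorch's public interfaces):

```python
import rmm
import torch
from rmm.allocators.torch import rmm_torch_allocator

# Use a cudaMallocAsync-backed resource rather than RMM's pool, to sidestep
# the fragmentation concern mentioned above (this choice is an assumption).
rmm.mr.set_current_device_resource(rmm.mr.CudaAsyncMemoryResource())

# Point PyTorch's allocator at RMM; this must run before PyTorch makes any
# device allocation.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```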
On 2, there's at least a blog post now. Maybe we can do more to promote that?
I agree; that blog post is what I link, so that is very useful indeed. I think having a dedicated place in the cuDF docs for it would be great, as a lot of people who use cuDF would probably not know what the RMM pool is (similar to people using PyTorch not knowing about memory management), so we will need a way to link it from the cuDF docs too.
The first line of the PR description says:
Be careful. "managed memory pool" implies a
There is a great discussion in this issue so far, and it (implicitly) has mostly referred to a default unmanaged (device-only) pool. The issue of a default memory resource is coming up again in the context of preventing OOM errors for the single-workstation, beginner cuDF user, where a default managed memory resource could be a good solution. I'm interested to learn more about the default pool behavior in cupy and pytorch as mentioned above. I'm also interested to learn more about how experienced users and other libraries could automatically be opted out of a default managed resource, while still providing this feature automatically to the beginner user. Perhaps we could have a hook on the first column allocation that checks if RMM has ever had
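For illustration only, a minimal sketch (my addition, not something spelled out in this thread) of what opting a process into a managed-memory pool looks like with RMM's Python API:

```python
import rmm

# A pool built on managed (unified) memory, so allocations can exceed the
# physical device memory instead of failing immediately with an OOM.
pool = rmm.mr.PoolMemoryResource(rmm.mr.ManagedMemoryResource())
rmm.mr.set_current_device_resource(pool)
```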
Another concern I have in this issue is the recommendation to start with a pool of 0 bytes. With the current pool implementation this will create a lot of fragmentation.
See also #14498
We should move to using an RMM managed memory pool by default.
This was brought up before in #2676. In response to that issue, we implemented `set_allocator` (#2682), but we chose not to enable the RMM pool by default (likely because we didn't want to monopolize GPU memory away from other libraries). Since then, CuPy, Numba (and soon PyTorch) can all be configured to use RMM, and therefore share the same memory pool as cuDF.
Proposal
Concretely, the proposal is that `import cudf` will:

What should the initial and maximum pool size be?
An RMM pool can be configured with an initial and maximum pool size. The pool grows according to an implementation-defined strategy (see here for the current strategy).
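For reference, a short sketch (my addition) of how an initial and maximum pool size are passed when building a pool explicitly; the sizes are illustrative, not a recommendation from this issue:

```python
import rmm

# Start the pool at 1 GiB and cap its growth at 8 GiB.
pool = rmm.mr.PoolMemoryResource(
    rmm.mr.CudaMemoryResource(),
    initial_pool_size=2**30,
    maximum_pool_size=8 * 2**30,
)
rmm.mr.set_current_device_resource(pool)
```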
What happens if `import cudf` appears in the middle of the program?

All this works well if `import cudf` appears at the beginning of the program, i.e., before any device memory is actually allocated by any library. However, if it appears after some device objects have already been allocated, it can lead to early out-of-memory errors. As an example, consider some code that uses both PyTorch and cuDF in the way sketched below.

Because PyTorch uses a caching allocator, a memory pool already exists by the time we import cuDF. Importing cuDF initializes a second pool that all libraries (including PyTorch) will use going forward. The first pool essentially becomes a kind of dead space: no new device objects are ever allocated within the pool, and no device memory is ever freed from it.
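A minimal sketch of the interleaving described above (my reconstruction; the original snippet is not preserved here):

```python
import torch

# PyTorch allocates first, so its caching allocator creates its own pool.
embeddings = torch.rand(10_000, 512, device="cuda")

# cuDF is imported only now; under this proposal, the RMM pool is created here
# and becomes the allocator that libraries use for all *future* allocations.
import cudf

df = cudf.DataFrame({"score": range(1_000)})

# Whatever PyTorch cached in its original pool is never reused by RMM and is
# never returned to the driver: it becomes dead space.
```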
There's no perfect solution I can think of to this particular problem, but it's probably a good idea to call `empty_cache()` before resetting the PyTorch allocator to minimize the amount of wasted memory (see the sketch at the end of this description).

That's just one example of the kind of issues that can arise if `import cudf` appears later. I think it's fine to assume this will be less common than importing it at the beginning.
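A sketch of that mitigation (my addition; `torch.cuda.empty_cache()` is PyTorch's API, and the ordering around the allocator switch is an assumption):

```python
import torch

# ... earlier PyTorch work has filled the caching allocator's pool ...

# Return cached, currently-unused blocks to the CUDA driver so they are not
# stranded once allocations start going through RMM instead.
torch.cuda.empty_cache()

import cudf  # the RMM pool is created here and used going forward
```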