Pluggable memory manager #1203
I was thinking about overhauling the memory allocation/deallocation in GPU Faiss anyway, to better keep track of where memory is going for users and to allow for optional logging. I'll make sure that every allocation goes through the GpuResources object and is tagged with one of several categories. There are broadly two classes of memory allocations in GPU Faiss: permanent and temporary. Permanent allocations are retained for the lifetime of the index and are ultimately owned by the index. Temporary allocations are made out of a memory stack that GpuResources allocates up front, which falls back to the heap (cudaMalloc) when the stack size is exhausted. These allocations do not live beyond the lifetime of a top-level call to a Faiss index (or at least, on the GPU they are ordered with respect to the ordering stream, and once all kernels are done on the stream to which all work is ordered, the temporary allocation is no longer needed and can be reused or freed). Generally about 1 GB or so of memory should be reserved in this stack to avoid cudaMalloc/cudaFree calls across many search operations. You can then provide your own implementation of the GpuResources object and route those memory allocations wherever you want, provided you maintain the desired lifetimes.
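As a rough illustration of the temporary-memory scheme described above (a sketch only, not Faiss's actual allocator; the class and method names here are invented, and alignment handling is omitted), a stack reservation that falls back to `cudaMalloc` when exhausted could look something like this:

```cpp
// Sketch only: a simplified "stack with heap fallback" temporary allocator.
#include <cuda_runtime.h>
#include <cstddef>

class TempMemoryStack {
 public:
  explicit TempMemoryStack(std::size_t bytes) : size_(bytes), offset_(0) {
    cudaMalloc(&base_, size_);  // reserved up front (e.g. ~1 GB)
  }
  ~TempMemoryStack() { cudaFree(base_); }

  // Returns stack memory if available, otherwise falls back to cudaMalloc.
  // Callers free heap fallbacks individually with cudaFree.
  void* alloc(std::size_t bytes, bool& fromStack) {
    if (offset_ + bytes <= size_) {
      void* p = static_cast<char*>(base_) + offset_;
      offset_ += bytes;
      fromStack = true;
      return p;
    }
    void* p = nullptr;
    cudaMalloc(&p, bytes);  // heap fallback when the stack is exhausted
    fromStack = false;
    return p;
  }

  // Stack space is released in bulk once all work ordered on the stream
  // has completed, so it can be reused by the next top-level call.
  void releaseAll(cudaStream_t stream) {
    cudaStreamSynchronize(stream);
    offset_ = 0;
  }

 private:
  void* base_;
  std::size_t size_;
  std::size_t offset_;
};
```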
Nice explanation @wickedfoo, would you mind adding it to https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU ?
No activity, closing.
I will be working on this in June. |
@wickedfoo any chance you have gotten the opportunity to make progress on this? We have been using the RMM pool allocator and managed memory in order to eliminate all device synchronizations from alloc/dealloc and to oversubscribe memory. We are hoping to integrate the approximate nearest neighbors methods very soon, and plugging in a separate allocator will be even more important then, since FAISS will retain ownership over that memory.
Yes, I have started on the diff on my end; sorry, I work on many things at FB, most of which have nothing to do with Faiss these days. The API looks something like this:
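A sketch of the request/category side, where the exact enum values and field names are assumptions rather than the final API:

```cpp
// Sketch only: categorized allocation requests routed through GpuResources.
// The category names and fields are assumptions, not the final API.
#include <cuda_runtime.h>
#include <cstddef>

enum class AllocType {
  Other,                   // uncategorized permanent allocation
  FlatData,                // flat index vector storage
  IVFLists,                // inverted list storage
  Quantizer,               // coarse quantizer data
  TemporaryMemoryBuffer,   // the up-front temporary stack reservation
  TemporaryMemoryOverflow, // temporary allocations that spilled to the heap
};

struct AllocRequest {
  AllocType type;       // which category the allocation belongs to
  int device;           // CUDA device ordinal
  cudaStream_t stream;  // stream the allocation is ordered with respect to
  std::size_t size;     // requested size in bytes
};
```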
with the functions in GpuResources looking something like this:
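Roughly, building on the `AllocRequest` sketch above (the exact signatures here are an assumption as well):

```cpp
// Sketch only: the two allocation hooks on GpuResources that a custom
// allocator would implement. Signatures are assumed, not the final API.
class GpuResources {
 public:
  virtual ~GpuResources() = default;

  // Returns memory for the given request, ordered with respect to req.stream.
  virtual void* allocMemory(const AllocRequest& req) = 0;

  // Returns a previous allocation made on the given device.
  virtual void deallocMemory(int device, void* ptr) = 0;

  // ... existing members: streams, cuBLAS handles, etc.
};
```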
So to override the memory allocator, you could extend StandardGpuResources (which provides a default implementation of the rest of GpuResources: streams, cuBLAS handles, etc.) and override these two functions with your own memory allocator code in C++.
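For example, a minimal sketch building on the classes above, assuming StandardGpuResources exposes those two functions as virtual as described; the `cudaMalloc`/`cudaFree` calls are placeholders for whatever pool or custom allocator you would actually plug in:

```cpp
// Sketch only: a resources subclass that routes every allocation through
// custom allocator code. Replace the cudaMalloc/cudaFree placeholders with
// e.g. an RMM pool in a real integration.
class PooledGpuResources : public StandardGpuResources {
 public:
  void* allocMemory(const AllocRequest& req) override {
    void* p = nullptr;
    cudaMalloc(&p, req.size);  // placeholder for a pool/stream-ordered alloc
    return p;
  }

  void deallocMemory(int device, void* p) override {
    cudaFree(p);  // placeholder for returning memory to the pool
  }
};
```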
FYI you can currently allocate index objects using managed memory, though this wouldn't come out of your allocator: https://github.com/facebookresearch/faiss/blob/master/gpu/GpuIndex.h#L30. Not 100% sure that all allocations will go through this, but the major ones would.
@cjnolet I have finished the diff internally in our FB repo; it is out for review amongst ourselves, so hopefully it will be in your hands before too long. There are two allocation functions added to `GpuResources`, along the lines sketched above.
The shape of these functions is partly an artifact of the pre-C++11 API and SWIG restrictions we had at one time.
The most naive implementation would simply call `cudaMalloc` and `cudaFree` directly. These are probably best overridden by extending `StandardGpuResources`. As long as the stream synchronization directives are adhered to, you can cache and reuse memory however you want. Something that is also convenient is that the default implementation now shows where all the memory is going, as a map of device -> (AllocType -> (# allocations, total size in bytes)).
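For example, assuming the default implementation exposes that map through an accessor (the `getMemoryInfo()` name and the exact map type below are assumptions), it could be dumped like this:

```cpp
// Sketch only: print a device -> (alloc type -> (count, total bytes)) map.
// The map type and any getMemoryInfo()-style accessor are assumptions.
#include <cstdio>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

using MemoryInfo =
    std::map<int, std::map<std::string, std::pair<int, std::size_t>>>;

void printMemoryInfo(const MemoryInfo& info) {
  for (const auto& dev : info) {
    std::printf("device %d:\n", dev.first);
    for (const auto& alloc : dev.second) {
      std::printf("  %s: %d allocations, %zu bytes\n",
                  alloc.first.c_str(),
                  alloc.second.first,
                  alloc.second.second);
    }
  }
}
```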
@wickedfoo the API looks great. Looking forward to the release.
We have been enabling consistent use of RMM (the RAPIDS memory manager) across the RAPIDS ecosystem, and it would be very useful if we could also plug it into FAISS. We would like to start using the GPU-accelerated approximate methods in cuML's `NearestNeighbors`, and this could be a way for us to keep using the `GpuResources` API while guaranteeing that it plays nicely with the `RMMPoolAllocator`. It would be great if FAISS provided a way to plug in a memory manager/allocator. I don't think there needs to be a dependency on RMM here, just the ability for the underlying memory management to be plugged in.