[Feature proposal] Support RAPIDS Memory Manager (RMM) #5861

daxiongshu · 2020-07-06T17:46:58Z

RAPIDS is a GPU-based data analytics framework developed by NVIDIA. A typical all-GPU workflow uses RAPIDS cuDF to read data and create features, then uses XGBoost on the GPU for training. The two libraries end up competing for GPU memory, which can lead to out-of-memory (OOM) errors since device memory is often limited.

To mitigate this issue, I propose that XGBoost support RMM (RAPIDS Memory Manager) as an alternative memory allocator, so that it can share a memory pool with other RAPIDS libraries.

hcho3 · 2020-07-06T17:59:05Z

To my knowledge, XGBoost uses thrust::device_malloc_allocator and cub::CachingDeviceAllocator to allocate GPU memory:

xgboost/src/common/device_helpers.cuh

Lines 397 to 405 in b77e3e3

    
           // Declare xgboost allocators 
        
           // Replacement of allocator with custom backend should occur here 
        
           template <typename T> 
        
           using XGBDeviceAllocator = detail::XGBDefaultDeviceAllocatorImpl<T>; 
        
           /*! Be careful that the initialization constructor is a no-op, which means calling 
        
            *  `vec.resize(n)` won't initialize the memory region to 0. Instead use 
        
            * `vec.resize(n, 0)`*/ 
        
           template <typename T> 
        
           using XGBCachingDeviceAllocator = detail::XGBCachingDeviceAllocatorImpl<T>;

Can RMM work with Thrust and Cub allocators?

jrhemstad · 2020-07-06T18:22:31Z

> To my knowledge, XGBoost uses thrust::device_malloc_allocator and cub::CachingDeviceAllocator to allocate GPU memory:

What @daxiongshu is suggesting is to make XGBoost's allocator configurable: instead of hardcoding thrust::device_malloc_allocator and cub::CachingDeviceAllocator, allow passing in a different allocator, such as one of RMM's sub-allocator implementations.

For more detail on RMM's interface, see https://github.com/rapidsai/rmm#using-rmm-in-c.
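To make the "pass in a different allocator" idea concrete, here is a minimal host-side sketch. It is an analogy only: `MemoryResource`, `MallocResource`, `ArenaResource`, `ResourceAllocator`, and `PluggableVector` are hypothetical names invented for illustration, nothing here touches the GPU, and none of it is XGBoost's or RMM's actual API. The point is only that a container written against an allocator alias can draw from a shared pool without changing the container code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical interface standing in for a device memory resource.
struct MemoryResource {
  virtual ~MemoryResource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Default backend: one system allocation per request.
struct MallocResource : MemoryResource {
  void* allocate(std::size_t bytes) override {
    void* p = std::malloc(bytes);
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }
  void deallocate(void* p, std::size_t) override { std::free(p); }
};

// Alternative backend: bump allocation out of one preallocated slab,
// a (very) simplified stand-in for a shared pool.
class ArenaResource : public MemoryResource {
 public:
  explicit ArenaResource(std::size_t capacity) : slab_(capacity) {}
  void* allocate(std::size_t bytes) override {
    // Round each request up so every returned pointer stays aligned.
    std::size_t aligned = (bytes + kAlign - 1) / kAlign * kAlign;
    if (aligned > slab_.size() - used_) throw std::bad_alloc();
    void* p = slab_.data() + used_;
    used_ += aligned;
    return p;
  }
  void deallocate(void*, std::size_t) override {}  // released with the arena
  std::size_t used() const { return used_; }

 private:
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  std::vector<unsigned char> slab_;
  std::size_t used_ = 0;
};

// Standard-library-compatible allocator that forwards to whichever
// resource it was constructed with.
template <typename T>
struct ResourceAllocator {
  using value_type = T;
  explicit ResourceAllocator(MemoryResource* mr) : mr_(mr) {}
  template <typename U>
  ResourceAllocator(const ResourceAllocator<U>& other) : mr_(other.mr_) {}
  T* allocate(std::size_t n) {
    return static_cast<T*>(mr_->allocate(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) { mr_->deallocate(p, n * sizeof(T)); }
  MemoryResource* mr_;
};

template <typename T, typename U>
bool operator==(const ResourceAllocator<T>& a, const ResourceAllocator<U>& b) {
  return a.mr_ == b.mr_;
}
template <typename T, typename U>
bool operator!=(const ResourceAllocator<T>& a, const ResourceAllocator<U>& b) {
  return !(a == b);
}

// The single place where the backend is chosen for containers.
template <typename T>
using PluggableVector = std::vector<T, ResourceAllocator<T>>;
```

Usage would look like `ArenaResource arena(1 << 16); PluggableVector<float> v(1000, 0.0f, ResourceAllocator<float>(&arena));`. Swapping `&arena` for a `MallocResource` changes where the memory comes from without touching the code that uses `v`.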

hcho3 · 2020-07-06T18:36:49Z

@jrhemstad Thanks for clarifying. Does RMM provide a replacement with semantics similar to the caching allocator cub::CachingDeviceAllocator? This page describes the caching semantics, and I'm not sure about the impact of replacing the caching allocator with RMM's sub-allocator.

EDIT. I just found rapidsai/rmm#290. RMM doesn't yet provide an equivalent of cub::CachingDeviceAllocator.

hcho3 · 2020-07-06T18:55:00Z

@trivialfis @RAMitchell Can the use of cub::CachingDeviceAllocator be safely replaced with another allocator without caching semantics? Do you know why the caching allocator was chosen in the XGBoost codebase?

trivialfis · 2020-07-06T18:59:03Z

I don't think we need to replace the caching allocator. The Thrust allocator, on the other hand, might be worth thinking about.

hcho3 · 2020-07-06T19:05:03Z

@trivialfis

> I don't think we need to replace caching allocator.

Please elaborate. Does the caching allocator use only a small amount of GPU memory?

trivialfis · 2020-07-06T19:08:44Z

Yup. The allocator used for HostDeviceVector is the Thrust allocator.

hcho3 · 2020-07-06T19:14:26Z

How about the DMatrix? Does it also use the Thrust allocator? If so, we could let users use RMM instead of the Thrust allocator and potentially save memory.

trivialfis · 2020-07-06T19:43:37Z

Yup, DMatrix is backed by the Thrust allocator. Not sure about ELLPACK, will check tomorrow.

jrhemstad · 2020-07-06T19:53:24Z

> @jrhemstad Thanks for clarifying. Does RMM provide a replacement with a semantic similar to the cached allocator cub::CachingDeviceAllocator? This page describes the caching semantics, and I'm not sure about the impact of replacing the caching allocator with RMM's sub-allocator.
>
> EDIT. I just found rapidsai/rmm#290. RMM doesn't yet provide an equivalent of cub::CachingDeviceAllocator.

Correct, we don't yet have a wrapper for CUB's caching allocator. As you found, I worked on this in rapidsai/rmm#290, but I quickly saw it was much slower than our pool allocator, so I didn't pursue it further.

For example, here are results from one of our benchmarks that performs 1000 allocations/frees of uniformly random sizes in the range [1MB, Max Size] for a variety of maximum sizes. In this particular benchmark, an allocation is ~2x more likely than a deallocation, so peak memory consumption keeps growing until we've reached 1000 allocations, at which point we free all remaining allocations. The graph shows the cumulative time of all 1000 allocations/frees.
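The access pattern described above can be sketched as follows. This is not the RMM benchmark itself (that code is linked later in this comment); it is a host-only illustration that replays the same kind of trace and records sizes instead of touching device memory. The function and struct names, the fixed seed, and the choice to force an allocation when nothing is live are all assumptions made for the sketch. It requires `max_size_bytes` to be at least 1 MB.

```cpp
#include <cstddef>
#include <random>
#include <vector>

struct TraceStats {
  std::size_t num_allocations = 0;
  std::size_t num_frees = 0;
  std::size_t peak_live_bytes = 0;
  std::size_t live_bytes_at_end = 0;
};

// Replays a random allocate/free trace: sizes are uniform in
// [1 MB, max_size_bytes], and an allocation is twice as likely as a free.
// Only sizes are tracked; a real benchmark would call the allocator under
// test (and time it) at the two marked points.
TraceStats RunRandomAllocationTrace(std::size_t total_allocations,
                                    std::size_t max_size_bytes,
                                    unsigned seed) {
  constexpr std::size_t kMinSize = std::size_t{1} << 20;  // 1 MB
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<std::size_t> size_dist(kMinSize, max_size_bytes);
  std::uniform_int_distribution<int> action(0, 2);  // 0 or 1: allocate, 2: free

  std::vector<std::size_t> live;  // sizes of outstanding allocations
  std::size_t live_bytes = 0;
  TraceStats stats;

  while (stats.num_allocations < total_allocations) {
    if (live.empty() || action(rng) != 2) {
      std::size_t bytes = size_dist(rng);
      // <-- allocate(bytes) on the allocator under test would go here
      live.push_back(bytes);
      live_bytes += bytes;
      ++stats.num_allocations;
      if (live_bytes > stats.peak_live_bytes) stats.peak_live_bytes = live_bytes;
    } else {
      std::uniform_int_distribution<std::size_t> pick(0, live.size() - 1);
      std::size_t i = pick(rng);
      // <-- deallocate(...) on the allocator under test would go here
      live_bytes -= live[i];
      live[i] = live.back();
      live.pop_back();
      ++stats.num_frees;
    }
  }

  // Once the allocation budget is spent, free everything still outstanding.
  while (!live.empty()) {
    live_bytes -= live.back();
    live.pop_back();
    ++stats.num_frees;
  }
  stats.live_bytes_at_end = live_bytes;
  return stats;
}
```

Calling `RunRandomAllocationTrace(1000, std::size_t{4} << 20, 42)` replays one trace with a 4 MB maximum size; `peak_live_bytes` then shows how much memory such a trace would hold at its high-water mark.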

For the CUB caching allocator, I used the same parameters that are used in XGBoost here:

xgboost/src/common/device_helpers.cuh

Line 373 in b77e3e3

    static cub::CachingDeviceAllocator *allocator = new cub::CachingDeviceAllocator(2, 9, 29);

As you can see from this (limited) benchmark, the pool implementation is 100-10000x faster than CUB's caching scheme.

Code I used to run this is here: rapidsai/rmm#430

Benchmark code is here: https://github.com/rapidsai/rmm/blob/branch-0.15/benchmarks/random_allocations/random_allocations.cpp
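For readers unfamiliar with the `(2, 9, 29)` arguments quoted above: going by the CUB documentation, they are `bin_growth`, `min_bin`, and `max_bin`, so cached blocks come in power-of-two bins from 2^9 = 512 bytes up to 2^29 = 512 MB, and larger requests bypass the cache. The helper below is a hypothetical illustration of that rounding rule, written from the documented behavior rather than copied from CUB's source, so treat the details as an assumption.

```cpp
#include <cstddef>

struct BinDecision {
  bool cached;        // false if the request exceeds the largest bin
  std::size_t bytes;  // size actually reserved for the request
};

// Rounds a request up to the smallest bin size bin_growth^k with
// min_bin <= k <= max_bin. Requests larger than bin_growth^max_bin are
// reported as uncached and keep their exact size.
BinDecision NearestBin(std::size_t request, unsigned bin_growth,
                       unsigned min_bin, unsigned max_bin) {
  std::size_t bin_bytes = 1;
  for (unsigned i = 0; i < min_bin; ++i) bin_bytes *= bin_growth;

  unsigned bin = min_bin;
  while (bin_bytes < request && bin <= max_bin) {
    bin_bytes *= bin_growth;
    ++bin;
  }
  if (bin > max_bin) return {false, request};
  return {true, bin_bytes};
}
```

Under these assumptions, `NearestBin(1000, 2, 9, 29)` reserves 1024 bytes, and a request one byte over 1 MB lands in the 2 MB bin.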

hcho3 · 2020-07-07T03:20:52Z

I think RMM could also give us more tools to manage and track memory usage.

jrhemstad · 2020-07-07T13:04:44Z

> I think RMM could also give us more tools to manage and track memory usage.

Yes, RMM has several benefits besides performance. Because it defines a single, common interface for device memory allocation, it allows us to build additional tools around that interface, for example logging, leak checking, thread safety, and benchmarks.
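As a rough illustration of why a single allocation interface makes such tooling easy, here is a host-side sketch of a tracking wrapper. `MemoryResource`, `MallocResource`, and `TrackingAdaptor` are hypothetical names made up for this example and are not RMM classes. The wrapper forwards every call to an upstream resource while recording current usage, peak usage, and the number of outstanding allocations, which is enough for basic leak checking.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>

// Hypothetical common interface that every allocator implements.
struct MemoryResource {
  virtual ~MemoryResource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Simple upstream backend used for the demonstration.
struct MallocResource : MemoryResource {
  void* allocate(std::size_t bytes) override {
    void* p = std::malloc(bytes);
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }
  void deallocate(void* p, std::size_t) override { std::free(p); }
};

// Wraps any resource and records what passes through it.
class TrackingAdaptor : public MemoryResource {
 public:
  explicit TrackingAdaptor(MemoryResource* upstream) : upstream_(upstream) {}

  void* allocate(std::size_t bytes) override {
    void* p = upstream_->allocate(bytes);
    live_[p] = bytes;
    current_bytes_ += bytes;
    if (current_bytes_ > peak_bytes_) peak_bytes_ = current_bytes_;
    return p;
  }

  void deallocate(void* p, std::size_t bytes) override {
    auto it = live_.find(p);
    if (it != live_.end()) {
      current_bytes_ -= it->second;
      live_.erase(it);
    }
    upstream_->deallocate(p, bytes);
  }

  std::size_t current_bytes() const { return current_bytes_; }
  std::size_t peak_bytes() const { return peak_bytes_; }
  // Non-zero at shutdown would indicate a leak.
  std::size_t outstanding() const { return live_.size(); }

 private:
  MemoryResource* upstream_;
  std::map<void*, std::size_t> live_;  // pointer -> size of live allocations
  std::size_t current_bytes_ = 0;
  std::size_t peak_bytes_ = 0;
};
```

Because the wrapper is itself a `MemoryResource`, it can be placed in front of any backend without changing calling code, and several such wrappers could be stacked.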

daxiongshu mentioned this issue Jul 6, 2020

[BUG] Unexpected worker restarting when using UCX + RMM + XGB Dask API rapidsai/dask-cuda#337

Closed

This was referenced Jul 8, 2020

[CI] Simplify CMake build with modern CMake techniques #5871

Merged

RMM integration plugin #5873

Merged

jrhemstad mentioned this issue Jul 13, 2020

[WIP] CUB memory resource rapidsai/rmm#430

Closed

jrhemstad mentioned this issue Aug 4, 2020

[FEA] Create device_memory_resource to wrap CUB caching allocator rapidsai/rmm#290

Closed

hcho3 closed this as completed in #5873 Aug 12, 2020
