Use mmap for external memory. #9282
Conversation
@@ -809,12 +810,11 @@ class GPUHistMaker : public TreeUpdater {
  collective::Broadcast(&column_sampling_seed, sizeof(column_sampling_seed), 0);

  auto batch_param = BatchParam{param->max_bin, TrainParam::DftSparseThreshold()};
  auto page = (*dmat->GetBatches<EllpackPage>(ctx_, batch_param).begin()).Impl();
This initiates an iteration on the sparse DMatrix but doesn't finish it. As a result, we run sketching twice before this PR. There are a couple of places where we could eliminate batch fetching, but I will leave that as a future optimization:
Line 638 in e70810b
xgboost/src/data/sparse_page_dmatrix.cc, line 170 in e70810b
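To make the pattern concrete, here is a hedged, illustrative fragment (not the actual change in this PR) contrasting dereferencing only the first batch with walking the whole range. It assumes the surrounding GPUHistMaker scope from the diff above (ctx_, dmat, param); whether the caller needs every page depends on the updater.

```cpp
// Illustrative fragment only: assumes the surrounding GPUHistMaker scope from
// the diff above (ctx_, dmat, param); not the actual change in this PR.
auto batch_param = BatchParam{param->max_bin, TrainParam::DftSparseThreshold()};

// Dereferencing begin() starts an iteration over the sparse DMatrix without
// finishing it, which is the pattern described in the comment above:
auto page = (*dmat->GetBatches<EllpackPage>(ctx_, batch_param).begin()).Impl();

// Walking the whole range instead lets the iteration run to completion, so the
// cached pages can be reused rather than sketching being triggered again:
for (auto const& ellpack : dmat->GetBatches<EllpackPage>(ctx_, batch_param)) {
  (void)ellpack;  // touch every page so the underlying iterator finishes
}
```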
@rongou Would you like to take a look when you are available? I have some experiments in mind for external memory; this PR is the first step. I will follow up with some of them after 2.0:
* Maybe there are other types of memory-reduction algorithms, and I would also like to learn more about them.
Very nice docs as usual.
On a system with 128GB memory, I am able to train a model with hist on a dataset of about 290GB.
* Windows support is removed.
* Performance is bounded by IO, and that's unlikely to change for the foreseeable future. I ran the test on a PCIe 4 NVMe drive. Observing the run from htop/iotop, the disk read is relatively consistent; fetching the gradient index runs at about 3GB/s throughout the training.
There are still many limiting factors in scaling with storage since we only batch the predictor, but this PR makes external memory a bit more practical.
I will do some more experiments in the coming days. At the moment, using mmap can help reuse the Linux page cache (not in this branch yet); allocating large chunks of memory is an extremely expensive operation when memory is already under pressure.
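As a rough illustration of why mmap helps here, the sketch below memory-maps a cache file read-only with POSIX mmap so that the kernel page cache backs the pages instead of a fresh large allocation. The helper name, error handling, and overall structure are assumptions for illustration, not XGBoost code.

```cpp
// Illustrative POSIX sketch, not XGBoost code: map an external-memory page
// file read-only so the kernel page cache backs it, instead of allocating a
// large buffer and copying the file into it.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <utility>

std::pair<void*, std::size_t> MapPageFile(char const* path) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0) { throw std::runtime_error("open failed"); }

  struct stat st;
  if (::fstat(fd, &st) != 0) {
    ::close(fd);
    throw std::runtime_error("fstat failed");
  }

  // Read-only, private mapping: pages are faulted in lazily and stay in the
  // kernel page cache, so re-reading them avoids another large allocation.
  void* ptr = ::mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                     MAP_PRIVATE, fd, 0);
  ::close(fd);  // the mapping keeps the underlying file referenced
  if (ptr == MAP_FAILED) { throw std::runtime_error("mmap failed"); }

  return {ptr, static_cast<std::size_t>(st.st_size)};
}

// The caller releases the mapping with munmap(ptr, size) when done.
```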
The next step after this PR is to make the DMatrix pages' size immutable after construction. This way, we can reuse the pointer from mmap in structures like SparsePage. The end goal is to make sure XGBoost can use the Linux cache efficiently.
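A minimal sketch of what that could look like, assuming a hypothetical non-owning view type; the names and layout below are illustrative and do not reflect XGBoost's actual SparsePage.

```cpp
// Hypothetical sketch: once a page's size is fixed after construction, its
// storage can be a non-owning view over the mmap-ed cache file instead of an
// owned buffer. These types are illustrative, not XGBoost's SparsePage.
#include <cstddef>
#include <cstdint>

template <typename T>
struct MappedSpan {  // non-owning view into the memory-mapped region
  T const* data{nullptr};
  std::size_t size{0};
};

struct PageView {
  // CSR-style layout: row offsets plus (feature index, value) arrays, all
  // pointing directly into the mapping rather than copies in process memory.
  MappedSpan<std::size_t> offset;
  MappedSpan<std::uint32_t> findex;
  MappedSpan<float> fvalue;
};
```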