Use mmap for external memory. #9282

Merged (53 commits into dmlc:master from ext-mmap, Jun 19, 2023)
Conversation

@trivialfis (Member) commented Jun 9, 2023

On a system with 128GB memory, I am able to train a model with hist on a dataset of about 290GB.

  • Windows support is removed.
  • Remove the temporary page in GPU hist, eliminating unnecessary batch iteration.
  • Improve documentation.

Performance is bounded by IO, and that is unlikely to change for the foreseeable future. I ran the test on a PCIe 4 NVMe drive. Observing the run with htop/iotop, the disk read is relatively consistent; for fetching the gradient index it stays around 3GB/s throughout training.

There are still many factors limiting how far this scales with storage, since we only batch the predictor, but this PR makes external memory a bit more practical.

I will do some more experiments in the coming days. At the moment, using mmap can help reuse the Linux page cache (not in this branch yet). Allocating large chunks of memory is an extremely expensive operation when memory is already under pressure.

The next step after this PR is to make the DMatrix pages' size immutable after construction. This way, we can reuse the pointer from mmap in structures like SparsePage. The end goal is to make sure XGBoost can use the Linux cache efficiently.
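To make that direction concrete, here is a rough, hypothetical sketch (illustration only, not the code in this PR; the MmapResource name and error handling are made up): map the binary cache file read-only so that a page can view the kernel page cache directly instead of allocating and copying.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>

// Hypothetical illustration only; not XGBoost's actual implementation.
class MmapResource {
 public:
  explicit MmapResource(char const* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { throw std::runtime_error("open failed"); }
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); throw std::runtime_error("fstat failed"); }
    size_ = static_cast<std::size_t>(st.st_size);
    // The kernel reads pages lazily and keeps them in the page cache, so no
    // large up-front allocation is needed even when memory is under pressure.
    ptr_ = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the fd is closed
    if (ptr_ == MAP_FAILED) { throw std::runtime_error("mmap failed"); }
  }
  ~MmapResource() { if (ptr_ != MAP_FAILED) { munmap(ptr_, size_); } }
  // A structure like SparsePage could view this memory instead of owning a copy.
  void const* Data() const { return ptr_; }
  std::size_t Size() const { return size_; }
 private:
  void* ptr_{MAP_FAILED};
  std::size_t size_{0};
};

With page sizes fixed after construction, such a view never needs to grow, which is what would make handing out mapped pointers safe.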

@trivialfis marked this pull request as draft June 9, 2023 04:39
@@ -809,12 +810,11 @@ class GPUHistMaker : public TreeUpdater {
collective::Broadcast(&column_sampling_seed, sizeof(column_sampling_seed), 0);

auto batch_param = BatchParam{param->max_bin, TrainParam::DftSparseThreshold()};
auto page = (*dmat->GetBatches<EllpackPage>(ctx_, batch_param).begin()).Impl();
@trivialfis (Member Author) commented on this diff:

This initiates an iteration on the sparse DMatrix but doesn't finish it. As a result, we run sketching twice before the PR.
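To illustrate the distinction (a rough sketch, not necessarily the exact change made in this PR): dereferencing only begin() starts the batch fetch without completing it, whereas a range-for drives the iteration to the end.

// Grabbing only the first batch starts the fetch/sketching sequence without finishing it:
auto page = (*dmat->GetBatches<EllpackPage>(ctx_, batch_param).begin()).Impl();

// A range-for walks every batch and lets the iteration complete:
for (auto const& batch : dmat->GetBatches<EllpackPage>(ctx_, batch_param)) {
  // use batch
}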

@trivialfis (Member Author) commented:

There are a couple of places where we can eliminate batch fetching, but I will leave that as future optimization:

auto is_ellpack = f_dmat && f_dmat->PageExists<EllpackPage>() &&

this->InitializeSparsePage(ctx);

@trivialfis marked this pull request as ready for review June 14, 2023 16:40
@trivialfis changed the title from "[POC] Use mmap for external memory." to "Use mmap for external memory." Jun 14, 2023
@trivialfis (Member Author) commented Jun 14, 2023

@rongou Would you like to take a look when you are available? I have some experiments in mind for external memory; this PR is the first step. I will follow up with some of them after 2.0:

  • Eliminate all unnecessary calls to DMatrix::GetBatches, which initiates a fetch sequence.
  • Try to support GPUDirect. Not sure if this is necessary though; the Linux kernel usually does an excellent job of caching data in host memory.
  • Eliminate some page allocations by swapping in mapped ptr.
  • Look into scaling it with distributed training.

Maybe there are other types of memory reduction algorithms, and I would also like to learn more about them.

src/common/io.cc (review thread, outdated, resolved)
@RAMitchell (Member) left a comment:

Very nice docs as usual.

@trivialfis merged commit ee6809e into dmlc:master Jun 19, 2023
@trivialfis deleted the ext-mmap branch June 19, 2023 10:52