
mgpu predictor using explicit offsets #4438

Merged: 15 commits into dmlc:master from the pred-explicit-sharding branch on May 10, 2019

Conversation

@rongou (Contributor) commented on May 3, 2019

Alternative approach to #4437. Uses explicit offsets to slide over the output prediction vector. I feel this is slightly cleaner, but I'm OK with either approach.

@canonizer @RAMitchell @sriramch
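
A minimal sketch of the explicit-offset idea (an editor's illustration, not the PR's actual code; `DeviceShard` and `MakeShards` are hypothetical names): each device owns a contiguous row range and writes its predictions directly at its own offset into the single shared output vector, so no per-shard staging buffers or gather step is needed.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical shard descriptor: a contiguous row range owned by one device.
struct DeviceShard {
  int device_id;
  std::size_t row_begin;  // first row this device predicts
  std::size_t row_end;    // one past the last row
};

// Split n_rows as evenly as possible across n_devices.
std::vector<DeviceShard> MakeShards(std::size_t n_rows, int n_devices) {
  std::vector<DeviceShard> shards;
  std::size_t base = n_rows / n_devices;
  std::size_t rem = n_rows % n_devices;
  std::size_t begin = 0;
  for (int d = 0; d < n_devices; ++d) {
    std::size_t rows = base + (static_cast<std::size_t>(d) < rem ? 1 : 0);
    shards.push_back({d, begin, begin + rows});
    begin += rows;
  }
  return shards;
}

int main() {
  const std::size_t kRows = 10;
  std::vector<float> out_preds(kRows);  // one shared output vector
  for (const auto& s : MakeShards(kRows, 4)) {
    // In the real predictor this row range would be handed to a kernel
    // launched on s.device_id, writing in place at its explicit offset;
    // here we just mark which device owns each slot.
    for (std::size_t r = s.row_begin; r < s.row_end; ++r) {
      out_preds[r] = static_cast<float>(s.device_id);
    }
  }
  for (float v : out_preds) std::cout << v << ' ';
  std::cout << '\n';
}
```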

@RAMitchell (Member) commented:

I like this PR for its simplicity. @rongou @sriramch, can one of you please run an experiment comparing the performance of both of your PRs, checking for large differences? It is sometimes hard to see exactly how many memory copies occur when using HostDeviceVector.
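
To illustrate the concern (a toy mirror class assumed by the editor, not xgboost's actual HostDeviceVector implementation): when a vector mirrors data on host and device and syncs lazily on access, any read from the stale side silently triggers a transfer, which is why the copy count is hard to audit at the call site.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Toy host/device-mirrored vector with lazy synchronization.
class MirroredVector {
 public:
  explicit MirroredVector(std::size_t n) : host_(n) {
    cudaMalloc(&device_, n * sizeof(float));
  }
  ~MirroredVector() { cudaFree(device_); }

  float* Device() {
    if (host_dirty_) {  // hidden host-to-device copy
      cudaMemcpy(device_, host_.data(), host_.size() * sizeof(float),
                 cudaMemcpyHostToDevice);
      host_dirty_ = false;
    }
    device_dirty_ = true;
    return device_;
  }

  std::vector<float>& Host() {
    if (device_dirty_) {  // hidden device-to-host copy
      cudaMemcpy(host_.data(), device_, host_.size() * sizeof(float),
                 cudaMemcpyDeviceToHost);
      device_dirty_ = false;
    }
    host_dirty_ = true;
    return host_;
  }

 private:
  std::vector<float> host_;
  float* device_ = nullptr;
  bool host_dirty_ = true;
  bool device_dirty_ = false;
};

int main() {
  MirroredVector preds(1000);
  preds.Host()[0] = 1.0f;  // marks the host side dirty
  preds.Device();          // triggers one H2D transfer
  preds.Host();            // triggers one D2H transfer
  std::printf("two transfers happened implicitly\n");
}
```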

@sriramch (Contributor) commented on May 7, 2019

@RAMitchell thanks for your review. This PR can optimize away those copies for batch prediction, just as the other PR does. But I think this PR still has to work for smaller batch sizes; I have provided some comments to this effect earlier.

(Five review threads on src/predictor/gpu_predictor.cu, all resolved; two marked outdated.)
@rongou (Contributor, author) commented on May 7, 2019

@RAMitchell @sriramch I ran some experiments on these two PRs using 1 billion rows, written to a tmpfs directory on a GCP VM with 4x T4 GPUs.

Timing of the PredictBatch() method over three runs:

| Run | This PR  | #4437    |
| --- | -------- | -------- |
| 1   | 18.972 s | 20.842 s |
| 2   | 17.855 s | 20.123 s |
| 3   | 18.054 s | 19.931 s |

Looks like this PR is slightly faster.

@rongou (Contributor, author) commented on May 8, 2019

@canonizer @RAMitchell @sriramch @trivialfis all the comments have been addressed. Please take another look.

@sriramch (Contributor) left a review:

Rest LGTM.

@RAMitchell (Member) left a review:

LGTM aside from one comment

(One review thread on src/predictor/gpu_predictor.cu, resolved and marked outdated.)
@rongou (Contributor, author) commented on May 9, 2019

@hcho3 looks like the build machine ran out of disk space?

[2019-05-09T02:17:46.935Z] tar: .m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.pom: Cannot write: No space left on device

(Four review threads on src/predictor/gpu_predictor.cu, resolved; three marked outdated.)
@hcho3 (Collaborator) commented on May 9, 2019

@rongou I will go ahead and expand disk space for the slave workers.

@hcho3 mentioned this pull request on May 10, 2019.
@hcho3 (Collaborator) commented on May 10, 2019

@rongou @RAMitchell I think this PR should be part of 0.90, since it's a follow-up to #4284. Can we merge this now?

@RAMitchell merged commit be0f346 into dmlc:master on May 10, 2019.
@rongou deleted the pred-explicit-sharding branch on May 13, 2019.
The lock bot locked this conversation as resolved and limited it to collaborators on Aug 11, 2019.