
mgpu predictor using explicit offsets #4438

Merged: 15 commits into dmlc:master from the pred-explicit-sharding branch on May 10, 2019

Conversation

@rongou (Contributor) commented on May 3, 2019

Alternative approach to #4437. Uses explicit offsets to slide over the output prediction vector. I feel this is slightly cleaner, but I'm OK with either approach.

@canonizer @RAMitchell @sriramch
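
A minimal sketch of the explicit-offset idea (an editor's illustration, not the PR's actual code; `DeviceShard` and `MakeShards` are hypothetical names): each device owns a contiguous row range and writes its predictions directly at its own offset into the single shared output vector, so no per-shard staging buffers or gather step is needed.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical shard descriptor: a contiguous row range owned by one device.
struct DeviceShard {
  int device_id;
  std::size_t row_begin;  // first row this device predicts
  std::size_t row_end;    // one past the last row
};

// Split n_rows as evenly as possible across n_devices.
std::vector<DeviceShard> MakeShards(std::size_t n_rows, int n_devices) {
  std::vector<DeviceShard> shards;
  std::size_t base = n_rows / n_devices;
  std::size_t rem = n_rows % n_devices;
  std::size_t begin = 0;
  for (int d = 0; d < n_devices; ++d) {
    std::size_t rows = base + (static_cast<std::size_t>(d) < rem ? 1 : 0);
    shards.push_back({d, begin, begin + rows});
    begin += rows;
  }
  return shards;
}

int main() {
  const std::size_t kRows = 10;
  std::vector<float> out_preds(kRows);  // one shared output vector
  for (const auto& s : MakeShards(kRows, 4)) {
    // In the real predictor this row range would be handed to a kernel
    // launched on s.device_id, writing in place at its explicit offset;
    // here we just mark which device owns each slot.
    for (std::size_t r = s.row_begin; r < s.row_end; ++r) {
      out_preds[r] = static_cast<float>(s.device_id);
    }
  }
  for (float v : out_preds) std::cout << v << ' ';
  std::cout << '\n';
}
```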

@RAMitchell (Member) commented:

I like this PR for its simplicity. @rongou @sriramch, can one of you please run an experiment comparing the performance of both of your PRs, checking for large differences? It is sometimes hard to see exactly how many memory copies occur when using HostDeviceVector.
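
To illustrate the concern (a toy mirror class assumed by the editor, not xgboost's actual HostDeviceVector implementation): when a vector mirrors data on host and device and syncs lazily on access, any read from the stale side silently triggers a transfer, which is why the copy count is hard to audit at the call site.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Toy host/device-mirrored vector with lazy synchronization.
class MirroredVector {
 public:
  explicit MirroredVector(std::size_t n) : host_(n) {
    cudaMalloc(&device_, n * sizeof(float));
  }
  ~MirroredVector() { cudaFree(device_); }

  float* Device() {
    if (host_dirty_) {  // hidden host-to-device copy
      cudaMemcpy(device_, host_.data(), host_.size() * sizeof(float),
                 cudaMemcpyHostToDevice);
      host_dirty_ = false;
    }
    device_dirty_ = true;
    return device_;
  }

  std::vector<float>& Host() {
    if (device_dirty_) {  // hidden device-to-host copy
      cudaMemcpy(host_.data(), device_, host_.size() * sizeof(float),
                 cudaMemcpyDeviceToHost);
      device_dirty_ = false;
    }
    host_dirty_ = true;
    return host_;
  }

 private:
  std::vector<float> host_;
  float* device_ = nullptr;
  bool host_dirty_ = true;
  bool device_dirty_ = false;
};

int main() {
  MirroredVector preds(1000);
  preds.Host()[0] = 1.0f;  // marks the host side dirty
  preds.Device();          // triggers one H2D transfer
  preds.Host();            // triggers one D2H transfer
  std::printf("two transfers happened implicitly\n");
}
```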

@sriramch (Contributor) commented on May 7, 2019

@RAMitchell thanks for your review. This PR can optimize away those copies for batch prediction, just as the other PR does. But I think this PR still has to work for smaller batch sizes; I have provided some comments to this effect earlier.

(Five review threads on src/predictor/gpu_predictor.cu, all resolved; two marked outdated.)
@rongou (Contributor, author) commented on May 7, 2019

@RAMitchell @sriramch I ran some experiments on these two PRs using 1 billion rows, written to a tmpfs directory on a GCP VM with 4x T4 GPUs.

Timing of the PredictBatch() method over three runs:

| Run | This PR  | #4437    |
| --- | -------- | -------- |
| 1   | 18.972 s | 20.842 s |
| 2   | 17.855 s | 20.123 s |
| 3   | 18.054 s | 19.931 s |

Looks like this PR is slightly faster.

@rongou (Contributor, author) commented on May 8, 2019

@canonizer @RAMitchell @sriramch @trivialfis all the comments have been addressed. Please take another look.

@sriramch (Contributor) left a review:

Rest LGTM.

@RAMitchell (Member) left a review:

LGTM aside from one comment

(One review thread on src/predictor/gpu_predictor.cu, resolved and marked outdated.)
@rongou (Contributor, author) commented on May 9, 2019

@hcho3 looks like the build machine ran out of disk space?

[2019-05-09T02:17:46.935Z] tar: .m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.pom: Cannot write: No space left on device

(Four review threads on src/predictor/gpu_predictor.cu, resolved; three marked outdated.)
@hcho3 (Collaborator) commented on May 9, 2019

@rongou I will go ahead and expand disk space for the slave workers.

@hcho3 mentioned this pull request on May 10, 2019.
@hcho3 (Collaborator) commented on May 10, 2019

@rongou @RAMitchell I think this PR should be part of 0.90, since it's a follow-up to #4284. Can we merge this now?

@RAMitchell merged commit be0f346 into dmlc:master on May 10, 2019.
@rongou deleted the pred-explicit-sharding branch on May 13, 2019.
The lock bot locked this conversation as resolved and limited it to collaborators on Aug 11, 2019.