
Fix perf gap in thread safe prediction #6696

Merged
merged 2 commits into dmlc:master on Feb 16, 2021

Conversation

ShvetsKS
Contributor

@ShvetsKS ShvetsKS commented Feb 9, 2021

Since #6648 introduced local RegTree::FVec storage to preserve thread safety, each training iteration now requires buffer initialization in the subsampling case.

| stage (santander dataset) | before #6648 | thread safe (#6648 and master) | current PR |
| --- | --- | --- | --- |
| full training | 136s | 165s | 140s |
| PredictRaw | 55s | 78s | 59s |

This PR introduces threading for the initialization of the local buffers.
But since some gaps still remain, would it be possible to use a non-thread-safe PredictDMatrix call during training?
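The idea of parallelizing the per-thread buffer initialization can be sketched as below. This is a hedged illustration, not the PR's actual diff: `FVec` here is a hypothetical stand-in for `RegTree::FVec`, and `InitThreadLocalBuffers` is an invented helper name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RegTree::FVec: a dense per-thread scratch
// feature vector that must be (re)initialized before prediction.
struct FVec {
  std::vector<float> data;
  void Init(std::size_t n_features) { data.assign(n_features, 0.0f); }
};

// Allocate one scratch FVec per thread and initialize them in parallel,
// so the zero-fill cost is spread across threads instead of being
// serialized on the calling thread (the gap this PR addresses).
std::vector<FVec> InitThreadLocalBuffers(int n_threads,
                                         std::size_t n_features) {
  std::vector<FVec> buffers(n_threads);
#pragma omp parallel for schedule(static)
  for (int t = 0; t < n_threads; ++t) {
    buffers[t].Init(n_features);
  }
  return buffers;
}
```

Each worker thread then uses only `buffers[thread_id]`, so no locking is needed during prediction itself.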

@ShvetsKS ShvetsKS changed the title Fix perf issue in thread safe prediction Fix perf gap in thread safe prediction Feb 9, 2021
@trivialfis
Member

I thought we have prediction cache enabled?

@ShvetsKS
Contributor Author

ShvetsKS commented Feb 9, 2021

I thought we have prediction cache enabled?

For the subsampling case we still don't have prediction caching (it seems to still be in progress: #6683; the major complexity is related to the multiclass classification case, since a separate index subset is generated for each group).
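The multiclass complication mentioned above can be illustrated with a small sketch: with subsampling, each boosting group (class) can draw its own row subset, so a prediction cache keyed only by row id no longer lines up across groups. The function names here are hypothetical, for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw a Bernoulli row subsample: each of n_rows rows is kept with
// probability `subsample`.
std::vector<std::size_t> SampleRows(std::size_t n_rows, double subsample,
                                    std::uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::bernoulli_distribution coin(subsample);
  std::vector<std::size_t> rows;
  for (std::size_t i = 0; i < n_rows; ++i) {
    if (coin(rng)) rows.push_back(i);
  }
  return rows;
}

// In the multiclass case each group gets an independent subset (different
// seed per group), which is why a single shared row-indexed prediction
// cache is not straightforward to reuse.
std::vector<std::vector<std::size_t>> SamplePerGroup(std::size_t n_rows,
                                                     int n_groups,
                                                     double subsample) {
  std::vector<std::vector<std::size_t>> subsets;
  for (int g = 0; g < n_groups; ++g) {
    subsets.push_back(SampleRows(n_rows, subsample,
                                 /*seed=*/static_cast<std::uint64_t>(g) + 1));
  }
  return subsets;
}
```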

@trivialfis
Member

trivialfis commented Feb 10, 2021

@ShvetsKS Would you like to take a look at the GPU implementation of subsampling? Or my WIP rewrite of CPU hist? Subsampling doesn't have to conflict with the cache.

@trivialfis
Member

Last time you found it to be slower than the master branch, which is expected since I haven't had time to incorporate many recent optimizations into it. But I believe my rewrite can at least offer some insight into how to implement some of the features in a different way.

@ShvetsKS
Contributor Author

Last time you found it to be slower than the master branch, which is expected since I haven't had time to incorporate many recent optimizations into it. But I believe my rewrite can at least offer some insight into how to implement some of the features in a different way.

The current changes are applicable even to the inference stage, as it's better to initialize RegTree::FVec thread-locally anyway.

@ShvetsKS ShvetsKS force-pushed the fix_perf_in_thread_safe_predict branch 2 times, most recently from 4e8ed8c to 198747f on February 10, 2021 09:40
@codecov-io

codecov-io commented Feb 10, 2021

Codecov Report

Merging #6696 (198747f) into master (9b267a4) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #6696   +/-   ##
=======================================
  Coverage   81.56%   81.56%           
=======================================
  Files          13       13           
  Lines        3759     3759           
=======================================
  Hits         3066     3066           
  Misses        693      693           


@ShvetsKS ShvetsKS force-pushed the fix_perf_in_thread_safe_predict branch from 198747f to 1d4e4d1 on February 10, 2021 12:02
@ShvetsKS
Contributor Author

The problem with continuous-integration/travis-ci/pr is related to the server node (`await ServerNode.close(self)`); restarting the job should help.

@trivialfis
Member

Sorry for the long wait. Still on vacation, but I will try to look into it as soon as possible.

@trivialfis trivialfis merged commit 9f15b9e into dmlc:master Feb 16, 2021