Optimizations for CPU #4529
Conversation
Thanks for the huge effort! A few questions in general:
@chenqin Hi!
Here I mean mostly vector instruction sets. Currently XGBoost uses SSE2 only, but AVX-512 can be used as well, for example for gradient/hessian computations:
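As a minimal sketch of the idea for a squared-error objective (the function name and data layout here are hypothetical, not XGBoost's actual code):

```cpp
// Illustrative only: per-row gradient/hessian computation for squared-error
// loss using AVX-512 intrinsics. Compile with -mavx512f.
#include <immintrin.h>
#include <cstddef>

void ComputeGradHessAVX512(const float* preds, const float* labels,
                           float* grad, float* hess, std::size_t n) {
  const __m512 ones = _mm512_set1_ps(1.0f);
  std::size_t i = 0;
  // 16 floats per iteration, versus 4 with SSE2.
  for (; i + 16 <= n; i += 16) {
    __m512 p = _mm512_loadu_ps(preds + i);
    __m512 y = _mm512_loadu_ps(labels + i);
    _mm512_storeu_ps(grad + i, _mm512_sub_ps(p, y));  // g = p - y
    _mm512_storeu_ps(hess + i, ones);                 // h = 1 for squared error
  }
  for (; i < n; ++i) {  // scalar tail for the remaining rows
    grad[i] = preds[i] - labels[i];
    hess[i] = 1.0f;
  }
}
```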
One problem here is that the compiler in some cases can't auto-vectorize your code; intrinsic functions help a lot there. Also, if we want to support different instruction sets (for example AVX-512 and SSE2) in one library file, we need separate code paths for them (see the dispatch sketch below). It can be done, but it requires some changes in the build system. I think, if the topic of gradient-boosting optimizations is of interest, we could write an article for the web site (as was done for the GPU). From my side, I can provide materials about this topic. @chenqin, @hcho3 What do you think?
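To make the "separate code paths" point concrete, here is a minimal sketch of runtime dispatch using GCC/Clang's `__builtin_cpu_supports`; the kernel names are hypothetical, and each variant would live in a translation unit built with the matching `-m` flags:

```cpp
// Hypothetical runtime dispatch between ISA-specific kernel variants.
// ComputeGradHessAVX512 would be compiled with -mavx512f,
// ComputeGradHessSSE2 with the x86-64 baseline flags.
#include <cstddef>

void ComputeGradHessAVX512(const float*, const float*, float*, float*, std::size_t);
void ComputeGradHessSSE2(const float*, const float*, float*, float*, std::size_t);

using GradHessFn = void (*)(const float*, const float*, float*, float*, std::size_t);

GradHessFn SelectGradHessKernel() {
  // GCC/Clang builtin; queries CPUID for the AVX-512 Foundation feature.
  if (__builtin_cpu_supports("avx512f")) return ComputeGradHessAVX512;
  return ComputeGradHessSSE2;  // SSE2 is guaranteed on x86-64
}
```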
Thanks for the hard work, this is incredible. ;-) I'm going to work on this along with DMatrix soon, so I will approve as long as all tests pass (as in the current case). @hcho3
Hi @hcho3,
Reviewing now
@hcho3 I'm only approving the PR because the tests passed, and I think optimizing the histogram build will be beneficial to future improvements. But if we merge this, someone needs to do a huge refactor of quantile_hist and clarify how all these pieces fit together. Personally, I would prefer optimizing this by thinking about the whole structure. I think there are lots of opportunities for optimization if we look at how the whole procedure is run: how many of the steps are necessary if we do some preprocessing of the data, like calculating the number of non-zero columns; implementing the sparse histogram you mentioned; whether it's possible to do the grouping into blocks before constructing quantile cuts, etc. Or whether it's possible to implement a prediction cache for multi-class objectives so that the CSR format may not be needed at all.

BTW @RAMitchell, when I mentioned "column matrix", I meant this class: https://github.com/dmlc/xgboost/blob/master/src/common/column_matrix.h

Going to instruction-level optimization is cool, but to me quantile hist has a lot more room for improvement in other places, and vectorization might not be the critical point. I don't have the time to do it myself right now; these are just my opinions.
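For context on the histogram build being discussed: for each tree node, the hist method accumulates gradient statistics into bins of the quantized (cut-indexed) matrix. A schematic sketch of that inner loop, with hypothetical names rather than XGBoost's actual implementation:

```cpp
// Schematic of the hist-method inner loop (illustrative, not XGBoost's code).
// bin_index holds globally offset bin ids (each feature's bins occupy a
// distinct range), so a single flat histogram array suffices.
#include <cstddef>
#include <cstdint>
#include <vector>

struct GradStats { double sum_grad = 0.0, sum_hess = 0.0; };

void BuildHistogram(const std::vector<std::uint32_t>& bin_index,
                    const std::vector<float>& grad,
                    const std::vector<float>& hess,
                    const std::vector<std::size_t>& rows_in_node,
                    std::size_t n_features,
                    std::vector<GradStats>* hist) {
  for (std::size_t r : rows_in_node) {          // rows assigned to this node
    for (std::size_t f = 0; f < n_features; ++f) {
      std::uint32_t bin = bin_index[r * n_features + f];
      (*hist)[bin].sum_grad += grad[r];         // accumulate g into the bin
      (*hist)[bin].sum_hess += hess[r];         // accumulate h into the bin
    }
  }
}
```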
@SmirnovEgorRu Thanks a lot for preparing the pull request. The code is better organized and more comprehensible than #4433, and the benchmark results are nice. I also like how all operations are now batched over multiple nodes. Overall, I'm inclined to approve this pull request.
@trivialfis I agree that more clarification of what each component does is in order.
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
@hcho3, thanks for the review. I have applied all comments; CI is green.
Looks like the R project website (https://cloud.r-project.org) is currently down, causing Windows R tests to fail. For now, we will ignore the failed tests.
@hcho3, thank you!
@SmirnovEgorRu It's merged now. Thanks!
Thanks for including these changes! Is there an ETA as to when a release can be expected?
Are results (model, predictions) reproducible for the same machine, same random state, and same input?
@arnocandel I tested this on multiple data sets; it was reproducible even on multicore systems.
Hi @hcho3, @trivialfis,
See #4680
It is this PR: #4433, but I have merged all the commits into one and applied the comments from the review.
My next steps here: