-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support cpu quantile sketch with column-wise data split #8742
Conversation
@trivialfis @hcho3 The CI is mostly passing now, the two failures are probably not related to this PR. Please take a look. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good to me. one small question inlined in comments.
src/common/quantile.cc
Outdated
@@ -283,7 +285,9 @@ void SketchContainerImpl<WQSketch>::AllReduce( | |||
collective::Allreduce<collective::Operation::kMax>(&n_columns, 1); | |||
CHECK_EQ(n_columns, sketches_.size()) << "Number of columns differs across workers"; | |||
|
|||
AllreduceCategories(feature_types_, n_threads_, &categories_); | |||
if (!col_split_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should LOG(FATAL)
in all reduce categories if this is a column split matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllreduceCategories
is a freestanding function that doesn't have access to this->col_split_
. Maybe we should make it a member function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it a private member function. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Please merge from the latest master branch to resolve CI issue.
Not sure why the dask test is failing. I can't reproduce it locally. But it does look a little suspicious since it's |
It's unrelated. I have a fix. |
When doing quantile sketch with column-wise split data, we only need to sketch columns available locally, there is no need to synchronize the cuts across all the workers. Later on when we evaluate the splits, we also only need to evaluate on locally available columns, and then synchronize across the workers to find the global best split.
This PR also changes the
DMatrix::SliceCol
method slightly since we always split the columns into n (roughly) equal parts, so that we don't have to repeat the slice index calculation everywhere.