Support optimal partitioning for GPU hist. #7652
Conversation
I linked some of the references in the document. For a quick review, the algorithm is based on a proof by Fisher, which states that, when trying to partition a set of discrete values into groups based on a measure of those values, one only needs to examine sorted partitions instead of enumerating all possible permutations (I have a test comparing the two). In the context of decision trees, the discrete values are categories, and the measure is the output leaf value. I will push some of the work into follow-up PRs, as this one is already complicated.
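The sorted-partition idea can be sketched in Python. This is a toy illustration only, not XGBoost's actual implementation: the gain function (a squared-loss style score without regularization), the per-category statistics, and all names are made up for the sketch. Sorting the k categories by mean gradient reduces the search from 2^(k-1)-1 subsets to k-1 contiguous thresholds, which a brute-force comparison can verify on small inputs:

```python
# Toy sketch of Fisher-style optimal partitioning (not XGBoost code).
# stats[c] = (gradient sum, count) for category c.
from itertools import combinations

def gain(left, right, stats):
    # squared-loss style split score: sum(g)^2 / count per side
    def score(cats):
        g = sum(stats[c][0] for c in cats)
        n = sum(stats[c][1] for c in cats)
        return g * g / n if n else 0.0
    return score(left) + score(right)

def best_split_sorted(stats):
    # sort categories by mean gradient, then scan the k-1 thresholds
    cats = sorted(stats, key=lambda c: stats[c][0] / stats[c][1])
    best, best_gain = None, float("-inf")
    for t in range(1, len(cats)):
        g = gain(cats[:t], cats[t:], stats)
        if g > best_gain:
            best, best_gain = (set(cats[:t]), set(cats[t:])), g
    return best, best_gain

def best_split_exhaustive(stats):
    # brute force over all non-trivial subsets, for comparison
    cats = list(stats)
    best, best_gain = None, float("-inf")
    for r in range(1, len(cats)):
        for left in combinations(cats, r):
            right = [c for c in cats if c not in left]
            g = gain(left, right, stats)
            if g > best_gain:
                best, best_gain = (set(left), set(right)), g
    return best, best_gain

# made-up per-category statistics
stats = {"a": (4.0, 2), "b": (-3.0, 3), "c": (1.0, 1), "d": (-0.5, 2)}
```

On this example both searches find the same partition, but the sorted scan evaluates only 3 candidates instead of 7.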
auto d_entries = out_entries;
auto cats_out = this->DeviceCatStorage(left.nidx);
// turn candidate into entry, along with handling sort-based split.
dh::LaunchN(right.feature_set.empty() ? 1 : 2, [=] __device__(size_t i) {
Any chance we can fuse this kernel into the segmented reduction using transform output iter?
Seems to be non-trivial.
I managed to implement it using Thrust instead of CUB. Will follow up on this after the initial feature-complete implementation.
auto boundary = std::min(static_cast<size_t>((best_thresh + 1)), (f_sorted_idx.size() - 1));
boundary = std::max(boundary, static_cast<size_t>(1ul));
auto end = beg + boundary;
thrust::for_each(thrust::seq, beg, end, [&](auto c) {
This is quite inefficient when the number of categories is huge, but I think it's best to push the optimization work until after 1.6.
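A rough Python sketch of what this sequential loop amounts to, as I read the snippet (the bitset layout and `sorted_cats` name are illustrative, not the actual code): once the best threshold in the sorted category order is known, every category on one side of it is marked in a per-node bitset, one category at a time, which is the O(#categories) cost flagged above.

```python
# Illustrative sketch only: mark the categories left of the chosen
# threshold in an integer bitset, mirroring the sequential for_each.
def build_category_bitset(sorted_cats, boundary):
    bits = 0
    # one bit set per category, walked sequentially up to `boundary`
    for c in sorted_cats[:boundary]:
        bits |= 1 << c
    return bits
```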
LGTM in general. It is making our codebase way more complicated, but I think we knew that would happen.
eval_metric="auc",
enable_categorical=True,
max_cat_to_onehot=1,  # We use optimal partitioning exclusively
I find this interface a bit weird, but I see that it follows LightGBM.
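For context, a minimal sketch of the threshold semantics implied by `max_cat_to_onehot` as used in the diff above (my reading of the parameter, not the library's exact logic): features with fewer categories than the threshold fall back to one-hot based splits, and everything else uses optimal partitioning, so a threshold of 1 disables one-hot entirely.

```python
# Assumed dispatch behind max_cat_to_onehot (illustrative, not library code).
def choose_split_strategy(n_categories, max_cat_to_onehot):
    # assumption: one-hot only when the category count is strictly below
    # the threshold, so max_cat_to_onehot=1 forces optimal partitioning
    if n_categories < max_cat_to_onehot:
        return "one-hot"
    return "optimal-partition"
```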
EvaluateSplitInputs<GradientSumT> left,
EvaluateSplitInputs<GradientSumT> right);
template <typename GradientSumT>
void EvaluateSingleSplit(common::Span<DeviceSplitCandidate> out_split,
Going from a function to a class is not ideal, as we now have a lot of state. I'm not sure anything can be done about it.
MaxCategory in quantile.