-
Notifications
You must be signed in to change notification settings - Fork 918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a shortcut for when the input clusters are all empty for the tdigest merge #16897
Add a shortcut for when the input clusters are all empty for the tdigest merge #16897
Conversation
I have just pushed another commit that has only comments and docs change. The actual code change should be found only in the first commit. |
Can you please retarget to and rebase on branch-24.12? We are currently in burndown, which is when new PRs should typically target the new release branch. |
@vyasr thanks. I have updated the base branch for this PR. I'm trying now to build this PR on |
0cb319e
to
e5bea9b
Compare
@vyasr I've confirmed that this PR on |
/ok to test |
/ok to test |
/merge |
1 similar comment
/merge |
The PR was merged while I was still reviewing it. Please note that two C++ approvals are required before merging any libcudf PR. |
@PointKernel thanks, that is good to know. Please let me know if you have any comment. I will address them in a follow-up PR. |
Removes unused variable that contains host copy of the group_offsets data. This host variable appears to have been made obsolete by a combination of #16897 and #16780 Found while working on #17149 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) URL: #17151
Description
Fixes #16881. This is a new attempt to fix it.
Previously in #16675, I flipped the
has_nulls
flag to true as I thought that empty clusters should be explicitly stored in the offsets and handled properly. It turns out that it was not a good idea. After a long debugging process, I am convinced now that the existing logic is valid and should work well except for one case, where all input tdigests to the tdigest merge are empty. So, I have decided to add a shortcut to handle that particular edge case ingroup_merge_tdigest()
in this PR. This shortcut is executed only when all clusters are empty in all groups. This PR does not change any other logic. Other changes in this PR are:make_empty_tdigest_column
has been renamed tomake_tdigest_column_of_empty_clusters
and expanded to takenum_rows
.merge_tdigests()
function.Before making this PR, I have run the integration tests of the spark-rapids that were previously reported in NVIDIA/spark-rapids#11463 that my first attempt had caused them failing. They have all passed with this PR change.
Checklist