Reapply mixed_semi_join refactoring and bug fixes #16859
left_table_keep_mask[outer_row_index] =
  hash_table_view.contains(outer_row_index, hash_probe, equality);
// Find all the rows in the left table that are in the hash table.
for (auto outer_row_index = cudf::detail::grid_1d::global_thread_id<block_size>() / cg_size;
Bug fix (part 1): this was previously a simple if, so only nelems/cg_size items were queried instead of all nelems.
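To illustrate why replacing the if with a strided loop matters, here is a minimal host-side C++ model of the indexing, not the actual cudf kernel. It assumes that cg_size threads cooperate on one row and that the loop advances by the number of groups in the launch; the names visit_rows and count_visited are hypothetical helpers introduced only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Host-side model (assumption: cg_size threads form one group per row).
// num_threads stands in for the total number of launched threads.
// Returns, for each row, how many thread groups processed it.
std::vector<int> visit_rows(std::size_t num_rows,
                            std::size_t num_threads,
                            std::size_t cg_size,
                            bool use_stride_loop)
{
  std::vector<int> visits(num_rows, 0);
  std::size_t const num_groups = num_threads / cg_size;
  for (std::size_t group = 0; group < num_groups; ++group) {
    if (use_stride_loop) {
      // Loop form: each group keeps advancing by num_groups until rows run out.
      for (std::size_t row = group; row < num_rows; row += num_groups) {
        ++visits[row];
      }
    } else {
      // Single-if form: each group handles at most one row.
      if (group < num_rows) { ++visits[group]; }
    }
  }
  return visits;
}

// Counts the rows that were processed at least once.
std::size_t count_visited(std::vector<int> const& visits)
{
  std::size_t count = 0;
  for (int v : visits) {
    if (v > 0) { ++count; }
  }
  return count;
}
```

With 1000 rows, 1000 threads, and a group size of 4, the single-if form reaches only 250 rows, while the loop form reaches all 1000, each exactly once.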
Performance
Benchmark: JOIN_NVBENCH -d 0 --benchmark mixed_left_semi_join --timeout 100
Hardware
Comparison Table
Arithmetic mean of all % improvements: +0.79%
// skip rows that are null here.
if ((compare_nulls == null_equality::EQUAL) or (not nullable(build))) {
  hash_table.insert(iter, iter + right_num_rows, hash_build, equality_build, stream.value());
  row_set.insert_async(iter, iter + right_num_rows, stream.value());
Some performance improvement through the use of _async APIs.
detail::grid_1d const config(outer_num_rows, DEFAULT_JOIN_BLOCK_SIZE);
auto const shmem_size_per_block = parser.shmem_per_thread * config.num_threads_per_block;
detail::grid_1d const config(outer_num_rows * hash_set_type::cg_size, DEFAULT_JOIN_BLOCK_SIZE);
Bug fix (part 2): we now need to launch outer_num_rows * hash_set_type::cg_size threads. Even without this change, the loop from part 1 would still process every row.
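The launch-size arithmetic behind part 2 can be sketched on the host as follows. This is only an illustration under stated assumptions: launch_config, make_config, and num_groups are hypothetical stand-ins rather than cudf's detail::grid_1d, and the block size of 128 and group size of 4 used in the note below are example values, not values taken from this PR.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 1D launch configuration (not cudf's grid_1d).
struct launch_config {
  std::size_t num_threads_per_block;
  std::size_t num_blocks;
};

// Ceiling division so that num_blocks * block_size >= num_threads_needed.
constexpr launch_config make_config(std::size_t num_threads_needed, std::size_t block_size)
{
  return {block_size, (num_threads_needed + block_size - 1) / block_size};
}

// Number of cg_size-wide thread groups available in the whole launch.
constexpr std::size_t num_groups(launch_config cfg, std::size_t cg_size)
{
  return cfg.num_blocks * cfg.num_threads_per_block / cg_size;
}
```

For 1000 outer rows with the example values, make_config(1000, 128) yields 8 blocks and only 256 groups, so each group must loop over several rows. make_config(1000 * 4, 128) yields 32 blocks and 1024 groups, enough for one group per row.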
Performance results from Spark (credits: @zpuller): [figures: spark2a, spark-h]
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
Looks great 🚀
This was an enlightening read. Very clean code, too. Thanks, @mhaseeb123.
LGTM.
/merge
This PR reapplies changes from #16230 and adds bug fixes and performance improvements for mixed_semi_join.
Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- MithunR (https://github.com/mythrocks)
- Nghia Truong (https://github.com/ttnghia)
URL: #16859
Description
This PR reapplies changes from #16230 and adds bug fixes and performance improvements for mixed_semi_join.