Reapply `mixed_semi_join` refactoring and bug fixes #16859

mhaseeb123 · 2024-09-20T02:49:57Z

Description

This PR reapplies changes from #16230 and adds bug fixes and performance improvements for mixed_semi_join.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

cpp/tests/join/mixed_join_tests.cu

mhaseeb123 · 2024-09-20T03:02:37Z

cpp/src/join/mixed_join_kernels_semi.cu

-    left_table_keep_mask[outer_row_index] =
-      hash_table_view.contains(outer_row_index, hash_probe, equality);
+  // Find all the rows in the left table that are in the hash table.
+  for (auto outer_row_index = cudf::detail::grid_1d::global_thread_id<block_size>() / cg_size;


The bug fix (part 1): This was a simple if before so only querying nelems/cg_size items instead of all nelems

mhaseeb123 · 2024-09-20T05:00:36Z

Performance

Benchmark

JOIN_NVBENCH -d 0 --benchmark mixed_left_semi_join --timeout 100

Hardware

GPU: NVIDIA RTX 5880 Ada Generation
SM Version: 890 (PTX Version: 860)
Number of SMs: 110
SM Default Clock Rate: 18446744071874 MHz
Global Memory: 23879 MiB Free / 48632 MiB Total
Global Memory Bus Peak: 960 GB/sec (384-bit DDR @10001MHz)
Max Shared Memory: 100 KiB/SM, 48 KiB/Block
L2 Cache Size: 98304 KiB
Maximum Active Blocks: 24/SM
Maximum Active Threads: 1536/SM, 1024/Block
Available Registers: 65536/SM, 65536/Block
ECC Enabled: No

Comparison Table

Arthimetic mean of all % improvements: +0.79%
Geometic mean of positive % improvements: 6.7%
Geometic mean of negative % improvements: -14%

| Key | Nullable | left_size | right_size | GPU_Time_old | GPU_Time_new | pct_change |
|-----|----------|-----------|------------|--------------|--------------|------------|
| I32 |        0 |      1000 |       1000 |    94.260 us |    94.927 us |     -0.708 |
| I32 |        0 |    100000 |       1000 |   132.938 us |   146.014 us |     -9.836 |
| I32 |        0 |  10000000 |       1000 |     1.502 ms |     1.806 ms |     -20.24 |
| I32 |        0 |    100000 |     100000 |   157.627 us |   141.074 us |     10.501 |
| I32 |        0 |  10000000 |     100000 |     1.479 ms |     1.692 ms |    -14.402 |
| I32 |        0 |  10000000 |   10000000 |     5.867 ms |     4.150 ms |     29.265 |
| I32 |        1 |      1000 |       1000 |   108.833 us |   100.220 us |      7.914 |
| I32 |        1 |    100000 |       1000 |   143.874 us |   144.758 us |     -0.614 |
| I32 |        1 |  10000000 |       1000 |     1.283 ms |     1.416 ms |    -10.366 |
| I32 |        1 |    100000 |     100000 |   159.061 us |   145.933 us |      8.253 |
| I32 |        1 |  10000000 |     100000 |     1.156 ms |     1.418 ms |    -22.664 |
| I32 |        1 |  10000000 |   10000000 |     2.951 ms |     1.920 ms |     34.937 |
| I64 |        0 |      1000 |       1000 |    94.945 us |    87.359 us |       7.99 |
| I64 |        0 |    100000 |       1000 |   134.796 us |   138.466 us |     -2.723 |
| I64 |        0 |  10000000 |       1000 |     1.466 ms |     1.692 ms |    -15.416 |
| I64 |        0 |    100000 |     100000 |   159.470 us |   145.605 us |      8.694 |
| I64 |        0 |  10000000 |     100000 |     1.510 ms |     1.738 ms |    -15.099 |
| I64 |        0 |  10000000 |   10000000 |     6.862 ms |     5.223 ms |     23.885 |
| I64 |        1 |      1000 |       1000 |   106.791 us |    96.834 us |      9.324 |
| I64 |        1 |    100000 |       1000 |   143.220 us |   142.791 us |        0.3 |
| I64 |        1 |  10000000 |       1000 |     1.118 ms |     1.482 ms |    -32.558 |
| I64 |        1 |    100000 |     100000 |   159.526 us |   147.077 us |      7.804 |
| I64 |        1 |  10000000 |     100000 |     1.178 ms |     1.422 ms |    -20.713 |
| I64 |        1 |  10000000 |   10000000 |     2.980 ms |     1.992 ms |     33.154 |

mhaseeb123 · 2024-09-20T05:03:38Z

cpp/src/join/mixed_join_semi.cu


  // skip rows that are null here.
  if ((compare_nulls == null_equality::EQUAL) or (not nullable(build))) {
-    hash_table.insert(iter, iter + right_num_rows, hash_build, equality_build, stream.value());
+    row_set.insert_async(iter, iter + right_num_rows, stream.value());


Some perf improvement through use of _async APIs.

mhaseeb123 · 2024-09-20T18:42:53Z

cpp/src/join/mixed_join_semi.cu

-
-  detail::grid_1d const config(outer_num_rows, DEFAULT_JOIN_BLOCK_SIZE);
-  auto const shmem_size_per_block = parser.shmem_per_thread * config.num_threads_per_block;
+  detail::grid_1d const config(outer_num_rows * hash_set_type::cg_size, DEFAULT_JOIN_BLOCK_SIZE);


Bug fix (part 2): We need to launch outer_num_rows * hash_set_type::cg_size threads now. Even without this the loop in part 1 will take care of everything.

mhaseeb123 · 2024-09-23T18:57:36Z

Performance results from Spark (credits: @zpuller).

spark2a

Regression alerts
-----------------
--------------------------------------------------------------------
Name = query85
Means = 2845.4, 2253.0
Time diff = 592.4000000000001
Speedup = 1.2629383044829117
T-Test (test statistic, p value, df) = 2.43033039217987, 0.041181012901009915, 8.0
T-Test Confidence Interval = 30.304881537668848, 1154.4951184623314
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
--------------------------------------------------------------------
Name = query90
Means = 1421.4, 863.4
Time diff = 558.0000000000001
Speedup = 1.6462821403752608
T-Test (test statistic, p value, df) = 2.6788276864789964, 0.02797678796082813, 8.0
T-Test Confidence Interval = 77.6591725781339, 1038.3408274218664
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
--------------------------------------------------------------------
Name = query93
Means = 10904.4, 11764.0
Time diff = -859.6000000000004
Speedup = 0.9269296157769465
T-Test (test statistic, p value, df) = -2.9660190047403745, 0.017980126109303492, 8.0
T-Test Confidence Interval = -1527.9170780272916, -191.282921972709
ALERT: significant change has been detected (p-value < 0.05)
ALERT: regression in performance has been observed

spark-h

====================
Results for query16
====================
Regression alerts
-----------------

Speedup results
-----------------
query16: Previous (6218.6 ms) vs Current (6250.6 ms) Diff -32 E2E 0.99x
====================
Results for query94
====================
Regression alerts
-----------------

Speedup results
-----------------
query94: Previous (5255.2 ms) vs Current (5044.4 ms) Diff 210 E2E 1.04x

cpp/src/join/mixed_join_kernels_semi.cu

cpp/tests/join/mixed_join_tests.cu

Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>

cpp/src/join/mixed_join_common_utils.cuh

cpp/src/join/mixed_join_kernels_semi.cu

cpp/tests/join/mixed_join_tests.cu

PointKernel

Looks great 🚀

cpp/tests/join/mixed_join_tests.cu

mythrocks

This was an enlightening read. Very clean code, too. Thanks, @mhaseeb123.

LGTM.

mhaseeb123 · 2024-09-26T17:59:54Z

/merge

This PR reapplies changes from #16230 and adds bug fixes and performance improvements for mixed_semi_join. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - MithunR (https://github.com/mythrocks) - Nghia Truong (https://github.com/ttnghia) URL: #16859

mhaseeb123 added 2 commits September 20, 2024 02:47

Reapply Refactor mixed_semi_join using cuco::static_set

cca0091

Fix for mixed_semi_join

36e1cd7

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 20, 2024

mhaseeb123 self-assigned this Sep 20, 2024

mhaseeb123 commented Sep 20, 2024

View reviewed changes

cpp/tests/join/mixed_join_tests.cu Show resolved Hide resolved

mhaseeb123 commented Sep 20, 2024

View reviewed changes

mhaseeb123 added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 20, 2024

mhaseeb123 added 4 commits September 20, 2024 04:15

Perf tuning.

66b3f88

Minor improvement

32b1f29

Add missing consts

37c1e04

Perf improvements

6063f75

mhaseeb123 changed the title ~~Reapply mixed_semi_join refactoring and bug fixes~~ [WIP] Reapply mixed_semi_join refactoring and bug fixes Sep 20, 2024

mhaseeb123 commented Sep 20, 2024

View reviewed changes

MInor comments

18c3e20

mhaseeb123 commented Sep 20, 2024

View reviewed changes

Merge branch 'branch-24.10' into make-mixed-semi-join-great-again

ba4e1e4

mhaseeb123 changed the base branch from branch-24.10 to branch-24.12 September 20, 2024 23:38

mhaseeb123 changed the base branch from branch-24.12 to branch-24.10 September 20, 2024 23:39

mhaseeb123 changed the title ~~[WIP] Reapply mixed_semi_join refactoring and bug fixes~~ Reapply mixed_semi_join refactoring and bug fixes Sep 23, 2024

mhaseeb123 changed the base branch from branch-24.10 to branch-24.12 September 23, 2024 21:40

mhaseeb123 added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 24, 2024

mhaseeb123 marked this pull request as ready for review September 24, 2024 18:39

mhaseeb123 requested a review from a team as a code owner September 24, 2024 18:39

mhaseeb123 requested review from mythrocks and davidwendt September 24, 2024 18:39

mhaseeb123 requested review from PointKernel, ttnghia and vuule September 24, 2024 18:41

PointKernel reviewed Sep 24, 2024

View reviewed changes

cpp/src/join/mixed_join_kernels_semi.cu Outdated Show resolved Hide resolved

cpp/tests/join/mixed_join_tests.cu Show resolved Hide resolved

Update cpp/src/join/mixed_join_kernels_semi.cu

a2ddf7f

Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>

mythrocks reviewed Sep 25, 2024

View reviewed changes

cpp/src/join/mixed_join_common_utils.cuh Outdated Show resolved Hide resolved

cpp/src/join/mixed_join_kernels_semi.cu Show resolved Hide resolved

mythrocks reviewed Sep 25, 2024

View reviewed changes

cpp/src/join/mixed_join_kernels_semi.cu Show resolved Hide resolved

mhaseeb123 added 2 commits September 25, 2024 18:10

Merge branch 'branch-24.12' into make-mixed-semi-join-great-again

03e599c

Apply suggestions from reviewer comments

4c4821a

mhaseeb123 commented Sep 26, 2024

View reviewed changes

cpp/tests/join/mixed_join_tests.cu Outdated Show resolved Hide resolved

mhaseeb123 added 2 commits September 25, 2024 19:47

Update cpp/tests/join/mixed_join_tests.cu

d618b55

Increase number of rows in tables for better testing.

67ce080

PointKernel approved these changes Sep 26, 2024

View reviewed changes

cpp/tests/join/mixed_join_tests.cu Show resolved Hide resolved

mythrocks approved these changes Sep 26, 2024

View reviewed changes

ttnghia approved these changes Sep 26, 2024

View reviewed changes

rapids-bot bot merged commit b1e1c9c into rapidsai:branch-24.12 Sep 26, 2024
100 checks passed

mhaseeb123 deleted the make-mixed-semi-join-great-again branch September 26, 2024 18:00

abellina mentioned this pull request Oct 15, 2024

[BUG] mixed left_semi and left_anti produce incorrect results in 24.10 #16852

Closed

mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reapply `mixed_semi_join` refactoring and bug fixes #16859

Reapply `mixed_semi_join` refactoring and bug fixes #16859

mhaseeb123 commented Sep 20, 2024

mhaseeb123 Sep 20, 2024 •

edited

Loading

mhaseeb123 commented Sep 20, 2024 •

edited

Loading

mhaseeb123 Sep 20, 2024

mhaseeb123 Sep 20, 2024 •

edited

Loading

mhaseeb123 commented Sep 23, 2024

PointKernel left a comment

mythrocks left a comment

mhaseeb123 commented Sep 26, 2024

Reapply mixed_semi_join refactoring and bug fixes #16859

Reapply mixed_semi_join refactoring and bug fixes #16859

Conversation

mhaseeb123 commented Sep 20, 2024

Description

Checklist

mhaseeb123 Sep 20, 2024 • edited Loading

Choose a reason for hiding this comment

mhaseeb123 commented Sep 20, 2024 • edited Loading

Performance

Benchmark

Hardware

Comparison Table

mhaseeb123 Sep 20, 2024

Choose a reason for hiding this comment

mhaseeb123 Sep 20, 2024 • edited Loading

Choose a reason for hiding this comment

mhaseeb123 commented Sep 23, 2024

spark2a

spark-h

PointKernel left a comment

Choose a reason for hiding this comment

mythrocks left a comment

Choose a reason for hiding this comment

mhaseeb123 commented Sep 26, 2024

Reapply `mixed_semi_join` refactoring and bug fixes #16859

Reapply `mixed_semi_join` refactoring and bug fixes #16859

mhaseeb123 Sep 20, 2024 •

edited

Loading

mhaseeb123 commented Sep 20, 2024 •

edited

Loading

mhaseeb123 Sep 20, 2024 •

edited

Loading