feat(batch): introduce batch AsOf join #19790

yuhao-su · 2024-12-12T22:09:24Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Plan the batch AsOf join as a hash join, followed by a group top-1 and an optional projection. The group top-1 is ordered by the right-side AsOf column and grouped by the hash key. The optional projection prunes the extra AsOf column that was added for the GroupTopN.
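As a rough illustration of the three-step plan described above, here is a minimal Python sketch. It is not RisingWave code: `hash_join`, `group_top1`, and `project` are hypothetical helpers, rows are plain tuples, and the group key (the left-side columns `t1.v1` and `t1.v2`) is an assumption chosen for the example.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; these are not RisingWave APIs.

def hash_join(left, right, left_key, right_key, predicate):
    """Inner hash join on one equality key plus a residual predicate."""
    table = defaultdict(list)
    for r in right:                          # build side
        table[r[right_key]].append(r)
    out = []
    for l in left:                           # probe side
        for r in table.get(l[left_key], []):
            if predicate(l, r):
                out.append(l + r)            # left columns, then right columns
    return out

def group_top1(rows, group_key, order_col, descending):
    """Keep one row per group: the one with the extreme value in order_col."""
    best = {}
    for row in rows:
        g = tuple(row[i] for i in group_key)
        cur = best.get(g)
        if cur is None:
            best[g] = row
        elif descending and row[order_col] > cur[order_col]:
            best[g] = row
        elif not descending and row[order_col] < cur[order_col]:
            best[g] = row
    return list(best.values())

def project(rows, cols):
    """Keep only the listed column indexes."""
    return [tuple(row[i] for i in cols) for row in rows]

# t1(v1, v2, v3) ASOF JOIN t2(v1, v2, v3) ON t1.v1 = t2.v1 AND t1.v2 > t2.v2
t1 = [("a", 5, 0)]
t2 = [("a", 1, 10), ("a", 3, 30), ("a", 7, 70)]

joined = hash_join(t1, t2, 0, 0, lambda l, r: l[1] > r[1])
# Joined columns: 0..2 come from t1, 3..5 come from t2.
top1 = group_top1(joined, group_key=[0, 1], order_col=4, descending=True)
# Prune t2.v2 (index 4), which was only needed for ordering in this example.
result = project(top1, [0, 1, 3, 5])
```

For a `>` condition the nearest right row is the one with the largest AsOf value, hence the descending order; a `<` condition would use ascending order instead.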

Checklist

I have written necessary rustdoc comments.
I have added necessary unit tests and integration tests.
I have added test labels as necessary.
I have added fuzzing tests or opened an issue to track them.
My PR contains breaking changes.
My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

Usage is the same as #18683.

Ties in the asof column in the right table.

The query result is non-deterministic when the right table contains ties in the inequality condition column. ASOF JOIN finds the nearest record in the right table and matches only one row for each record from the left table. Therefore, when the right table has ties, the right-table row that appears in the query result may differ between runs.
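A small sketch of why ties make the result depend on scan order. The function `nearest_right_row` is hypothetical and only models "pick the nearest right row, first one wins on ties"; it makes no claim about how the executor actually breaks ties.

```python
def nearest_right_row(left_ts, right_rows):
    """Return the right row with the largest ts strictly below left_ts.

    On ties, the first row encountered wins, so the answer depends on the
    order in which right_rows is scanned.
    """
    best = None
    for row in right_rows:
        ts, _payload = row
        if ts < left_ts and (best is None or ts > best[0]):
            best = row
    return best

# Two right rows tie on the AsOf column (ts == 3).
right = [(3, "x"), (3, "y")]

first = nearest_right_row(5, right)
second = nearest_right_row(5, list(reversed(right)))
# Same data, different scan order, different matched row.
```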

My PR needs documentation updates.

Release note

chenzl25 · 2024-12-13T02:54:53Z

src/frontend/planner_test/tests/testdata/output/asof_join.yaml

  stream_error: 'Invalid input syntax: AsOf join requires exactly 1 ineuquality condition'
 - sql: CREATE TABLE t1(v1 varchar, v2 int, v3 int); CREATE TABLE t2(v1 varchar, v2 int, v3 int); SELECT t1.v1 t1_v1, t1.v2 t1_v2, t2.v1 t2_v1, t2.v2 t2_v2 FROM t1 ASOF JOIN t2 ON t1.v1 = t2.v1 || 'a' and t1.v2 > t2.v2;
+  batch_plan: |-
+    BatchExchange { order: [], dist: Single }
+    └─BatchGroupTopN { order: [t2.v2 DESC], limit: 1, offset: 0, group_key: [t1.v1, t1.v2] }


This plan seems incorrect because it will produce fewer rows than expected.

chenzl25 · 2024-12-13T02:56:40Z

e2e_test/batch/join/asof_join.slt

+query IIIIII
+SELECT t1.v1 t1_v1, t1.v2 t1_v2, t1.v3 t1_v3, t2.v1 t2_v1, t2.v2 t2_v2, t2.v3 t2_v3 FROM t1 ASOF JOIN t2 ON t1.v1 = t2.v1 and t1.v2 < t2.v2 order by t1.v1;
+----
+1 2 3 1 3 4


Add one more row to t1 and run this SQL again:

insert into t1 values (1, 2, 9);

The expected result (verified with DuckDB) is:

┌───────┬───────┬───────┬───────┬───────┬───────┐
│ t1_v1 │ t1_v2 │ t1_v3 │ t2_v1 │ t2_v2 │ t2_v3 │
│ int32 │ int32 │ int32 │ int32 │ int32 │ int32 │
├───────┼───────┼───────┼───────┼───────┼───────┤
│     1 │     2 │     3 │     1 │     3 │     4 │
│     1 │     2 │     9 │     1 │     3 │     4 │
└───────┴───────┴───────┴───────┴───────┴───────┘

but the current implementation gives us:

 t1_v1 | t1_v2 | t1_v3 | t2_v1 | t2_v2 | t2_v3
-------+-------+-------+-------+-------+-------
     1 |     2 |     3 |     1 |     3 |     4
(1 row)
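The row loss can be reproduced with a minimal Python sketch. Assumptions: only the rows visible in this thread are used, `group_top1_asc` is a hypothetical stand-in for a GroupTopN with limit 1, and the per-row identifier in column 0 is made up for illustration. Grouping by a left-row identifier is just one possible remedy, not necessarily the fix adopted in the PR.

```python
def group_top1_asc(rows, group_key, order_col):
    """Keep, per group, the row with the smallest value in order_col."""
    best = {}
    for row in rows:
        g = tuple(row[i] for i in group_key)
        if g not in best or row[order_col] < best[g][order_col]:
            best[g] = row
    return list(best.values())

t1 = [(1, 2, 3), (1, 2, 9)]   # the original row plus the newly inserted one
t2 = [(1, 3, 4)]              # only the t2 row visible in this thread

# Join on t1.v1 = t2.v1 AND t1.v2 < t2.v2, tagging each left row with a
# hypothetical row id in column 0.
# Columns: 0 = row id, 1..3 = t1.v1..v3, 4..6 = t2.v1..v3.
joined = [
    (i,) + l + r
    for i, l in enumerate(t1)
    for r in t2
    if l[0] == r[0] and l[1] < r[1]
]

# Grouping by (t1.v1, t1.v2) merges the two left rows into one group.
collapsed = group_top1_asc(joined, group_key=[1, 2], order_col=5)

# Grouping by something that identifies each left row keeps both.
per_left_row = group_top1_asc(joined, group_key=[0], order_col=5)
```

Here `collapsed` holds a single row while `per_left_row` holds one row per left input row, matching the two outputs shown above.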

yuhao-su added 3 commits December 11, 2024 02:22

init

f3609b6

conflict

b927f8a

fix

47cf372

yuhao-su requested review from st1page and chenzl25 December 12, 2024 22:09

github-actions bot added the type/feature label Dec 12, 2024

fix

2445194

chenzl25 reviewed Dec 13, 2024
