feat(batch): introduce batch AsOf join #19790

Open
wants to merge 4 commits into
base: main

yuhao-su
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Plan the batch AsOf join as a hash join, followed by a group top-1 and an optional projection. The group top-1 is ordered by the right-side AsOf column and grouped by the hash key. The optional projection prunes the extra AsOf column that was added for the GroupTopN.
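The three stages described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the planner's code: rows are dicts, the field names `k` (equality key), `ts` (AsOf column) and `val` (right-side payload) are hypothetical, the inequality is fixed to `left.ts > right.ts`, and the group key (equality key plus left AsOf column) follows one reading of the plan output shown in this PR.

```python
from collections import defaultdict

def batch_asof_join_sketch(left, right):
    """Emulate hash join + group top-1 + projection on lists of dict rows."""
    # Stage 1: hash join. Build a hash table on the right side by the
    # equality key, probe it with the left side, and keep only the pairs
    # that satisfy the inequality left.ts > right.ts.
    build = defaultdict(list)
    for r in right:
        build[r["k"]].append(r)
    joined = [
        (l, r)
        for l in left
        for r in build.get(l["k"], [])
        if l["ts"] > r["ts"]
    ]

    # Stage 2: group top-1. Group by (left.k, left.ts) and keep the pair
    # with the largest right.ts, i.e. ORDER BY right.ts DESC LIMIT 1.
    # Grouping by column values means that left rows with identical
    # (k, ts) end up in the same group.
    best = {}
    for l, r in joined:
        group = (l["k"], l["ts"])
        if group not in best or r["ts"] > best[group][1]["ts"]:
            best[group] = (l, r)

    # Stage 3: projection. Drop right.ts, which was only carried along
    # so that stage 2 could order by it.
    return [(l["k"], l["ts"], r["val"]) for l, r in best.values()]


left_rows = [{"k": "a", "ts": 10}, {"k": "a", "ts": 5}]
right_rows = [
    {"k": "a", "ts": 3, "val": "x"},
    {"k": "a", "ts": 7, "val": "y"},
    {"k": "b", "ts": 1, "val": "z"},
]
result = sorted(batch_asof_join_sketch(left_rows, right_rows))
```

With this input, the left row with `ts = 10` pairs with the right row at `ts = 7`, and the left row with `ts = 5` pairs with the right row at `ts = 3`.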

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

Usage is the same as #18683.

Ties in the asof column in the right table.

The query result is non-deterministic when the right table has ties in the inequality condition column. ASOF JOIN finds the nearest record in the right table and matches exactly one right-table row for each record from the left table. Therefore, when the right table has ties, the matched right-table row in the query result may differ between runs.
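A small Python sketch of why ties make the result depend on processing order. The helper name `nearest_right_row` and the `(ts, payload)` row layout are hypothetical, and the inequality is assumed to be `right.ts < left_ts`; the actual executor may break ties differently.

```python
def nearest_right_row(left_ts, right_rows):
    """Pick the right row with the largest ts strictly below left_ts."""
    candidates = [r for r in right_rows if r[0] < left_ts]
    if not candidates:
        return None
    # max() returns the first maximal element it encounters, so when two
    # rows share the same ts, the winner depends on the input order.
    return max(candidates, key=lambda r: r[0])


tied = [(3, "first"), (3, "second")]
pick_forward = nearest_right_row(5, tied)
pick_reversed = nearest_right_row(5, list(reversed(tied)))
```

Both picks are valid nearest matches, but they carry different payloads, which mirrors the run-to-run variation described above.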

  • My PR needs documentation updates.
Release note

  stream_error: 'Invalid input syntax: AsOf join requires exactly 1 ineuquality condition'
- sql: CREATE TABLE t1(v1 varchar, v2 int, v3 int); CREATE TABLE t2(v1 varchar, v2 int, v3 int); SELECT t1.v1 t1_v1, t1.v2 t1_v2, t2.v1 t2_v1, t2.v2 t2_v2 FROM t1 ASOF JOIN t2 ON t1.v1 = t2.v1 || 'a' and t1.v2 > t2.v2;
  batch_plan: |-
    BatchExchange { order: [], dist: Single }
    └─BatchGroupTopN { order: [t2.v2 DESC], limit: 1, offset: 0, group_key: [t1.v1, t1.v2] }

This plan seems incorrect because it will produce fewer rows than expected.

Comment on lines +17 to +20
query IIIIII
SELECT t1.v1 t1_v1, t1.v2 t1_v2, t1.v3 t1_v3, t2.v1 t2_v1, t2.v2 t2_v2, t2.v3 t2_v3 FROM t1 ASOF JOIN t2 ON t1.v1 = t2.v1 and t1.v2 < t2.v2 order by t1.v1;
----
1 2 3 1 3 4

Add one more row to t1 and run this SQL again:

 insert into t1 values (1, 2, 9);

The expected result (verified with DuckDB) is:

┌───────┬───────┬───────┬───────┬───────┬───────┐
│ t1_v1 │ t1_v2 │ t1_v3 │ t2_v1 │ t2_v2 │ t2_v3 │
│ int32 │ int32 │ int32 │ int32 │ int32 │ int32 │
├───────┼───────┼───────┼───────┼───────┼───────┤
│     1 │     2 │     3 │     1 │     3 │     4 │
│     1 │     2 │     9 │     1 │     3 │     4 │
└───────┴───────┴───────┴───────┴───────┴───────┘

but the current implementation gives:

 t1_v1 | t1_v2 | t1_v3 | t2_v1 | t2_v2 | t2_v3
-------+-------+-------+-------+-------+-------
     1 |     2 |     3 |     1 |     3 |     4
(1 row)
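The discrepancy can be reproduced with a minimal Python sketch. It assumes the join condition from the test query (`t1.v1 = t2.v1 and t1.v2 < t2.v2`) and a `t2` that contains only the row `(1, 3, 4)` seen in the outputs above; the real table contents are not shown in this thread. Grouping by a per-row index is just one hypothetical way to keep duplicate left rows apart and is not necessarily the fix this PR will adopt.

```python
def asof_join(t1, t2, group_key):
    """Join, then keep one right row per group (smallest t2.v2 wins)."""
    best = {}
    for i, l in enumerate(t1):
        for r in t2:
            # Equality on v1, inequality t1.v2 < t2.v2.
            if l[0] == r[0] and l[1] < r[1]:
                g = group_key(i, l)
                # ORDER BY t2.v2 ASC LIMIT 1 within each group.
                if g not in best or r[1] < best[g][1][1]:
                    best[g] = (l, r)
    return sorted(l + r for l, r in best.values())


t1 = [(1, 2, 3), (1, 2, 9)]   # the second row is the one inserted above
t2 = [(1, 3, 4)]              # assumed contents of t2

# Group by the column values (t1.v1, t1.v2): both left rows share a group,
# so only one output row survives.
by_columns = asof_join(t1, t2, lambda i, l: (l[0], l[1]))

# Group by left row identity: each left row keeps its own match.
by_row = asof_join(t1, t2, lambda i, l: i)
```

Here `by_columns` has a single row, matching the current output, while `by_row` has the two rows that DuckDB returns.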
