feat(planner): support independent right join #7634

Merged: 11 commits, Sep 19, 2022

Conversation

xudong963
Member

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support right join natively, without converting it into a left join.

Fixes #7599

@xudong963 xudong963 marked this pull request as draft September 15, 2022 07:48
@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Sep 15, 2022
@Xuanwo
Member

Xuanwo commented Sep 17, 2022

@mergify update

@mergify
Contributor

mergify bot commented Sep 17, 2022

update

✅ Branch has been successfully updated

@xudong963 xudong963 marked this pull request as ready for review September 17, 2022 17:22
@BohuTANG
Member

It would be better to run 'make lint' locally so that clippy can check the code before you push to GitHub.

'src/query/service/src/pipelines/processors/transforms/hash_join/join_hash_table.rs:696:41
    |
696 | for (chunk_index, chunk) in self.row_space.chunks.read().unwrap().iter().enumerate() {
    |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |'

@xudong963
Member Author

select * from (SELECT number AS a FROM numbers(10000000)) x left join (SELECT number AS a FROM numbers(100000000))  y on x.a = y.a order by x.a;

10000000 rows in set (24.56 sec)
Read 110000000 rows, 839.23 MiB in 24.558 sec., 4.48 million rows/sec., 34.17 MiB/sec.

With join reordering, the left join above can be rewritten as the following right join, which runs about 2.3x faster (10.58 sec vs. 24.56 sec).

select * from (SELECT number AS a FROM numbers(100000000)) x right join (SELECT number AS a FROM numbers(10000000))  y on x.a = y.a order by x.a;

10000000 rows in set (10.58 sec)
Read 110000000 rows, 839.23 MiB in 10.578 sec., 10.4 million rows/sec., 79.34 MiB/sec.
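
One plausible reading of the speedup, assuming the hash table is built on the right-hand input (this PR page does not show that detail), is that the right join builds on the 10M-row side instead of the 100M-row side. The sketch below is a toy right outer hash join over `u64` keys, not Databend's implementation; `right_outer_hash_join` and `JoinedRow` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// One output row: the left value is `None` when a right row found no match.
type JoinedRow = (Option<u64>, u64);

/// Toy right outer hash join on key equality (hypothetical helper).
/// Builds the hash table on `right` (the preserved side) and probes with `left`.
fn right_outer_hash_join(left: &[u64], right: &[u64]) -> Vec<JoinedRow> {
    // Build phase: key -> indices of the right rows carrying that key.
    let mut table: HashMap<u64, Vec<usize>> = HashMap::new();
    for (idx, key) in right.iter().enumerate() {
        table.entry(*key).or_default().push(idx);
    }

    // One flag per right row, set once the row has matched any left row.
    let mut matched = vec![false; right.len()];
    let mut out = Vec::new();

    // Probe phase: stream the left side through the table.
    for key in left {
        if let Some(indices) = table.get(key) {
            for &idx in indices {
                matched[idx] = true;
                out.push((Some(*key), right[idx]));
            }
        }
    }

    // Final phase: emit right rows that never matched, padded with NULL (None).
    for (idx, key) in right.iter().enumerate() {
        if !matched[idx] {
            out.push((None, *key));
        }
    }
    out
}

fn main() {
    let left: Vec<u64> = (0..10).collect(); // stands in for the large input
    let right: Vec<u64> = vec![3, 7, 42]; // stands in for the small input
    for row in right_outer_hash_join(&left, &right) {
        println!("{:?}", row);
    }
}
```

In this sketch the memory held during the probe is proportional to the right input only, which is why placing the smaller relation on the right can pay off when the join must preserve that side.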

@BohuTANG
Member

@mergify update

@mergify
Contributor

mergify bot commented Sep 18, 2022

update

✅ Branch has been successfully updated

@xudong963
Member Author

It would be better to run 'make lint' locally so that clippy can check the code before you push to GitHub.

'src/query/service/src/pipelines/processors/transforms/hash_join/join_hash_table.rs:696:41
    |
696 | for (chunk_index, chunk) in self.row_space.chunks.read().unwrap().iter().enumerate() {
    |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |'

This lint is confusing; see rust-lang/rust-clippy#8987.

@BohuTANG BohuTANG merged commit 510e727 into databendlabs:main Sep 19, 2022
@xudong963 xudong963 deleted the right_joi branch September 19, 2022 08:39
Labels
pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

Implement independent right outer join
5 participants