-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: HashJoinStream
state machine
#8538
Conversation
Benchmark results for tpch_mem are
|
cc @alamb @metesynnada PTAL, if you have some time and feel related / being in context of the issue |
I will review it soon, but it looks great. |
I also plan to review this soon. Thank you @korowa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @korowa -- I read this PR carefully. I have a few style suggestions, but I don't think any of them are requred.
As I undertand it, this is a step towards being able to incrementally generate output batches for joins.
I wanted to also say the comments are really nice and make this PR a joy to read 👏
🏆
cc @Dandandan and @liukun4515
I think it would be good to get @metesynnada 's review before merging this as well
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1475e8e
to
b164328
Compare
Yes, the plan is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on implementing the state machine - I really appreciate the effort you put into it! I'm glad to see that we now have a design coupling with SHJ. From what I can see, it doesn't look like there have been any changes in how we handle join, so I think it all looks good. Thanks for your hard work!
@Dandandan If this looks OK, we can merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the refactoring!
Thank you @korowa |
Which issue does this PR close?
Part of #8130.
Rationale for this change
Structuring HashJoinStream processing logic based on @alamb design suggestion from the issue.
What changes are included in this PR?
HashJoinStream::poll_next_impl
splitted into more granular functions which are used as a handlers, modifying stream state as their resultleft_fut
&visited_left_side
moved to HashJoinStream.build_side BuildSide attribute -- the reason for storing build-side related data separate from stream state, is that it allows to avoid redundant cloning of build side contents across all state changes (as all handlers operate on&mut self: HashJoinStream
) and still keeps build-side available for reading/mutating through references.stream_join_utils.rs
toutils.rs
+StreamJoinStateResult
renamed toStatefulStreamResult
-- seems reasonable as it is used for stateful streams and not only for stream-like joins -- any naming or rolling back related suggestions are welcome and appreciated.Are these changes tested?
Covered by existing test cases.
Are there any user-facing changes?
No.