-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for non-column key for equijoin when eliminating cross join to inner join #4443
Conversation
@ygf11 thanks for your contribution, could you please add some IT in the
|
@liukun4515 Added. |
" Projection: t1.t1_id, t2.t2_id, t1.t1_name [t1_id:UInt32;N, t2_id:UInt32;N, t1_name:Utf8;N]", | ||
" Projection: t1.t1_id, t1.t1_name, t2.t2_id [t1_id:UInt32;N, t1_name:Utf8;N, t2_id:UInt32;N]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these two projections are not collapsed. This is unrelated to this PR. But we need to take a look at the PushDownProjection
rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jackwener
maybe @jackwener has fixed that in #4487
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't related with this PR. This projection is generated due to keep the same order.
It should need do more optimization in PushDownProjection
.
Recent I am doing a enhancement about PushDownProjection
, it should can resolve this.
// The display_name() of cast expression will ignore the cast info, and show the inner expression name. | ||
// If we do not add alais, it will throw same field name error in the schema when adding projection. | ||
// For example: | ||
// input scan : [a, b, c], | ||
// join keys: [cast(a as int)] | ||
// | ||
// then a and cast(a as int) will use the same field name - `a` in projection schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the Cast
expressions ignore the cast info ? Looks like this is not the expected behavior.
I think no matter the Cast is explicit or implicit, it should display the cast info.
@andygrove @liukun4515 @jackwener
How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we need to consider to the Cast
.
- sql can explicitly include
cast
- after
infer cast
, it will generate cast.
But I need to time to think carefully about its detail.
The PR itself LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM generally.
But there may be some problem cooperate with other rules, but not related to this one.I will try to investigate and think about them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at several of the tests carefully and skimmed the rest of this PR -- I also appreciate @jackwener 's PR.
Thank you very much @ygf11 -- I'll plan to merge this PR tomorrow unless there are any additional comments
|
||
// reduce to inner join, t2.t2_id will insert cast. | ||
let sql = | ||
"select t1.t1_id, t2.t2_id, t1.t1_name from t1 cross join t2 where t1.t1_id + 11 = cast(t2.t2_id as BIGINT)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let expected = vec![ | ||
"Filter: t2.c < UInt32(20) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]", | ||
" Projection: t1.a, t1.b, t1.c, t2.a, t2.b, t2.c [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]", | ||
" Inner Join: t1.a + UInt32(100) = t2.a * UInt32(2) [a:UInt32, b:UInt32, c:UInt32, t1.a + UInt32(100):UInt32, a:UInt32, b:UInt32, c:UInt32, t2.a * UInt32(2):UInt32]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very cool
@@ -3115,32 +3101,6 @@ fn extract_join_keys( | |||
Ok(()) | |||
} | |||
|
|||
/// Wrap projection for a plan, if the join keys contains normal expression. | |||
fn wrap_projection_for_join_if_necessary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is very nice to not do this in the planner anymore
.iter() | ||
.map(|key| { | ||
// The display_name() of cast expression will ignore the cast info, and show the inner expression name. | ||
// If we do not add alais, it will throw same field name error in the schema when adding projection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add sql test in the follow up pr.
select *,t1.t1_id+10 from t1,t2 where t1.t1_id+10=t2.t2_id
Thanks @ygf11
Thanks @liukun4515, I will add this test case in the follow up pr. |
Thanks @ygf11 for the contribution and everyone for the reviews! |
Which issue does this PR close?
Closes #4442.
Part of #4389.
Rationale for this change
EliminateCrossJoin
will only support eliminating cross join when the join keys are columns. We should support non-column join keys.What changes are included in this PR?
EliminateCrossJoin
.Are these changes tested?
Yes, unit tests will test it.
Are there any user-facing changes?