-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move the extract_join_keys to optimizer #4711
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ygf11 -- this PR looks very nice 👌
datafusion/sql/src/planner.rs
Outdated
@@ -806,7 +805,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
) -> Result<LogicalPlan> { | |||
match constraint { | |||
JoinConstraint::On(sql_expr) => { | |||
let mut keys: Vec<(Expr, Expr)> = vec![]; | |||
// let mut keys: Vec<(Expr, Expr)> = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forget it. Removed now.
|
||
Ok(plan) | ||
} | ||
_ => Ok(Some(optimize_children(self, plan, config)?)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackwener has been removing direct recursion in optimizer rules -- for example see #4687
Perhaps we could do the same with this rule too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can do it.
Utilizing the traverse of optimizer now.
)? | ||
.build()?; | ||
let expected = vec![ | ||
"Left Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Some(binary_expr( | ||
(col("t1.a") + lit(10i64)).gt_eq(col("t2.a") * lit(2u32)), | ||
Operator::And, | ||
col("t1.b").lt(lit(100i32)), | ||
)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thunk you can use .and
here if you want (untested)
Some(binary_expr( | |
(col("t1.a") + lit(10i64)).gt_eq(col("t2.a") * lit(2u32)), | |
Operator::And, | |
col("t1.b").lt(lit(100i32)), | |
)), | |
Some( | |
col("t1.a") + lit(10i64)).gt_eq(col("t2.a") * lit(2u32)) | |
.and(col("t1.b").lt(lit(100i32))) | |
)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is short and clear, fixed.
} | ||
|
||
#[cfg(test)] | ||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 nice tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice job to me👍 except a possible problem.
// expression that didn't match equi-join pattern | ||
let mut filter = vec![]; | ||
|
||
// extract join keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to see them are removed from planner
I'm not sure if some special case will meet problem. like:
|
I think carefully it's ok, because it's OK when it exist in planner, it don't cause problem to just move it into optimizer. and @ygf11 added a new test for it. So LGTM. |
@jackwener Thanks for review. I think it is ok. Following up are the reason.
No, this rule will ignore Because this rule will treat eq expression as equijoin only when each side of |
@@ -237,6 +238,7 @@ impl Optimizer { | |||
let rules: Vec<Arc<dyn OptimizerRule + Sync + Send>> = vec![ | |||
Arc::new(InlineTableScan::new()), | |||
Arc::new(TypeCoercion::new()), | |||
Arc::new(ExtractEquijoinPredicate::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some subquery will be converted to the join in the following up SubqueryFilterToJoin
rule., and I am not sure if this rule is right in this position
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @ygf11 @jackwener
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this rule should be placed behind SubqueryFilterToJoin
rule. @jackwener also suggested about it in #4725 (comment).
I think it is ok now, because SubqueryFilterToJoin
and other similar rule directly specify the equijoin predicate, so they do not depends ExtractEquijoinPredicate
currently.
I plan to do this movement after #4725 merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this rule should be placed behind
SubqueryFilterToJoin
rule. @jackwener also suggested about it in #4725 (comment).I think it is ok now, because
SubqueryFilterToJoin
and other similar rule directly specify the equijoin predicate, so they do not dependsExtractEquijoinPredicate
currently.
Yes, other rule will convert query to join with specified equal-join predicate and non-equal-join predicate.
I plan to do this movement after #4725 merged.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If there is no other comments until tomorrow, i will merge it.
Thanks @ygf11
Benchmark runs are scheduled for baseline = 9331ee3 and contender = 734c211. 734c211 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4710.
Rationale for this change
See #4710
What changes are included in this PR?
extract join keys
in sql planner.ExtractEquijoinPredicate
to optimizer.Are these changes tested?
Yes.
Are there any user-facing changes?