Avoid changing expression names during constant folding #1319
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -92,6 +92,10 @@ impl OptimizerRule for ConstantFolding { | |
.expressions() | ||
.into_iter() | ||
.map(|e| { | ||
// We need to keep original expression name, if any. | ||
// Constant folding should not change expression name. | ||
let name = &e.name(plan.schema()); | ||
|
||
// TODO iterate until no changes are made | ||
// during rewrite (evaluating constants can | ||
// enable new simplifications and | ||
|
@@ -101,7 +105,30 @@ impl OptimizerRule for ConstantFolding { | |
// fold constants and then simplify | ||
.rewrite(&mut const_evaluator)? | ||
.rewrite(&mut simplifier)?; | ||
Ok(new_e) | ||
|
||
let new_name = &new_e.name(plan.schema()); | ||
|
||
// Some plans will be candidates in projection pushdown rule to | ||
// trim expressions based on expression names. We need to keep | ||
// expression name for them. | ||
let is_plan_for_projection_pushdown = matches!( | ||
plan, | ||
LogicalPlan::Window { .. } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why only those? What about Currently this outputs:
I would assume it should keep the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For example, for Project, it will create many (looks redundant) aliases. Some looks okay but some looks really weird, e.g. some failed tests:
We have a lot tests that would be failed due to that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have you tried following the model in https://github.com/apache/arrow-datafusion/pull/1315/files#diff-1d33be1a7e8231e53102eab8112e30aa89d8f5cb8c21cd25bcfbce3050cdb433R110 ? I think that calls Basically I think the code needs to do something like walk over the field names in the output There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (I agree with @Dandandan that this should apply to all nodes, not just a few special cased ones) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This sounds promising. No, I've not tried There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for trying @viirya -- I'll see if I can find some time this weekend to mess around with it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @alamb . I'll keep trying on this too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@viirya the example you gave here looks like correct behavior to me, are you concerned with lots of updates on the tests? or are there other unwanted side effect of this approach? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I'm simply unsure if such changes are okay here as it looks like most queries will be affected (not about its results but the cosmetic one). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it looks good for you, I will update all the tests. |
||
| LogicalPlan::Aggregate { .. } | ||
| LogicalPlan::Union { .. } | ||
); | ||
|
||
if let (Ok(expr_name), Ok(new_expr_name)) = (name, new_name) { | ||
if expr_name != new_expr_name | ||
&& is_plan_for_projection_pushdown | ||
{ | ||
Ok(new_e.alias(expr_name)) | ||
} else { | ||
Ok(new_e) | ||
} | ||
} else { | ||
Ok(new_e) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I worry we may be silently ignoring some real issues in the future. However, I tried checking
So I suppose this is as good as we are going to do for now |
||
} | ||
}) | ||
.collect::<Result<Vec<_>>>()?; | ||
|
||
|
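The renaming logic in the hunk above can be illustrated without DataFusion. The following is a minimal sketch under stated assumptions: `Expr`, `simplify`, and `simplify_keep_name` are hypothetical stand-ins written for this example, not DataFusion's `Expr` or its rewriters, and the only simplification modelled is `x = true` becoming `x`. It shows how a rewrite can change an expression's display name and how wrapping the result in an alias restores the original name.

```rust
// Toy expression type for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(bool),
    Eq(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Display name, which doubles as the output column name in this toy model.
    fn name(&self) -> String {
        match self {
            Expr::Column(c) => c.clone(),
            Expr::Literal(b) => format!("Boolean({})", b),
            Expr::Eq(l, r) => format!("{} = {}", l.name(), r.name()),
            Expr::Alias(_, n) => n.clone(),
        }
    }
}

/// Hypothetical simplifier: rewrites `x = true` to `x`, recursing into children.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Eq(l, r) if *r == Expr::Literal(true) => simplify(*l),
        Expr::Eq(l, r) => Expr::Eq(Box::new(simplify(*l)), Box::new(simplify(*r))),
        Expr::Alias(inner, n) => Expr::Alias(Box::new(simplify(*inner)), n),
        other => other,
    }
}

/// Simplify, then re-alias to the original name if the rewrite changed it.
fn simplify_keep_name(e: Expr) -> Expr {
    let old_name = e.name();
    let new_e = simplify(e);
    if new_e.name() != old_name {
        Expr::Alias(Box::new(new_e), old_name)
    } else {
        new_e
    }
}

fn main() {
    let e = Expr::Eq(
        Box::new(Expr::Column("test.b".to_string())),
        Box::new(Expr::Literal(true)),
    );
    // Plain simplification changes the display name.
    assert_ne!(simplify(e.clone()).name(), e.name());
    // The name-preserving variant keeps it by adding an alias.
    let kept = simplify_keep_name(e.clone());
    assert_eq!(kept.name(), e.name());
    println!("{:?}", kept);
}
```

Unlike the diff, this sketch applies the alias regardless of plan node type, which is closer to what the reviewers ask for; the cost, as the thread notes, is extra aliases showing up in printed plans.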
@@ -733,7 +760,7 @@ mod tests { | |
.build()?; | ||
|
||
let expected = "\ | ||
- Aggregate: groupBy=[[#test.a, #test.c]], aggr=[[MAX(#test.b), MIN(#test.b)]]\ | ||
+ Aggregate: groupBy=[[#test.a, #test.c]], aggr=[[MAX(#test.b) AS MAX(test.b = Boolean(true)), MIN(#test.b)]]\ | ||
\n Projection: #test.a, #test.c, #test.b\ | ||
\n TableScan: test projection=None"; | ||
|
||
|
👍
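The code comment in the diff says that projection pushdown trims expressions based on their names. A rough sketch of why a name change matters there, assuming a simplified model in which each expression carries its output name as a string: `NamedExpr`, `trim_by_name`, and `fold` are hypothetical helpers invented for this example and do not correspond to DataFusion functions.

```rust
use std::collections::HashSet;

/// Hypothetical named expression: `name` is the output column name and
/// `body` is a textual stand-in for the expression tree.
#[derive(Debug, Clone, PartialEq)]
struct NamedExpr {
    name: String,
    body: String,
}

/// Keep only the expressions whose output name the parent node requires.
fn trim_by_name(exprs: &[NamedExpr], required: &HashSet<String>) -> Vec<NamedExpr> {
    exprs
        .iter()
        .filter(|e| required.contains(&e.name))
        .cloned()
        .collect()
}

/// Hypothetical folding step that drops a redundant `= Boolean(true)`.
/// With `keep_name == false` the output name follows the rewritten body.
fn fold(e: &NamedExpr, keep_name: bool) -> NamedExpr {
    let body = e.body.replace(" = Boolean(true)", "");
    let name = if keep_name { e.name.clone() } else { body.clone() };
    NamedExpr { name, body }
}

fn main() {
    let original = NamedExpr {
        name: "MAX(test.b = Boolean(true))".to_string(),
        body: "MAX(test.b = Boolean(true))".to_string(),
    };
    // The parent node asks for the column by its original name.
    let mut required = HashSet::new();
    required.insert(original.name.clone());

    // Folding that renames the expression makes the lookup miss.
    let renamed = fold(&original, false);
    assert!(trim_by_name(&[renamed], &required).is_empty());

    // Folding that preserves the name keeps the expression.
    let kept = fold(&original, true);
    assert_eq!(trim_by_name(&[kept], &required).len(), 1);
}
```

In this model a renamed expression is silently dropped by the name-based trim, which is the failure mode the alias in the PR is meant to prevent.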