-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix: common_sub_expression_eliminate optimizer rule failed #16066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: common_sub_expression_eliminate optimizer rule failed #16066
Conversation
Common_sub_expression_eliminate rule failed with error: `SchemaError(FieldNotFound {field: <name>}, valid_fields: []})` due to the schema being changed by the second application of `find_common_exprs` As I understood the source of the problem was in sequential call of `find_common_exprs`. First call returned original names as `aggr_expr` and changed names as `new_aggr_expr`. Second call takes into account only `new_aggr_expr` and if names was already changed by first call will return changed names as `aggr_expr`(original ones) and put them into Projection logic. I used NamePreserver mechanism to restore original schema names and generate Projection with original name at the end of aggregate optimization.
.map(|expr| Some(name_preserver.save(expr))) | ||
.collect::<Vec<_>>() | ||
} else { | ||
new_aggr_expr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as a vec of all None?
vec![None; new_agg_expr.len()]`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is, thanks for the comment. But this will not work, because vec! requires value to implement Clone trait but the Option<SavedName> doesn't. It seems to me easier to do this by map instead of adding trait to SavedName.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Hi @alamb, thanks again for your comment, please have a look on this. Please let me know if this problem was already solved or move my commit out of stale label. |
(2.0, 20, -5), | ||
(3.0, 20, 4); | ||
|
||
# https://github.com/apache/datafusion/issues/15291 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some reason this query is failing for me
> WITH s AS (
SELECT
COUNT(a) FILTER (WHERE (b * b) - 3600 <= b),
COUNT(a) FILTER (WHERE (b * b) - 3000 <= b AND (c >= 0)),
COUNT(a) FILTER (WHERE (b * b) - 3000 <= b AND (c >= 0) AND (c >= 0))
FROM t
) SELECT * FROM s; 🤔 Invalid statement: SQL error: ParserError("Expected: ), found: ( at Line: 3, Column: 25")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nm, I can get it to reproduce like this
> set datafusion.sql_parser.dialect = 'postgres';
0 row(s) fetched.
Elapsed 0.000 seconds.
> CREATE TABLE t (
a DOUBLE,
b BIGINT,
c INT
) AS VALUES
(1.0, 10, -5),
(2.0, 20, -5),
(3.0, 20, 4);
0 row(s) fetched.
Elapsed 0.002 seconds.
> WITH s AS (
SELECT
COUNT(a) FILTER (WHERE (b * b) - 3600 <= b),
COUNT(a) FILTER (WHERE (b * b) - 3000 <= b AND (c >= 0)),
COUNT(a) FILTER (WHERE (b * b) - 3000 <= b AND (c >= 0) AND (c >= 0))
FROM t
) SELECT * FROM s
;
Optimizer rule 'common_sub_expression_eliminate' failed
caused by
Schema error: No field named "count(t.a) FILTER (WHERE t.b * t.b - Int64(3600) <= t.b)". Did you mean 'count(t.a) FILTER (WHERE __common_expr_1 AS t.b * t.b - Int64(3600) <= t.b)'?.
I am not sure, I merged up to main and started the CI again. |
Hi, @alamb, just a kind reminder, if all checks have passed and my test reproduces the issue maybe we can merge the changes or there are still any problems with my fix? |
Sorry for the delay -- I lost track of this PR. I merged up again from main and I will give it a more thorough review tomorrow |
🤖 |
🤖: Benchmark completed Details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks in line with other parts of CSE:
datafusion/datafusion/optimizer/src/common_subexpr_eliminate.rs
Lines 374 to 396 in 602475f
// If there were common expressions extracted, then we need to | |
// make sure we restore the original column names. | |
// TODO: Although `find_common_exprs()` inserts aliases around | |
// extracted common expressions this doesn't mean that the | |
// original column names (schema) are preserved due to the | |
// inserted aliases are not always at the top of the | |
// expression. | |
// Let's consider improving `find_common_exprs()` to always | |
// keep column names and get rid of additional name | |
// preserving logic here. | |
if let Some(aggr_expr) = aggr_expr { | |
let name_preserver = NamePreserver::new_for_projection(); | |
let saved_names = aggr_expr | |
.iter() | |
.map(|expr| name_preserver.save(expr)) | |
.collect::<Vec<_>>(); | |
let new_aggr_expr = rewritten_aggr_expr | |
.into_iter() | |
.zip(saved_names) | |
.map(|(new_expr, saved_name)| { | |
saved_name.restore(new_expr) | |
}) | |
.collect::<Vec<Expr>>(); |
datafusion/datafusion/optimizer/src/common_subexpr_eliminate.rs
Lines 185 to 203 in 602475f
// If there were common expressions extracted, then we need to make sure | |
// we restore the original column names. | |
// TODO: Although `find_common_exprs()` inserts aliases around extracted | |
// common expressions this doesn't mean that the original column names | |
// (schema) are preserved due to the inserted aliases are not always at | |
// the top of the expression. | |
// Let's consider improving `find_common_exprs()` to always keep column | |
// names and get rid of additional name preserving logic here. | |
if let Some(window_expr_list) = window_expr_list { | |
let name_preserver = NamePreserver::new_for_projection(); | |
let saved_names = window_expr_list | |
.iter() | |
.map(|exprs| { | |
exprs | |
.iter() | |
.map(|expr| name_preserver.save(expr)) | |
.collect::<Vec<_>>() | |
}) | |
.collect::<Vec<_>>(); |
So I think this is good to merge; perhaps copy those comments above in with this fix for visibility.
(btw what does find_common_exprs()
refer to now; I can't grep that function 🤔 )
I merged up from main and plan to merge this PR once Ci passes. Thank you for your patience @Col-Waltz and for the revie @Jefffrey |
Which issue does this PR close?
Rationale for this change
Common_sub_expression_eliminate rule failed with error:
SchemaError(FieldNotFound {field: <name>}, valid_fields: []})
due to the schema being changed by the second application of
find_common_exprs
As I understood the source of the problem was in sequential call of
find_common_exprs
. First call returned original names asaggr_expr
and changed names as
new_aggr_expr
. Second call takes into accountonly
new_aggr_expr
and if names was already changed by first callwill return changed names as
aggr_expr
(original ones)and put them into Projection logic.
What changes are included in this PR?
I used existing NamePreserver mechanism to restore original schema names and
generate Projection with original name at the end of aggregate optimization.
Are these changes tested?
Yes this changes are tested and I added test to this commit.
Error emerges not only in PREPARE requests, I created SELECT request which
causes the same error.
Are there any user-facing changes?
No the change only fixes optimization rule.