-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix Planner
don't generate SubqueryAlias
and generate duplicated SubqueryAlias
#4484
Conversation
Sort: custdist DESC NULLS FIRST, c_orders.c_count DESC NULLS FIRST | ||
Projection: c_orders.c_count, COUNT(UInt8(1)) AS custdist | ||
Aggregate: groupBy=[[c_orders.c_count]], aggr=[[COUNT(UInt8(1))]] | ||
SubqueryAlias: c_orders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below already existed SubqueryAlias: c_orders
datafusion/sql/src/planner.rs
Outdated
(Some(cte_plan), _) => match table_alias { | ||
Some(cte_alias) => { | ||
Ok(with_alias(cte_plan.clone(), cte_alias)) | ||
} | ||
_ => Ok(cte_plan.clone()), | ||
}, | ||
(Some(cte_plan), _) => Ok(cte_plan.clone()), | ||
(_, Ok(provider)) => { | ||
let scan = | ||
LogicalPlanBuilder::scan(&table_name, provider, None); | ||
let scan = match table_alias.as_ref() { | ||
Some(ref name) => scan?.alias(name.to_owned().as_str()), | ||
_ => scan, | ||
}; | ||
scan?.build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must remove these code, because they will add SubqueryAlias
.
But following code will be added once again, it will cause duplicated SubqueryAlias
datafusion/sql/src/planner.rs
Outdated
|
||
let plan = match normalized_alias { | ||
Some(alias) => with_alias(logical_plan, alias), | ||
_ => logical_plan, | ||
}; | ||
(plan, alias) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must remove these code, because they will add SubqueryAlias.
But following code will be added once again, it will cause duplicated SubqueryAlias
let columns_alias = alias.clone().columns; | ||
if columns_alias.is_empty() { | ||
// sqlparser-rs encodes AS t as an empty list of column alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original code is bug (add twice) + bug (when columns-alias is empty, forget to add alias). let me have explain.
The process is as follows
- Add table-alias (those code that are removed above)
- Add table-alias and columns-alias (these code here)
- bug: when columns-alias is empty, these code will ignore table-alias
So
- bug (add twice) will cause duplicated subquery_alias
- bug (when columns-alias, forget to add alias) will cause missing subquery_alias
so we just unify them into
- Add table-alias and columns-alias
" EmptyRelation []", | ||
" Projection: Int64(7) AS a, Utf8(\"bb\") AS b [a:Int64, b:Utf8]", | ||
" EmptyRelation []", | ||
" SubqueryAlias: _sample_data [a:Int64, b:Utf8]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing SubqueryAlias
Planner
don't generate SubqueryAlias
and generate replicated SubqueryAlias
Planner
don't generate SubqueryAlias
and generate duplicated SubqueryAlias
pub fn subquery_alias_owned(plan: LogicalPlan, alias: &str) -> Result<LogicalPlan> { | ||
let schema: Schema = plan.schema().as_ref().clone().into(); | ||
let schema = DFSchemaRef::new(DFSchema::try_from_qualified_schema(alias, &schema)?); | ||
Ok(LogicalPlan::SubqueryAlias(SubqueryAlias { | ||
input: Arc::new(plan), | ||
alias, | ||
schema: Arc::new(schema), | ||
}) | ||
alias: alias.to_string(), | ||
schema, | ||
})) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this logic into SubqueryAlias::try_new
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has added it.
b5160b1
to
ebdf17b
Compare
I still think the Spark SQL Output:
|
Test SQL:
|
Current this PR is just for fix bug and don't include any optimization. we can see Spark exist analyzed plan
But |
To remove the
|
If we want to do this, I think we may need to add
This job will be complicated. I will try to use the solution of |
Just curious, does Apache Doris have an explicit binder/analyzer phase ? |
Yes, Doris has analyzer in here |
abc1c64
to
6ae1ae7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found https://github.com/apache/arrow-datafusion/pull/4484/files?w=1 easier to review
The changes to the plans look good to me -- thank you @jackwener
@mingmwang do you have any concerns about merging this PR?
Thanks again @jackwener -- this is great stuff |
I plan to merge this later today (in about 9 hours) unless I hear otherwise |
No, this PR is good to merge I think. |
Oh no -- this PR again has conflicts 😢 @jackwener can you possibly rebase it one more time ? |
6ae1ae7
to
ca39a16
Compare
has resolved conflict. |
Thanks again @jackwener -- sorry this one took so long to get in |
Which issue does this PR close?
Closes #4483
Closes #4454
Closes #4481
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?