Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Partial fix for SQL aggregate queries with aliases #2464

Merged
merged 6 commits into from
May 6, 2022

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 6, 2022

Which issue does this PR close?

This is working towards a fix for #2430

Rationale for this change

See the added unit test for an example of a query that previously failed.

What changes are included in this PR?

  • Fixes code for finding columns referenced from expressions so that it now fully recurses through the expression tree
  • Removes aliases from the alias_map before parsing the grouping expressions so that the grouping expressions don't try and use aliased expressions from the projection
  • Added documentation for the planner aggregate method

Are there any user-facing changes?

No

@andygrove andygrove self-assigned this May 6, 2022
@github-actions github-actions bot added the datafusion Changes in the datafusion crate label May 6, 2022
async fn csv_query_group_by_with_aliases() -> Result<()> {
let ctx = SessionContext::new();
register_aggregate_csv(&ctx).await?;
let sql = "SELECT c1 AS c12, avg(c12) AS c1 FROM aggregate_test_100 GROUP BY c1";
Copy link
Member Author

@andygrove andygrove May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This originally failed with Aggregations require unique expression names but the expression \\\"AVG(#aggregate_test_100.c12)\\\" at position 0 and \\\"AVG(#aggregate_test_100.c12)\\\" at position 1 have the same name. Consider aliasing (\\\"AS\\\") one of them

pub(crate) fn find_column_exprs(exprs: &[Expr]) -> Vec<Expr> {
find_exprs_in_exprs(exprs, &|nested_expr| matches!(nested_expr, Expr::Column(_)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was only finding some expressions and was not recursing and finding them all

@andygrove andygrove changed the title [WIP] Partial fix for SQL aggregate queries with aliases MINOR: Partial fix for SQL aggregate queries with aliases May 6, 2022
Comment on lines +1013 to +1016
let mut alias_map = alias_map.clone();
for f in plan.schema().fields() {
alias_map.remove(f.name());
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first part of the fix

@andygrove andygrove marked this pull request as ready for review May 6, 2022 00:23
@andygrove
Copy link
Member Author

@alamb @yjshen @jdye64 I am working through quite a bit of technical debt in the SQL planner around aggregate queries and this is a small step towards fixing issues and also adds some documentation explaining how the code works. PTAL when you can.

@andygrove
Copy link
Member Author

Putting this back to draft for now - test is still failing

@andygrove andygrove marked this pull request as draft May 6, 2022 02:57
@andygrove andygrove changed the title MINOR: Partial fix for SQL aggregate queries with aliases WIP: MINOR: Partial fix for SQL aggregate queries with aliases May 6, 2022
@andygrove andygrove marked this pull request as ready for review May 6, 2022 03:17
@andygrove andygrove changed the title WIP: MINOR: Partial fix for SQL aggregate queries with aliases MINOR: Partial fix for SQL aggregate queries with aliases May 6, 2022
@andygrove
Copy link
Member Author

This is ready for review now

@andygrove
Copy link
Member Author

Thanks for the review @yjshen

@andygrove andygrove merged commit 22464f0 into apache:master May 6, 2022
@andygrove andygrove deleted the alias-fix branch May 6, 2022 15:13
@@ -251,6 +251,77 @@ pub enum Expr {
QualifiedWildcard { qualifier: String },
}

/// Recursively find all columns referenced by an expression
pub fn find_columns_referenced_by_expr(e: &Expr) -> Vec<Column> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code could use an ExprVisitor to avoid having to do the recursion itself (and potentially missing some case)

I took a crack at doing so #2471

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants