Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in subquery join filters referencing outer query #2416

Merged

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 2, 2022

Which issue does this PR close?

Closes #2415

Rationale for this change

Fixes a bug.

What changes are included in this PR?

The issue was that the SQL parser was delegating to the LogicalPlanBuilder to build a filter expression and the LogicalPlanBuilder is not aware of subqueries and outer query schemas, so it failed to resolve the join expression that referenced the outer query schema.

The SQL planner now just creates the Filter expression directly.

Are there any user-facing changes?

No

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label May 2, 2022
Comment on lines 892 to 895
Some(filter_expr) => Ok(LogicalPlan::Filter(Filter {
predicate: filter_expr,
input: Arc::new(left),
})),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bug fix. We cannot delegate to the LogicalPlanBuilder to build this Filter expression because it is not subquery-aware.

@andygrove
Copy link
Member Author

@alamb @Dandandan @yjshen This is ready for review. I'm not sure why we're normalizing column names when building the logical plan rather than during optimization but this is the approach that the logical plan builder was already taking so I just kept things consistent to avoid any test regressions.

@andygrove andygrove merged commit c42dc82 into apache:master May 3, 2022
@andygrove andygrove deleted the subquery-join-filter-reference-outer-query branch May 3, 2022 17:59
@alamb
Copy link
Contributor

alamb commented May 4, 2022

I'm not sure why we're normalizing column names when building the logical plan rather than during optimization but this is the approach that the logical plan builder was already taking so I just kept things consistent to avoid any test regressions.

I think this was part of @houqp 's work (maybe in #1023) to properly handle multiple relations in a query

let sql = "SELECT j1_string, j2_string \
FROM j1, j2 \
WHERE j1_id = j2_id - 1 \
AND j2_id < (SELECT count(*) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so tricky 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scalar subquery can fail to reference unqualified field from outer query
3 participants