[SPARK-17251][SQL] Support OuterReference in projection list of IN correlated subqueries
#16012
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -356,10 +356,17 @@ case class PrettyAttribute( | |
| * A place holder used to hold a reference that has been resolved to a field outside of the current | ||
| * plan. This is used for correlated subqueries. | ||
| */ | ||
| case class OuterReference(e: NamedExpression) extends LeafExpression with Unevaluable { | ||
| case class OuterReference(e: NamedExpression)( | ||
| val exprId: ExprId = NamedExpression.newExprId) | ||
|
Contributor
Use the
Member
Author
Is that okay? I thought it worked like `Alias`. Anyway, no problem. I'll update it like that.
||
| extends LeafExpression with NamedExpression with Unevaluable { | ||
| override def dataType: DataType = e.dataType | ||
| override def nullable: Boolean = e.nullable | ||
| override def prettyName: String = "outer" | ||
|
|
||
| override def name: String = e.name | ||
| override def qualifier: Option[String] = e.qualifier | ||
| override def toAttribute: Attribute = e.toAttribute | ||
| override def newInstance(): NamedExpression = OuterReference(e)() | ||
|
Contributor
Member
Author
Sure.
||
| } | ||
|
|
||
| object VirtualColumn { | ||
|
|
||
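The new `OuterReference(e)(exprId)` signature relies on a Scala language feature that the `Alias` comparison above hints at: for a case class with several parameter lists, only the first list takes part in the generated `equals`, `hashCode`, and pattern matching. A minimal sketch of that behaviour follows. `ExprId`, `newExprId`, and `Ref` here are hypothetical stand-ins, not Spark's classes, and real Catalyst expressions may override equality, so this only illustrates the language mechanics.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
object CurriedIdSketch {
  final case class ExprId(id: Long)

  private val counter = new java.util.concurrent.atomic.AtomicLong(0L)

  // Hands out a fresh id on every call.
  def newExprId: ExprId = ExprId(counter.getAndIncrement())

  // Only `name` (the first parameter list) is used by the generated
  // equals/hashCode; `exprId` in the second list is ignored by them.
  case class Ref(name: String)(val exprId: ExprId = newExprId) {
    // Same shape as `OuterReference(e)()` in the diff: the empty second
    // argument list triggers the default, i.e. a fresh id.
    def newInstance(): Ref = Ref(name)()
  }

  def main(args: Array[String]): Unit = {
    val r1 = Ref("a")()
    val r2 = r1.newInstance()
    println(r1 == r2)               // true: generated equality ignores exprId
    println(r1.exprId == r2.exprId) // false: each instance got its own id
  }
}
```

The design consequence is that two references to the same underlying expression compare equal structurally while still being distinguishable by id when needed.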
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES 1, 2 AS t1(a); | ||
|
|
||
| CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES 1 AS t2(b); | ||
|
|
||
| -- IN with correlated predicate | ||
| SELECT a FROM t1 WHERE a IN (SELECT b FROM t2 WHERE a=b); | ||
|
|
||
| -- NOT IN with correlated predicate | ||
| SELECT a FROM t1 WHERE a NOT IN (SELECT b FROM t2 WHERE a=b); | ||
|
|
||
| -- IN with correlated projection | ||
| SELECT a FROM t1 WHERE a IN (SELECT a FROM t2); | ||
|
|
||
| -- IN with correlated projection | ||
| SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2); | ||
|
|
||
| -- IN with expressions | ||
| SELECT a FROM t1 WHERE a*1 IN (SELECT a%2 FROM t2); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| -- Automatically generated by SQLQueryTestSuite | ||
| -- Number of queries: 7 | ||
|
|
||
|
|
||
| -- !query 0 | ||
| CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES 1, 2 AS t1(a) | ||
| -- !query 0 schema | ||
| struct<> | ||
| -- !query 0 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 1 | ||
| CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES 1 AS t2(b) | ||
| -- !query 1 schema | ||
| struct<> | ||
| -- !query 1 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 2 | ||
| SELECT a FROM t1 WHERE a IN (SELECT b FROM t2 WHERE a=b) | ||
| -- !query 2 schema | ||
| struct<a:int> | ||
| -- !query 2 output | ||
| 1 | ||
|
|
||
|
|
||
| -- !query 3 | ||
| SELECT a FROM t1 WHERE a NOT IN (SELECT b FROM t2 WHERE a=b) | ||
| -- !query 3 schema | ||
| struct<a:int> | ||
| -- !query 3 output | ||
| 2 | ||
|
|
||
|
|
||
| -- !query 4 | ||
| SELECT a FROM t1 WHERE a IN (SELECT a FROM t2) | ||
| -- !query 4 schema | ||
| struct<a:int> | ||
| -- !query 4 output | ||
| 1 | ||
| 2 | ||
|
|
||
|
|
||
| -- !query 5 | ||
| SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2) | ||
| -- !query 5 schema | ||
| struct<a:int> | ||
| -- !query 5 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 6 | ||
| SELECT a FROM t1 WHERE a*1 IN (SELECT a%2 FROM t2) | ||
| -- !query 6 schema | ||
| struct<a:int> | ||
| -- !query 6 output | ||
| 1 |
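To see why the expected outputs above look the way they do, the five SELECT queries can be replayed on plain Scala collections. This is only a sketch of the query semantics, not of how Spark plans them. It assumes the test data contains no NULLs (true for `t1` and `t2` here), so ordinary two-valued logic is enough. The object and value names are made up for illustration.

```scala
// A sketch of the test queries' semantics on plain collections.
// Assumption: no NULLs in the data, so two-valued logic suffices.
object CorrelatedInSketch {
  val t1: Seq[Int] = Seq(1, 2) // t1(a)
  val t2: Seq[Int] = Seq(1)    // t2(b)

  // query 2: a IN (SELECT b FROM t2 WHERE a = b)
  val q2: Seq[Int] = t1.filter(a => t2.filter(b => a == b).contains(a))

  // query 3: a NOT IN (SELECT b FROM t2 WHERE a = b)
  val q3: Seq[Int] = t1.filter(a => !t2.filter(b => a == b).contains(a))

  // query 4: a IN (SELECT a FROM t2)
  // The subquery projects the OUTER column, once per row of t2.
  val q4: Seq[Int] = t1.filter(a => t2.map(_ => a).contains(a))

  // query 5: a NOT IN (SELECT a FROM t2)
  val q5: Seq[Int] = t1.filter(a => !t2.map(_ => a).contains(a))

  // query 6: a*1 IN (SELECT a%2 FROM t2)
  val q6: Seq[Int] = t1.filter(a => t2.map(_ => a % 2).contains(a * 1))

  def main(args: Array[String]): Unit =
    Seq(q2, q3, q4, q5, q6).foreach(q => println(q.mkString(", ")))
}
```

Because `t2` is non-empty, the correlated projection in queries 4 and 5 always contains the outer value itself, which is why query 4 returns every row of `t1` and query 5 returns none.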
I am not sure the analyzer change has the desired effect. It just removes the outer reference from the tree, and this won't work if we use the attribute anywhere else in the tree. For example:
I think we need to break this down into two steps:
cc @nsyca what do you think.
Hmm. Correct. I'll check that again.
BTW, what about the predicates? It seemed to me that the predicates are handled the same way.
I have not looked at the code changes closely, but I have a general idea of what the originally reported problem is. I second @hvanhovell's suggestion not to support outer references in the SELECT clause of a subquery in 2.1. Just fix the named expression first.
IN subquery might be okay as it reflects the inner join semantics more or less. NOT IN subquery is converted to a special case of an anti-join with extra logic for the null value.
Does the LeftAnti with effectively no join predicate, i.e., `(isnull(tbl_a.c1 = tbl_a.c2) || (tbl_a.c1 = tbl_a.c2))`, work correctly today? And if it returns a correct result, is it by design, not by chance?
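For readers following along, here is a rough sketch of the "extra logic for the null value" mentioned above. It models SQL three-valued logic with `Option` (`None` standing for NULL/UNKNOWN) and checks that keeping a row only when `NOT IN` is TRUE matches an anti-join whose condition is `isnull(x = y) || (x = y)`. All names are hypothetical, the sketch uses a generic left value `x` and right values `ys` rather than the exact `tbl_a` columns in the comment, and it says nothing about what Spark's planner actually does.

```scala
// A sketch of NOT IN under SQL three-valued logic; None models NULL/UNKNOWN.
// Names are hypothetical and this is not Spark code.
object NotInSketch {
  type SqlBool = Option[Boolean]

  // x = y is UNKNOWN as soon as either side is NULL.
  def sqlEq(x: Option[Int], y: Option[Int]): SqlBool =
    for (a <- x; b <- y) yield a == b

  // x NOT IN (ys): FALSE on any match, UNKNOWN if there is no match but
  // some comparison is UNKNOWN, TRUE otherwise (including empty ys).
  def notIn(x: Option[Int], ys: Seq[Option[Int]]): SqlBool = {
    val cmps = ys.map(sqlEq(x, _))
    if (cmps.contains(Some(true))) Some(false)
    else if (cmps.contains(None)) None
    else Some(true)
  }

  // Null-aware anti-join: the left row survives only if NO right row makes
  // isnull(x = y) || (x = y) true.
  def antiJoinKeeps(x: Option[Int], ys: Seq[Option[Int]]): Boolean =
    !ys.exists { y =>
      val eq = sqlEq(x, y)
      eq.isEmpty || eq.contains(true)
    }

  def main(args: Array[String]): Unit = {
    val ys = Seq(Some(1), None)
    println(notIn(Some(2), ys))         // None: UNKNOWN, so WHERE drops the row
    println(antiJoinKeeps(Some(2), ys)) // false: the anti-join drops it too
  }
}
```

Since a WHERE clause keeps a row only when its predicate is TRUE, the two formulations agree exactly when `notIn(x, ys) == Some(true)` coincides with `antiJoinKeeps(x, ys)`.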
Another interesting case to consider:
If we support correlated columns in the SELECT clause, do we build the Aggregate on T2 or T1?
The `LEFT ANTI` join should produce the correct result. Unfortunately, we push down the `tbl_a.c1 = tbl_a.c2` expression into the `tbl_a` side of the plan, so we need to fix this. I have created https://issues.apache.org/jira/browse/SPARK-18597 to track this.
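A small sketch of why pushing a left-only condition below a `LEFT ANTI` join changes the result. The `leftAnti` helper, the `Row` type, and the sample data are all made up for illustration, and `naivePushdown` is only one guess at what an incorrect pushdown could look like; the actual plan Spark produced is not shown in this thread. NULL handling is left out to keep the example short.

```scala
// A sketch (not Spark code) of anti-join semantics versus a naive pushdown.
// Assumption: no NULLs, and `naivePushdown` is a hypothetical rewrite.
object AntiJoinPushdownSketch {
  final case class Row(c1: Int, c2: Int)

  // LEFT ANTI join: keep the left rows for which NO right row satisfies cond.
  def leftAnti[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filter(l => !right.exists(r => cond(l, r)))

  val tblA: Seq[Row] = Seq(Row(1, 1), Row(1, 2))
  val tblB: Seq[Int] = Seq(0) // any non-empty right side

  // The join condition only looks at the left side: c1 = c2.
  val leftOnly: Row => Boolean = r => r.c1 == r.c2

  // Condition evaluated as part of the anti-join.
  val correct: Seq[Row] = leftAnti(tblA, tblB)((l, _) => leftOnly(l))

  // Condition moved into a filter on the left input, join left unconditioned.
  val naivePushdown: Seq[Row] = leftAnti(tblA.filter(leftOnly), tblB)((_, _) => true)

  def main(args: Array[String]): Unit = {
    println(correct)       // List(Row(1,2))
    println(naivePushdown) // List()
  }
}
```

In an anti-join the condition selects the rows to discard, so turning it into a filter on the left input inverts its meaning for the rows it touches.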
Thank you @hvanhovell .
BTW, may I rebase this PR and try the second plan for 2.2 here?