-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18597][SQL] Do not push-down join conditions to the right side of a LEFT ANTI join #16026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @dongjoon-hyun could you take a look at this? |
|
Test build #69211 has finished for PR 16026 at commit
|
| comparePlans(optimized, analysis.EliminateSubqueryAliases(correctAnswer)) | ||
| } | ||
|
|
||
| test("joins: only push down to the right of a left anti join") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about joins: only push down join conditions to the right of a left anti join?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a JIRA issue number here?
|
The current PR title is a little bit confusing. Could you please update the PR title from |
|
Thank you for pinging me, @hvanhovell . I'll take a look, too. |
|
LGTM except a minor comment. |
|
LGTM, @hvanhovell . |
|
Test build #69242 has finished for PR 16026 at commit
|
|
I am merging this to master/2.1. Thanks for the reviews! |
… of a LEFT ANTI join ## What changes were proposed in this pull request? We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition. This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join. ## How was this patch tested? Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #16026 from hvanhovell/SPARK-18597. (cherry picked from commit 38e2982) Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
… of a LEFT ANTI join ## What changes were proposed in this pull request? We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition. This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join. ## How was this patch tested? Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test. Author: Herman van Hovell <hvanhovell@databricks.com> Closes apache#16026 from hvanhovell/SPARK-18597. (cherry picked from commit 38e2982) Signed-off-by: Herman van Hovell <hvanhovell@databricks.com> # Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
|
|
||
| joinType match { | ||
| case _: InnerLike | LeftExistence(_) => | ||
| case _: InnerLike | LeftSemi | ExistenceJoin(_) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics of ExistenceJoin says we need to preserve all the rows from the left table through the join operation as if it is a regular LeftOuter join. The ExistenceJoin augments the LeftOuter operation with a new column called exists, set to true when the join condition in the ON clause is true and false otherwise. The filter of any rows will happen in the Filter operation above the ExistenceJoin.
Example:
A(c1, c2):
{ (1, 1), (1, 2) }
B(c1):
{ (NULL) }
// can be any value as it is irrelevant in this example
select A.*
from A
where exists (select 1 from B where A.c1 = A.c2)
or A.c2=2
In this example, the correct result is all the rows from A. If the pattern ExistenceJoin at line 935 in Optimizer.scala added by the PR of this JIRA is indeed active, the code will push down the predicate A.c1 = A.c2 to be a Filter on relation A, which will filter the row (1,2) from A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsyca Have you tried the above example using the latest master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have included the full comment from the JIRA. The keyword is "this is not currently exposed". Here is the first part of my comment:
"ExistenceJoin should be treated the same as LeftOuter and LeftAnti, not InnerLike and LeftSemi. This is not currently exposed because the rewrite of [NOT] EXISTS OR ... to ExistenceJoin happens in rule RewritePredicateSubquery, which is in a separate rule set and placed after the rule PushPredicateThroughJoin. During the transformation in the rule PushPredicateThroughJoin, an ExistenceJoin never exists."
…of a Left Anti join [BRANCH-2.0] ## What changes were proposed in this pull request? We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition. This PR fixes this by only pushing down conditions to the right hand side of the join. This is similar to the behavior of left outer join. This PR is a backport of #16026 ## How was this patch tested? Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #16039 from hvanhovell/SPARK-18597-branch-2.0.
… of a LEFT ANTI join ## What changes were proposed in this pull request? We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition. This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join. ## How was this patch tested? Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test. Author: Herman van Hovell <hvanhovell@databricks.com> Closes apache#16026 from hvanhovell/SPARK-18597.
… of a LEFT ANTI join ## What changes were proposed in this pull request? We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition. This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join. ## How was this patch tested? Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test. Author: Herman van Hovell <hvanhovell@databricks.com> Closes apache#16026 from hvanhovell/SPARK-18597.
What changes were proposed in this pull request?
We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.
This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join.
How was this patch tested?
Added tests to
FilterPushdownSuite.scalaand created a SQLQueryTestSuite file for left anti joins with a regression test.