Skip to content

Conversation

@hvanhovell
Copy link
Contributor

@hvanhovell hvanhovell commented Nov 28, 2016

What changes were proposed in this pull request?

We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.

This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join.

How was this patch tested?

Added tests to FilterPushdownSuite.scala and created a SQLQueryTestSuite file for left anti joins with a regression test.

@hvanhovell
Copy link
Contributor Author

cc @dongjoon-hyun could you take a look at this?

@SparkQA
Copy link

SparkQA commented Nov 28, 2016

Test build #69211 has finished for PR 16026 at commit 4d6b183.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

comparePlans(optimized, analysis.EliminateSubqueryAliases(correctAnswer))
}

test("joins: only push down to the right of a left anti join") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about joins: only push down join conditions to the right of a left anti join?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a JIRA issue number here?

@gatorsmile
Copy link
Member

The current PR title is a little bit confusing. Could you please update the PR title from push-down predicates to push-down join conditions?

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @hvanhovell . I'll take a look, too.

@gatorsmile
Copy link
Member

LGTM except a minor comment.

@dongjoon-hyun
Copy link
Member

LGTM, @hvanhovell .

@hvanhovell hvanhovell changed the title [SPARK-18597][SQL] Do push-down predicates to right side for a LEFT ANTI join [SPARK-18597][SQL] Do not push-down join conditions to the right side for a LEFT ANTI join Nov 28, 2016
@hvanhovell hvanhovell changed the title [SPARK-18597][SQL] Do not push-down join conditions to the right side for a LEFT ANTI join [SPARK-18597][SQL] Do not push-down join conditions to the right side of a LEFT ANTI join Nov 28, 2016
@SparkQA
Copy link

SparkQA commented Nov 28, 2016

Test build #69242 has finished for PR 16026 at commit 47bbdff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor Author

I am merging this to master/2.1. Thanks for the reviews!

asfgit pushed a commit that referenced this pull request Nov 28, 2016
… of a LEFT ANTI join

## What changes were proposed in this pull request?
We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.

This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join.

## How was this patch tested?
Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #16026 from hvanhovell/SPARK-18597.

(cherry picked from commit 38e2982)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
@asfgit asfgit closed this in 38e2982 Nov 28, 2016
hvanhovell added a commit to hvanhovell/spark that referenced this pull request Nov 28, 2016
… of a LEFT ANTI join

## What changes were proposed in this pull request?
We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.

This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join.

## How was this patch tested?
Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#16026 from hvanhovell/SPARK-18597.

(cherry picked from commit 38e2982)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

joinType match {
case _: InnerLike | LeftExistence(_) =>
case _: InnerLike | LeftSemi | ExistenceJoin(_) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of ExistenceJoin says we need to preserve all the rows from the left table through the join operation as if it is a regular LeftOuter join. The ExistenceJoin augments the LeftOuter operation with a new column called exists, set to true when the join condition in the ON clause is true and false otherwise. The filter of any rows will happen in the Filter operation above the ExistenceJoin.

Example:

A(c1, c2):
{ (1, 1), (1, 2) }

B(c1):
{ (NULL) }

// can be any value as it is irrelevant in this example

select A.*
from   A
where  exists (select 1 from B where A.c1 = A.c2)
       or A.c2=2

In this example, the correct result is all the rows from A. If the pattern ExistenceJoin at line 935 in Optimizer.scala added by the PR of this JIRA is indeed active, the code will push down the predicate A.c1 = A.c2 to be a Filter on relation A, which will filter the row (1,2) from A.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsyca Have you tried the above example using the latest master?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have included the full comment from the JIRA. The keyword is "this is not currently exposed". Here is the first part of my comment:

"ExistenceJoin should be treated the same as LeftOuter and LeftAnti, not InnerLike and LeftSemi. This is not currently exposed because the rewrite of [NOT] EXISTS OR ... to ExistenceJoin happens in rule RewritePredicateSubquery, which is in a separate rule set and placed after the rule PushPredicateThroughJoin. During the transformation in the rule PushPredicateThroughJoin, an ExistenceJoin never exists."

asfgit pushed a commit that referenced this pull request Nov 28, 2016
…of a Left Anti join [BRANCH-2.0]

## What changes were proposed in this pull request?
We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.

This PR fixes this by only pushing down conditions to the right hand side of the join. This is similar to the behavior of left outer join.

This PR is a backport of #16026

## How was this patch tested?
Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #16039 from hvanhovell/SPARK-18597-branch-2.0.
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
… of a LEFT ANTI join

## What changes were proposed in this pull request?
We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.

This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join.

## How was this patch tested?
Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#16026 from hvanhovell/SPARK-18597.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
… of a LEFT ANTI join

## What changes were proposed in this pull request?
We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) join. The problem is that this changes the semantics of the join; a left anti join filters out rows that matches the join condition.

This PR fixes this by only pushing down conditions to the left hand side of the join. This is similar to the behavior of left outer join.

## How was this patch tested?
Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite file for left anti joins with a regression test.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#16026 from hvanhovell/SPARK-18597.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants