Skip to content

Conversation

@dilipbiswal
Copy link
Contributor

What changes were proposed in this pull request?

Currently the set operations INTERSECT, UNION and EXCEPT are assigned the same precedence. This PR fixes the problem by giving INTERSECT higher precedence than UNION and EXCEPT. UNION and EXCEPT operators are evaluated in the order in which they appear in the query from left to right.

This results in change in behavior because of the change in order of evaluations of set operators in a query. The old behavior is still preserved under a newly added config parameter.

Query :

SELECT * FROM t1
UNION 
SELECT * FROM t2
EXCEPT
SELECT * FROM t3
INTERSECT
SELECT * FROM t4

Parsed plan before the change :

== Parsed Logical Plan ==
'Intersect false
:- 'Except false
:  :- 'Distinct
:  :  +- 'Union
:  :     :- 'Project [*]
:  :     :  +- 'UnresolvedRelation `t1`
:  :     +- 'Project [*]
:  :        +- 'UnresolvedRelation `t2`
:  +- 'Project [*]
:     +- 'UnresolvedRelation `t3`
+- 'Project [*]
   +- 'UnresolvedRelation `t4`

Parsed plan after the change :

== Parsed Logical Plan ==
'Except false
:- 'Distinct
:  +- 'Union
:     :- 'Project [*]
:     :  +- 'UnresolvedRelation `t1`
:     +- 'Project [*]
:        +- 'UnresolvedRelation `t2`
+- 'Intersect false
   :- 'Project [*]
   :  +- 'UnresolvedRelation `t3`
   +- 'Project [*]
      +- 'UnresolvedRelation `t4`

How was this patch tested?

Added tests in PlanParserSuite, SQLQueryTestSuite.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@holdensmagicalunicorn
Copy link

@dilipbiswal, thanks! I am a bot who has found some folks who might be able to help with the review:@gatorsmile, @rxin and @hvanhovell

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When true, INTERSECT is given precedence over UNION and EXCEPT set operations as per

->

When true, INTERSECT is given the greater precedence over the other set operations (UNION, EXCEPT and MINUS) as per

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add withSQLConf(SQLConf.SETOPS_PRECEDENCE_ENFORCED.key -> "true") {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also include MINUS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me think about the name of conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile A couple of alternatives i had considered :-). FYI.
spark.sql.set.operators.precedence.enforced
spark.sql.set.operators.standard.compliance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark.sql.legacy.setopsPrecedence.enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to the current PR. This addresses a comment from @HyukjinKwon in 21886

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to the current PR. This addresses a comment from @HyukjinKwon in 21886

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to the current PR. This addresses a comment from @HyukjinKwon in 21886

@SparkQA
Copy link

SparkQA commented Aug 1, 2018

Test build #93871 has finished for PR 21941 at commit c0821b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2018

Test build #93872 has finished for PR 21941 at commit 47cbc5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us mark it internal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93917 has finished for PR 21941 at commit e7d69db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93936 has finished for PR 21941 at commit e7d69db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

@maropu Is this a transient failure ? Does not seem related to my change ?

@maropu
Copy link
Member

maropu commented Aug 2, 2018

no idea, but HiveClientSuites seems flaky: https://issues.apache.org/jira/browse/SPARK-23622 (the error message is different though...)

@maropu
Copy link
Member

maropu commented Aug 2, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93966 has finished for PR 21941 at commit e7d69db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93989 has finished for PR 21941 at commit e7d69db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #94003 has finished for PR 21941 at commit e7d69db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #94028 has finished for PR 21941 at commit e7d69db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

@gatorsmile rebasing in a hope that the test result changes :-)

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #94046 has finished for PR 21941 at commit bc6df75.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #94053 has finished for PR 21941 at commit bc6df75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in 73dd6cf Aug 3, 2018
@dilipbiswal
Copy link
Contributor Author

Thanks a lot @gatorsmile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants