Skip to content

Conversation

@hvanhovell
Copy link
Contributor

What changes were proposed in this pull request?

Join processing in the parser relies on the fact that the grammar produces a right nested trees, for instance the parse tree for select * from a join b join c is expected to produce a tree similar to JOIN(a, JOIN(b, c)). However there are cases in which this (invariant) is violated, like:

SELECT COUNT(1)
FROM test T1 
     CROSS JOIN test T2
     JOIN test T3
      ON T3.col = T1.col
     JOIN test T4
      ON T4.col = T1.col

In this case the parser returns a tree in which Joins are located on both the left and the right sides of the parent join node.

This PR introduces a different grammar rule which does not make this assumption. The new rule takes a relation and searches for zero or more joined relations. As a bonus processing is much easier.

How was this patch tested?

Existing tests and I have added a regression test to the plan parser suite.

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@SparkQA
Copy link

SparkQA commented Aug 29, 2016

Test build #64596 has finished for PR 14867 at commit f9cb0d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

joinRelation
: joinType JOIN right=relationPrimary joinCriteria?
| NATURAL joinType JOIN right=relationPrimary
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NATURAL CROSS JOIN is invalid, so perhaps we should not include CROSS in joinType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a point there. Let me update that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move the code around. I have added a check in the AstBuilder (spark side of the parser) to catch this.

@SparkQA
Copy link

SparkQA commented Sep 2, 2016

Test build #64849 has finished for PR 14867 at commit e20afbd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell hvanhovell changed the title [SPARK-17296][SQL] Simplify join parser join processing. [SPARK-17296][SQL] Simplif parser join processing. Sep 2, 2016
@hvanhovell hvanhovell changed the title [SPARK-17296][SQL] Simplif parser join processing. [SPARK-17296][SQL] Simplify parser join processing. Sep 2, 2016
@SparkQA
Copy link

SparkQA commented Sep 2, 2016

Test build #64853 has finished for PR 14867 at commit b09d506.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor Author

cc @srinathshankar


// SPARK-17296
assertEqual(
"select * from t1 cross join t2 join t3 on t3.id = t1.id join t4 on t4.id = t1.id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is something like
SELECT * FROM T1 INNER JOIN T2 INNER JOIN T3 ON col3 = col2 ON col3 = col1;
supposed to parse ?
Without your change it returns the following error:
org.apache.spark.sql.AnalysisException: cannot resolve 'col3' given input columns: [col1, col2]; line 1 pos 63
which I don't understand. The following parses though:
SELECT * FROM T1 INNER JOIN T2 INNER JOIN T3 ON col1 = col2 ON col2 = col1
and returns a result

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, it looks like your patch will disallow both queries at the parser level. Could you add a test that enforces this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I have added a test.

.select(star()))

// Test multiple on clauses.
intercept("select * from t1 inner join t2 inner join t3 on col3 = col2 on col3 = col1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's also add a test somewhere for
SELECT * FROM T1 INNER JOIN (T2 INNER JOIN T3 ON col3 = col2) ON col3 = col1
SELECT * FROM T1 INNER JOIN (T2 INNER JOIN T3) ON col3 = col2
SELECT * FROM T1 INNER JOIN (T2 INNER JOIN T3 ON col3 = col2)

This looks good to me.

# Conflicts:
#	sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@SparkQA
Copy link

SparkQA commented Sep 3, 2016

Test build #64880 has finished for PR 14867 at commit 3b13cd7.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 3, 2016

Test build #64882 has finished for PR 14867 at commit fca4489.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 5, 2016

Test build #64926 has finished for PR 14867 at commit c30e665.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.select(star()))

// Implicit joins.
assertEqual(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM

@hvanhovell
Copy link
Contributor Author

Merging to master/2.0. Thanks for the review!

@asfgit asfgit closed this in 4f769b9 Sep 6, 2016
asfgit pushed a commit that referenced this pull request Sep 7, 2016
## What changes were proposed in this pull request?
This PR backports #14867 to branch-2.0. It fixes a number of join ordering bugs.
## How was this patch tested?
Added tests to `PlanParserSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #14984 from hvanhovell/SPARK-17296-branch-2.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants