Skip to content

Conversation

@clockfly
Copy link
Contributor

@clockfly clockfly commented Aug 12, 2016

What changes were proposed in this pull request?

This PR adds expression UnresolvedOrdinal to represent the ordinal in GROUP BY or ORDER BY, and fixes the rules when resolving ordinals.

Ordinals in GROUP BY or ORDER BY like 1 in order by 1 or group by 1 should be considered as unresolved before analysis. But in current code, it uses Literal expression to store the ordinal. This is inappropriate as Literal itself is a resolved expression, it gives the user a wrong message that the ordinals has already been resolved.

Before this change

Ordinal is stored as Literal expression

scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true  
 +- 'Aggregate [1], ['a]
     +- 'UnresolvedRelation `t

For query:

scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show

During analysis, the intermediate plan before applying rule ResolveAggregateFunctions is:

'Filter ('a > 0)
   +- Aggregate [2], [count(1) AS count(1)#83L, a#81]
        +- LocalRelation [value#7 AS a#9]

Before this PR, rule ResolveAggregateFunctions believes all expressions of Aggregate have already been resolved, and tries to resolve the expressions in Filter directly. But this is wrong, as ordinal 2 in Aggregate is not really resolved!

After this change

Ordinals are stored as UnresolvedOrdinal.

scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [unresolvedordinal(1) ASC], true
 +- 'Aggregate [unresolvedordinal(1)], ['a]
      +- 'UnresolvedRelation `t`

How was this patch tested?

Unit tests.

@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63655 has finished for PR 14616 at commit 4087365.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class OrderByOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • class GroupByOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class OrderByOrdinal(ordinal: Int)
    • case class GroupByOrdinal(ordinal: Int)

@clockfly
Copy link
Contributor Author

Ping @cloud-fan

@rxin
Copy link
Contributor

rxin commented Aug 12, 2016

Is there a way to fix this without introducing the new unresolved ordinal classes?

If there isn't, can we combine the two into a single UnresolvedOrdinal, and also combine the two analysis rule into a single FindUnresolvedOrdinals rule?

also cc @hvanhovell

@clockfly
Copy link
Contributor Author

@rxin
I have to introdce a new unresolved expression as "group by ordinal" and "order by ordinal" can be turned off by a config like "conf.groupByOrdinal" or "conf.orderByOrdinal".

Yes, UnresolvedOrdinal is a good idea.

@cloud-fan
Copy link
Contributor

Can we just add a check in ResolveAggregateFunctions? If groupByOrdinal is enabled and aggregate expressions contains integral literals, we skip it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we end up doing it this way, move this to its own file, and create an invididual test suite.

analyzer file is getting too large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@clockfly
Copy link
Contributor Author

@cloud-fan
ResolveAggregateFunctions is only one reference of Aggregate, there can be other references? If not now, will we add some rule to reference Aggregate in future? The case here is that Aggregate itself is unresolved when it contains group by ordinal. The current fix is fail-proof for all potential usages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a way to move the line position information.

@cloud-fan
Copy link
Contributor

The bug described in JIRA is already fixed by #14595 by accident...

shall we open a new JIRA as this PR is actually a refactor/cleanup?

@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63670 has finished for PR 14616 at commit 9fc1b47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OrderByOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • class GroupByOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class OrderByOrdinal(ordinal: Int)
    • case class GroupByOrdinal(ordinal: Int)

@clockfly clockfly force-pushed the spark-16955 branch 2 times, most recently from 041184b to c82adad Compare August 12, 2016 09:35
@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63678 has finished for PR 14616 at commit 094ac83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class UnresolvedOrdinal(ordinal: Int)

@clockfly clockfly force-pushed the spark-16955 branch 4 times, most recently from 2412fa9 to f800463 Compare August 12, 2016 10:08
@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63685 has finished for PR 14616 at commit 64ff518.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly clockfly changed the title [SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY [SPARK-17034][SQL] adds expression UnresolvedOrdinal to represent the ordinals in GROUP BY or ORDER BY Aug 12, 2016
@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63683 has finished for PR 14616 at commit c82adad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class UnresolvedOrdinal(ordinal: Int)

@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63684 has finished for PR 14616 at commit 3fe8982.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class UnresolvedOrdinal(ordinal: Int)

@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63686 has finished for PR 14616 at commit f800463.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class UnresolvedOrdinal(ordinal: Int)

@SparkQA
Copy link

SparkQA commented Aug 12, 2016

Test build #63687 has finished for PR 14616 at commit 47f6534.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan]
    • case class UnresolvedOrdinal(ordinal: Int)

import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, _}
import org.apache.spark.sql.catalyst.rules._
import org.apache.spark.sql.catalyst.trees.TreeNodeRef
import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, Origin, TreeNodeRef}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to import here?

// Tests order by ordinal, apply single rule.
val plan = testRelation2.orderBy(Literal(1).asc, Literal(2).asc)
comparePlans(
new UnresolvedOrdinalSubstitution(conf).apply(plan),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we don't have unit test for single analyzer rule yet, this suite is for the whole analyzer. We should write test like

val query = testRelation2.orderBy(Literal(1).asc, Literal(2).asc)
val expected = testRelation2.orderBy('a.asc, 'b.asc)
checkAnalysis(query, expected)

Or we can create a new suite and begin to test single analyzer rule

Copy link
Contributor Author

@clockfly clockfly Aug 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a new test suite.

@SparkQA
Copy link

SparkQA commented Aug 16, 2016

Test build #63821 has finished for PR 14616 at commit db84e25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 7b65030 Aug 16, 2016
ColZer pushed a commit to ColZer/spark that referenced this pull request Aug 16, 2016
… ordinals in GROUP BY or ORDER BY

## What changes were proposed in this pull request?

This PR adds expression `UnresolvedOrdinal` to represent the ordinal in GROUP BY or ORDER BY, and fixes the rules when resolving ordinals.

Ordinals in GROUP BY or ORDER BY like `1` in `order by 1` or `group by 1` should be considered as unresolved before analysis. But in current code, it uses `Literal` expression to store the ordinal. This is inappropriate as `Literal` itself is a resolved expression, it gives the user a wrong message that the ordinals has already been resolved.

### Before this change

Ordinal is stored as `Literal` expression

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
 +- 'Aggregate [1], ['a]
     +- 'UnresolvedRelation `t
```

For query:

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
```

During analysis, the intermediate plan before applying rule `ResolveAggregateFunctions` is:

```
'Filter ('a > 0)
   +- Aggregate [2], [count(1) AS count(1)#83L, a#81]
        +- LocalRelation [value#7 AS a#9]
```

Before this PR, rule `ResolveAggregateFunctions` believes all expressions of `Aggregate` have already been resolved, and tries to resolve the expressions in `Filter` directly. But this is wrong, as ordinal `2` in Aggregate is not really resolved!

### After this change

Ordinals are stored as `UnresolvedOrdinal`.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [unresolvedordinal(1) ASC], true
 +- 'Aggregate [unresolvedordinal(1)], ['a]
      +- 'UnresolvedRelation `t`
```

## How was this patch tested?

Unit tests.

Author: Sean Zhong <seanzhong@databricks.com>

Closes apache#14616 from clockfly/spark-16955.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants