[SPARK-17034][SQL] adds expression UnresolvedOrdinal to represent the ordinals in GROUP BY or ORDER BY #14616
|
Test build #63655 has finished for PR 14616 at commit
|
9e956d5 to
9fc1b47
Compare
|
Ping @cloud-fan |
|
Is there a way to fix this without introducing the new unresolved ordinal classes? If there isn't, can we combine the two into a single UnresolvedOrdinal, and also combine the two analysis rules into a single FindUnresolvedOrdinals rule? Also cc @hvanhovell
|
@rxin Yes, |
|
Can we just add a check in |
If we end up doing it this way, move this to its own file and create an individual test suite.
The analyzer file is getting too large.
Ok.
|
@cloud-fan |
we need a way to move the line position information.
|
The bug described in JIRA is already fixed by #14595 by accident... shall we open a new JIRA as this PR is actually a refactor/cleanup? |
|
Test build #63670 has finished for PR 14616 at commit
|
041184b to
c82adad
Compare
|
Test build #63678 has finished for PR 14616 at commit
|
2412fa9 to
f800463
Compare
|
Test build #63685 has finished for PR 14616 at commit
|
|
Test build #63683 has finished for PR 14616 at commit
|
|
Test build #63684 has finished for PR 14616 at commit
|
|
Test build #63686 has finished for PR 14616 at commit
|
|
Test build #63687 has finished for PR 14616 at commit
|
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, _}
import org.apache.spark.sql.catalyst.rules._
import org.apache.spark.sql.catalyst.trees.TreeNodeRef
import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, Origin, TreeNodeRef}
no need to import here?
// Tests order by ordinal, apply single rule.
val plan = testRelation2.orderBy(Literal(1).asc, Literal(2).asc)
comparePlans(
new UnresolvedOrdinalSubstitution(conf).apply(plan),
Actually, we don't have unit tests for a single analyzer rule yet; this suite is for the whole analyzer. We should write the test like:
val query = testRelation2.orderBy(Literal(1).asc, Literal(2).asc)
val expected = testRelation2.orderBy('a.asc, 'b.asc)
checkAnalysis(query, expected)
Or we can create a new suite and begin to test single analyzer rules.
I will create a new test suite.
|
Test build #63821 has finished for PR 14616 at commit
|
|
thanks, merging to master! |
… ordinals in GROUP BY or ORDER BY
## What changes were proposed in this pull request?
This PR adds expression `UnresolvedOrdinal` to represent the ordinal in GROUP BY or ORDER BY, and fixes the rules when resolving ordinals.
Ordinals in GROUP BY or ORDER BY, like `1` in `order by 1` or `group by 1`, should be considered unresolved before analysis. But the current code uses a `Literal` expression to store the ordinal. This is inappropriate: `Literal` itself is a resolved expression, so it gives the user the wrong message that the ordinal has already been resolved.
### Before this change
Ordinals are stored as `Literal` expressions.
```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
+- 'UnresolvedRelation `t`
```
For query:
```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
```
During analysis, the intermediate plan before applying rule `ResolveAggregateFunctions` is:
```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
+- LocalRelation [value#7 AS a#9]
```
Before this PR, rule `ResolveAggregateFunctions` believes all expressions of `Aggregate` have already been resolved, and tries to resolve the expressions in `Filter` directly. But this is wrong, as ordinal `2` in Aggregate is not really resolved!
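The problem described above can be illustrated with a minimal, self-contained sketch. This is a toy model, not Spark's Catalyst code: the `Expr`, `Literal`, `Attribute`, `UnresolvedOrdinal`, and `Aggregate` types below are simplified stand-ins written for illustration, and the only assumption is that a plan node counts as resolved when all of its expressions report `resolved = true`.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
object ResolvedFlagSketch {
  sealed trait Expr { def resolved: Boolean }

  // A literal always reports itself as resolved.
  final case class Literal(value: Int) extends Expr { val resolved = true }

  // An attribute that is already bound to a column.
  final case class Attribute(name: String) extends Expr { val resolved = true }

  // Illustrative placeholder for "the N-th item of the select list".
  final case class UnresolvedOrdinal(ordinal: Int) extends Expr { val resolved = false }

  // An aggregate is resolved only if every grouping expression is resolved.
  final case class Aggregate(grouping: Seq[Expr]) {
    def resolved: Boolean = grouping.forall(_.resolved)
  }

  def main(args: Array[String]): Unit = {
    // Ordinal stored as a literal: the aggregate looks resolved too early.
    println(Aggregate(Seq(Literal(2))).resolved)           // prints "true"
    // Ordinal stored as a placeholder: the aggregate stays unresolved.
    println(Aggregate(Seq(UnresolvedOrdinal(2))).resolved) // prints "false"
  }
}
```

In this sketch, a rule that checks `resolved` before touching the parent `Filter` would proceed too early in the first case and wait in the second.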
### After this change
Ordinals are stored as `UnresolvedOrdinal`.
```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [unresolvedordinal(1) ASC], true
+- 'Aggregate [unresolvedordinal(1)], ['a]
+- 'UnresolvedRelation `t`
```
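The two steps implied by this change, substituting ordinals with a placeholder and later replacing the placeholder with the referenced select-list item, might look roughly like the following sketch. It is a toy model with hypothetical names (`substitute`, `resolve`), not the rule added in this PR, and it ignores configuration flags and error reporting.

```scala
// Toy model only: hypothetical names, not the code in this PR.
object OrdinalSubstitutionSketch {
  sealed trait Expr
  final case class Literal(value: Int) extends Expr
  final case class Attribute(name: String) extends Expr
  final case class UnresolvedOrdinal(ordinal: Int) extends Expr

  // Step 1: turn integer literals used as sort or grouping keys into placeholders.
  def substitute(keys: Seq[Expr]): Seq[Expr] = keys.map {
    case Literal(n) => UnresolvedOrdinal(n)
    case other      => other
  }

  // Step 2: replace each placeholder with the select-list item at that
  // 1-based position. Out-of-range ordinals are left unresolved here.
  def resolve(keys: Seq[Expr], selectList: Seq[Expr]): Seq[Expr] = keys.map {
    case UnresolvedOrdinal(n) if n >= 1 && n <= selectList.size => selectList(n - 1)
    case other                                                  => other
  }

  def main(args: Array[String]): Unit = {
    val selectList = Seq(Attribute("a"), Attribute("b"))
    val keys = substitute(Seq(Literal(1), Literal(2)))
    println(keys)
    println(resolve(keys, selectList))
  }
}
```

Keeping the two steps separate means any rule that runs between them sees an explicitly unresolved key rather than a literal that looks finished.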
## How was this patch tested?
Unit tests.
Author: Sean Zhong <seanzhong@databricks.com>
Closes apache#14616 from clockfly/spark-16955.