[SPARK-10777] [SQL] Resolve Aliases in the Group By clause #10967
@gatorsmile @yhuai @marmbrus @cloud-fan: Hello all, I tried to run the failing query with PR 10678 from SPARK-12705 and still got the same failure.

Actually, for this JIRA's problem, I can reproduce it without using ORDER BY or a window function. All it takes is a SELECT of an aliased column together with an aggregate function, and a GROUP BY that refers to the alias.

The query looks like this:

```sql
SELECT a r, sum(b) s FROM testData2 GROUP BY r
```

(If I replace `r` in the GROUP BY with `a`, it works.)
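For reference, the failure above can be reproduced in a spark-shell session roughly like the following (a sketch using the Spark 1.6-style `SQLContext` API; the `testData2` data here is a stand-in I made up to mirror the test suite's table, not the suite's actual definition):

```scala
// Sketch of a spark-shell reproduction; `sc` is the shell's SparkContext.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Stand-in for the test suite's testData2: two Int columns, a and b.
val testData2 = Seq((1, 1), (1, 2), (2, 1), (2, 2)).toDF("a", "b")
testData2.registerTempTable("testData2")

// Fails in analysis: the alias r in the GROUP BY cannot be resolved.
sqlContext.sql("SELECT a r, sum(b) s FROM testData2 GROUP BY r").show()

// Works: grouping by the underlying column instead of the alias.
sqlContext.sql("SELECT a r, sum(b) s FROM testData2 GROUP BY a").show()
```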
I think this JIRA is different from Xiao's JIRA.

For this JIRA, it looks like the alias in the GROUP BY clause (`r`) can't be resolved by the `ResolveReferences` rule.
Currently, `ResolveReferences` only handles an `Aggregate` specially when its aggregate expressions contain stars; every other `Aggregate` falls into the generic `case q: LogicalPlan`, which tries to resolve each unresolved attribute against the node's children. In our case the GROUP BY contains the alias `r`, but the child is a `LogicalRDD` whose only columns are `a` and `b`, which is why `r` can't be found in the child.
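For context, the generic fallback in `ResolveReferences` (in `org.apache.spark.sql.catalyst.analysis.Analyzer`) looks roughly like the sketch below; this is a simplified paraphrase, not the exact source:

```scala
// Simplified paraphrase of the generic case in ResolveReferences.
case q: LogicalPlan =>
  q transformExpressionsUp {
    case u @ UnresolvedAttribute(nameParts) =>
      // Resolution only consults the children's output attributes.
      // For our query the child is LogicalRDD [a#4, b#5], so the alias r
      // is not found there and the attribute stays unresolved.
      withPosition(u) {
        q.resolveChildren(nameParts, resolver).getOrElse(u)
      }
  }
```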
Here is what the plan looks like in the debugger:
```
plan = {Aggregate@9173} "'Aggregate ['r], [a#4 AS r#43,(sum(cast(b#5 as bigint)),mode=Complete,isDistinct=false) AS s#44L]\n+- Subquery testData2\n   +- LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
  groupingExpressions = {$colon$colon@9176} "::" size = 1
    (0) = {UnresolvedAttribute@9190} "'r"
  aggregateExpressions = {$colon$colon@9177} "::" size = 2
    (0) = {Alias@9110} "a#4 AS r#43"
    (1) = {Alias@9196} "(sum(cast(b#5 as bigint)),mode=Complete,isDistinct=false) AS s#44L"
  child = {Subquery@7456} "Subquery testData2\n+- LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
    alias = {String@9201} "testData2"
    child = {LogicalRDD@9202} "LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
      analyzed = false
      resolved = true
      cleanArgs = null
      org$apache$spark$Logging$$log = null
      bitmap$0 = 1
      schema = null
      bitmap$0 = false
      origin = {Origin@9203} "Origin(Some(1),Some(27))"
      containsChild = {Set$Set1@9204} "Set$Set1" size = 1
      bitmap$0 = true
    resolved = false
    bitmap$0 = true
  _analyzed = false
  resolved = false
```
The proposed fix is to add another case for `Aggregate`: if there is an unresolved attribute in the `groupingExpressions`, and all the attributes in the `aggregateExpressions` are resolved, we first search for the unresolved attribute among the `aggregateExpressions` (that is, resolve it against the select-list aliases).
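A rough sketch of what that extra case could look like (illustrative only; the guard conditions and the use of `resolver` are my assumptions, not the final patch):

```scala
// Illustrative sketch of the proposed case, not the actual patch.
// If the grouping expressions contain unresolved attributes but the
// aggregate (select-list) expressions are fully resolved, try to map
// each unresolved grouping attribute onto a select-list alias first.
case agg @ Aggregate(groupingExprs, aggExprs, child)
    if aggExprs.forall(_.resolved) && groupingExprs.exists(!_.resolved) =>
  val newGrouping = groupingExprs.map {
    case u: UnresolvedAttribute =>
      // Look for an alias in the select list whose name matches,
      // e.g. `a AS r` for GROUP BY r; fall back to the original
      // attribute so other rules can still report the failure.
      aggExprs.collectFirst {
        case Alias(childExpr, name) if resolver(name, u.name) => childExpr
      }.getOrElse(u)
    case other => other
  }
  agg.copy(groupingExpressions = newGrouping)
```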
Thanks for reviewing.