Skip to content

Conversation

@gatorsmile
Copy link
Member

This fix is to remove the subquery name in qualifiers after eliminating subquery.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I use NamedExpression to replace AttributeReference?

@dbtsai
Copy link
Member

dbtsai commented Oct 30, 2015

Jenkins, okay to test.

@cloud-fan
Copy link
Contributor

Logically when we remove Subquery, we should remove related qualifiers, however, can you construct a case that hit problems because of not removing qualifiers?

@gatorsmile
Copy link
Member Author

Hi, @cloud-fan and @dbtsai

So far, I just observed this strange ghosting qualifiers values when I read the optimized logical tree, but my query did not trigger any issue.

Based on my understanding, usage of qualifiers is still limited in the current code base. It could be a potential issue when we support more complex SQL syntax/functions. Thus, I submitted this pull request to resolve this issue.

Of course, I will continue to pay attention to this issue in the future.

Thanks,

Xiao Li

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify this to a map and remove the :: Nil we have in the two sub cases? since it seems we are always returning a single element list for every case so it should be ok as a map.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I did the change based on your suggestion. : )

@gatorsmile
Copy link
Member Author

@cloud-fan @dbtsai , Jenkins did not start the testing. Could you let Jenkins to test it?

Thank you!

@dbtsai
Copy link
Member

dbtsai commented Nov 4, 2015

I think there is some issue in Jenkins.

@dbtsai
Copy link
Member

dbtsai commented Nov 4, 2015

Jenkins, add to whitelist

@SparkQA
Copy link

SparkQA commented Nov 4, 2015

Test build #44981 has finished for PR 9385 at commit a26763d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

@dbtsai Thank you!

Please let me know if you need any extra code change.

@marmbrus
Copy link
Contributor

marmbrus commented Nov 7, 2015

Thanks for your contribution, but I'm tempted to not make this change unless there is actually a bug. We are eliminating the subqueries because they will impact optimization and planning. However, keeping the qualifiers around could actually be useful if we want to give better error messages.

@gatorsmile
Copy link
Member Author

@marmbrus I already hit this issue when resolving https://issues.apache.org/jira/browse/SPARK-8658. That means, when comparing two AttributeReferences, we should not compare their qualifiers. That looks a strange fix, right?

@marmbrus
Copy link
Contributor

marmbrus commented Nov 7, 2015

Expression instances have two different kinds of equality:

  • equals full equality, i.e. capitalization, qualifiers, etc. This is important for example to know if a tree transformation has reached fixed point.
  • semanticEquals equivalent in execution (i.e. will return the same result)

If you ran into problems adding qualifiers to equals then I think it is likely that that code should actually be using semanticEquals.

@gatorsmile
Copy link
Member Author

@marmbrus Thanks!

I will try to change equals to semanticEquals in the pull request #9216. Then, you can decide if this is a right solution.

@gatorsmile gatorsmile closed this Nov 10, 2015
@gatorsmile gatorsmile reopened this Nov 10, 2015
@gatorsmile
Copy link
Member Author

Hi, @marmbrus

After digging the root reason why Expand cases failed, I found we still need a deeper clean of subquery names in the plan tree after elimination.

Let me use the following example to explain what happened in Expand. This query works well if we do not compare the qualifiers when comparing two AttributeReferences. However, if merging #9216, our current subquery elimination will cause an incorrect query result.

val sqlDF = sql("select a, b, sum(a) from mytable group by a, b with rollup").explain(true)

Before subquery elimination, the subquery name "mytable" is shown in all the two upper layers (Aggregate and Expand).

Aggregate [a#2,b#3,grouping__id#5], [a#2,b#3,sum(cast(a#2 as bigint)) AS _c2#4L]
 Expand [0,1,3], [a#2,b#3], grouping__id#5
  Subquery mytable
   Project [_1#0 AS a#2,_2#1 AS b#3]
    LocalRelation [_1#0,_2#1], [[1,2],[2,4]]

After subquery elimination, the subquery name "mytable" is not removed in these two upper layers Aggregate and Expand.

Aggregate [a#2,b#3,grouping__id#5], [a#2,b#3,sum(cast(a#2 as bigint)) AS _c2#4L]
 Expand [0,1,3], [a#2,b#3], grouping__id#5
  Project [_1#0 AS a#2,_2#1 AS b#3]
   LocalRelation [_1#0,_2#1], [[1,2],[2,4]]

In SparkStrategies, we create an array of Projections for the child projection of Expand.

      case e @ logical.Expand(_, _, _, child) =>
        execution.Expand(e.projections, e.output, planLater(child)) :: Nil

e.projections calls the function expand(). Inside the function expand(), I do not think we should use semanticEquals there.

Let me post the incorrect physical plan

TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Final,isDistinct=false)], output=[a#2,b#3,_c2#11L])
 TungstenExchange hashpartitioning(a#2,b#3,grouping__id#12,5)
  TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Partial,isDistinct=false)], output=[a#2,b#3,grouping__id#12,currentSum#15L])
   Expand [List(a#2, b#3, 0),List(a#2, b#3, 1),List(a#2, b#3, 3)], [a#2,b#3,grouping__id#12]
    LocalTableScan [a#2,b#3], [[1,2],[2,4]]

For you convenience, below is the correct one (if we do not compare qualifiers in the equal method of AttributeReference):

TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Final,isDistinct=false)], output=[a#2,b#3,_c2#11L])
 TungstenExchange hashpartitioning(a#2,b#3,grouping__id#12,5)
  TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Partial,isDistinct=false)], output=[a#2,b#3,grouping__id#12,currentSum#15L])
   Expand [List(null, null, 0),List(a#2, null, 1),List(a#2, b#3, 3)], [a#2,b#3,grouping__id#12]
    LocalTableScan [a#2,b#3], [[1,2],[2,4]]

My current fix does not fix this issue yet.

@marmbrus
Copy link
Contributor

I'm sorry, I don't see where this expand() function you are talking about is, or why it should not use semanticEquals. The whole point of semanticEquals is "are these two attributes referring to the same data", and in many cases this is actually what you want instead of structural equality (structural equality matters, for example, when you are checking to see if a tree has reached fixed point. otherwise we could not use rules to make changes to Attributes that are cosmetic in nature).

@gatorsmile
Copy link
Member Author

Hi, @marmbrus

Originally, I thought quantifiers are part of identifiers, like schema name in traditional RDBMS. Based on your explanation, this is not true.

I did a code change. Please check if the latest changes make sense. semanticEquals is used. Now, all the test cases passed. gatorsmile@8e72b17

Just did a merge to the latest master. #9216. Unfortunately, it triggered another failure in CachedTableSuite. Will try to see if we can use semanticEquals too.

Thank you for your time.

@gatorsmile
Copy link
Member Author

@marmbrus CachedTableSuite failed due to the same reason. We did not clean up the subquery names. Thus, it is unable to give a correct result when deciding if Exchange is needed.

I did the fix by using semanticEquals. Please check if the changes are appropriate. #9216

Now, all the test cases passed. Thanks.

@marmbrus
Copy link
Contributor

Can we close this now that #9216 is merged?

@gatorsmile
Copy link
Member Author

Sure. Close it. Thank you for your time!

@gatorsmile gatorsmile closed this Nov 17, 2015
@gatorsmile gatorsmile deleted the eliminateSubQ branch November 17, 2015 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants