Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

The PR #10541 changed the rule CollapseProject by enabling collapsing Project into Aggregate. It leaves a to-do item to remove the duplicate code. This PR is to finish this to-do item. Also added a test case for covering this change.

How was this patch tested?

Added a new test case.

@liancheng Could you check if the code refactoring is fine? Thanks!

@SparkQA
Copy link

SparkQA commented Feb 29, 2016

Test build #52177 has finished for PR 11427 at commit d132c58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@frreiss
Copy link
Contributor

frreiss commented Mar 8, 2016

LGTM

projectList1.exists(_.collect {
case a: Attribute if aliasMap.contains(a) => aliasMap(a).child
}.exists(!_.deterministic))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further refactored your code a little bit:

def collectAliases(projectList: Seq[NamedExpression]): AttributeMap[Alias] = {
  AttributeMap(projectList.collect {
    case a: Alias => a.toAttribute -> a
  })
}

def haveCommonNonDeterministicOutput(
    upper: Seq[NamedExpression], lower: Seq[NamedExpression]): Boolean = {
  val aliases = collectAliases(lower)
  upper.exists(_.collect {
    case a: Attribute if aliases.contains(a) => aliases(a).child
  }).exists(!_.deterministic)
}

def buildCleanedProjectList(
    upper: Seq[NamedExpression],
    lower: Seq[NamedExpression]): Seq[NamedExpression] = {
  val aliases = collectAliases(lower)

  val rewrittenUpper = upper.map(_.transform {
    case a: Attribute => aliases.getOrElse(a, a)
  })

  rewrittenUpper.map { p =>
    CleanupAliases.trimNonTopLevelAliases(p).asInstanceOf[NamedExpression]
  }
}

And those inline comments need some rewording as they are now moved to different contexts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! will do

@liancheng
Copy link
Contributor

Sorry for the late review. In general it looks good. Left some comments for better readability and testing.

@gatorsmile
Copy link
Member Author

This is definitely a low priority issue. Thank you for your review! Will do the change based on your suggestions. @liancheng

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53934 has finished for PR 11427 at commit a44b962.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

Thanks! Merging to master.

@asfgit asfgit closed this in 6ce008b Mar 23, 2016
@gatorsmile
Copy link
Member Author

Thank you! @liancheng

Now, this rule supports collapsing two continuous Project operators and one upper Project and one lower Aggregate.

After rethinking it, maybe we also can collapse one upper Aggregate into one lower Project. Will try to submit a PR for this. : )

@gatorsmile gatorsmile deleted the collapseProjectRefactor branch March 23, 2016 19:42
@gatorsmile
Copy link
Member Author

Let me post an example here:

SELECT sum(b) FROM parquet_t2 group by a, b

The current optimized plan is like

Aggregate [a#43L,b#44L], [(sum(b#44L),mode=Complete,isDistinct=false) AS sum(b)#51L]
+- Project [a#43L,b#44L]
   +- Relation[a#43L,b#44L,c#45L,d#46L] ParquetFormat part: struct<>, data: struct<a:bigint,b:bigint,c:bigint,d:bigint>

Ideally, we can directly remove the Project.

Aggregate [a#43L,b#44L], [(sum(b#44L),mode=Complete,isDistinct=false) AS sum(b)#51L]
 +- Relation[a#43L,b#44L,c#45L,d#46L] ParquetFormat part: struct<>, data: struct<a:bigint,b:bigint,c:bigint,d:bigint>

@gatorsmile
Copy link
Member Author

But, their physical plan is still the same. I guess, we do not need to do it. : )

@liancheng
Copy link
Contributor

@gatorsmile Yea, whole-stage codegen makes this kind of optimization not that appealing.

@gatorsmile
Copy link
Member Author

I see. Thanks! @liancheng Let me find something else in Optimizer. Will do expression folding this weekend. : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants