Skip to content

Conversation

@cloud-fan
Copy link
Contributor

Also alias the ExtractValue instead of wrapping it with UnresolvedAlias when resolve attribute in LogicalPlan, as this alias will be trimmed if it's unnecessary.

This also fixes https://issues.apache.org/jira/browse/SPARK-9323 as DataFrame.resolve won't return unresolved expression now.

@cloud-fan
Copy link
Contributor Author

cc @JoshRosen , this anwsers your last question at https://issues.apache.org/jira/browse/SPARK-9323
:)

@SparkQA
Copy link

SparkQA commented Aug 5, 2015

Test build #39852 has finished for PR 7957 at commit ad81e92.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

Seems master is broken?

@SparkQA
Copy link

SparkQA commented Aug 5, 2015

Test build #1353 has finished for PR 7957 at commit ad81e92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator
    • case class SequenceNumberRanges(ranges: Seq[SequenceNumberRange])

@cloud-fan
Copy link
Contributor Author

retest this please.

@marmbrus
Copy link
Contributor

marmbrus commented Aug 5, 2015

I think this works, but is there a reason to not just strip all UnresolvedAliases (and possibly nested Aliases) from the plan like we do with subqueries right after analysis?

@SparkQA
Copy link

SparkQA commented Aug 5, 2015

Test build #39861 has finished for PR 7957 at commit ad81e92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator
    • case class SequenceNumberRanges(ranges: Seq[SequenceNumberRange])
    • case class ArrayContains(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Aug 5, 2015

Test build #232 has finished for PR 7957 at commit ad81e92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator
    • case class SequenceNumberRanges(ranges: Seq[SequenceNumberRange])

@cloud-fan
Copy link
Contributor Author

Hi @marmbrus , we strip UnresolvedAlias from aggregation and projection at Analyzer, which works well for SQL. It will be good to make this rule more general but I'm afaid that will be hard to write. So I think fixing the different behavior between DataFrame and SQL is also reasonable, i.e. making DataFrame.resolve return resolved expressions.

@marmbrus
Copy link
Contributor

marmbrus commented Aug 6, 2015

Why not this after analysis?

plan transformAllExpressions {
  case UnresolvedAlias(child) => child
  case Alias(child, name) => Alias(child transform { case Alias(c, _) => c }, name)
}

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40120 has finished for PR 7957 at commit 925c87b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this duplicated with ResolveAliases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I made a mistake here, we should just remove the UnresolvedAlias, just like what we did in Analyzer.ResolveReferences(we call trimUnresolvedAlias there).

The key problem is we have 2 code path to resolve UnresolvedAttribute, one is Analyzer.ResolveReferences(last case) and the other is DataFrame.resolve, so I think we need to make them consistent, i.e. call trimUnresolvedAlias at DataFrame.resolve or abstract this logic for sql and dataframe.

@marmbrus
Copy link
Contributor

marmbrus commented Aug 7, 2015

Thanks for continuing to work on this :)

@cloud-fan cloud-fan changed the title [SPARK-9634][SPARK-9323][SQL] resolve UnresolvedAlias in DataFrame.resolve [SPARK-9634][SPARK-9323][SQL] cleanup unnecessary Aliases in LogicalPlan at the end of analysis Aug 11, 2015
@cloud-fan
Copy link
Contributor Author

cc @marmbrus , I think this solution is better :)

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40480 has finished for PR 7957 at commit 290ff75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MQTTUtils(object):
    • case class SortMergeOuterJoin(

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40487 has finished for PR 7957 at commit 27e2435.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MQTTUtils(object):
    • case class SortMergeOuterJoin(

@cloud-fan
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #40581 has finished for PR 7957 at commit 4cd4fad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #1483 has finished for PR 7957 at commit 4cd4fad.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #40634 has finished for PR 7957 at commit 4cd4fad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test CreateStructUnsafe too?

@marmbrus
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #1496 has finished for PR 7957 at commit 4cd4fad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @rxin , in MLlib sometimes we need to set metadata for the new column, thus we will alias the new column with metadata before call withColumn and in withColumn we alias this clolumn again. Here I added a new parameter to allow user set metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this in a different PR? Also we should do it without using Option and default arguments so that it works well in Java.

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40726 has finished for PR 7957 at commit 36c5b8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40761 has finished for PR 7957 at commit 4aa631d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40764 has finished for PR 7957 at commit 4aa631d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove everything except for this and the related tests? I'd like to pull this into the release branch without new features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened #8159 to improve the withColumn, but left the code here to see if we can pass the tests.

This PR did 2 things:

  1. use Alias instead of UnresolvedAlias when resolve nested column in LogicalPlan.resolve
  2. clean unnecessary aliases at the end of analysis

If we only do 1, some tests will fail as we need to trim aliases in the middle of getField chain. If we only do 2, it can't fix any bugs. So I put them together here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've open #8215 which is basically your patch without the mllib changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this release, but this makes me think that we are abusing aliases. I would rather that resolved expressions past the analyzer move the names out of the subexpressions and into the CreateStruct expression itself.

@cloud-fan
Copy link
Contributor Author

closing in favor of #8215

@cloud-fan cloud-fan closed this Aug 15, 2015
asfgit pushed a commit that referenced this pull request Aug 15, 2015
…lPlan at the end of analysis

Also alias the ExtractValue instead of wrapping it with UnresolvedAlias when resolve attribute in LogicalPlan, as this alias will be trimmed if it's unnecessary.

Based on #7957 without the changes to mllib, but instead maintaining earlier behavior when using `withColumn` on expressions that already have metadata.

Author: Wenchen Fan <cloud0fan@outlook.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #8215 from marmbrus/pr/7957.

(cherry picked from commit ec29f20)
Signed-off-by: Reynold Xin <rxin@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 15, 2015
…lPlan at the end of analysis

Also alias the ExtractValue instead of wrapping it with UnresolvedAlias when resolve attribute in LogicalPlan, as this alias will be trimmed if it's unnecessary.

Based on #7957 without the changes to mllib, but instead maintaining earlier behavior when using `withColumn` on expressions that already have metadata.

Author: Wenchen Fan <cloud0fan@outlook.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #8215 from marmbrus/pr/7957.
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
…lPlan at the end of analysis

Also alias the ExtractValue instead of wrapping it with UnresolvedAlias when resolve attribute in LogicalPlan, as this alias will be trimmed if it's unnecessary.

Based on apache#7957 without the changes to mllib, but instead maintaining earlier behavior when using `withColumn` on expressions that already have metadata.

Author: Wenchen Fan <cloud0fan@outlook.com>
Author: Michael Armbrust <michael@databricks.com>

Closes apache#8215 from marmbrus/pr/7957.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants