Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Sep 14, 2015

The output of Generate should not be resolved as Reference.

@SparkQA
Copy link

SparkQA commented Sep 14, 2015

Test build #42442 has finished for PR 8755 at commit dd11d2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this for transformExpressionsUp, transformAllExpressions too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, ResolveReferences only uses transformExpressionsUp, also Generate is the only special case that need it. So it's enough for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be safe, I will move this change into Generate.

@cloud-fan
Copy link
Contributor

Hi @davies , I think the idea of only transform expressions that defined in expressions is not good enough. The key problem I see is we are using Attribute to represent the output name and type. However, the type info is already in generator.elementTypes and we only need names, i.e. use Seq[String] instead of Seq[Arrtibute] for generatorOutput.

What do you think?

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #42495 has finished for PR 8755 at commit 887474e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicating this code is kind of scary to me. I think I like the other fix better. Though ideally, I think that we would just fix the Analyzer to understand that generate is special.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I switched to in way is to isolate the change, because expressions.contains(e) could be expensive for wider tables (there are many columns). Should we improve the previous one (using lazy val and Set)?

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #1757 has finished for PR 8755 at commit 887474e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 16, 2015

Test build #1764 has finished for PR 8755 at commit 887474e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Sep 18, 2015

@cloud-fan I tried to change the output to be Seq[String] but failed, because after resolve Generate, we still need to a Generate with Attribute.

@yhuai
Copy link
Contributor

yhuai commented Sep 18, 2015

generator.elementTypes is used to stored the data types. We can create new attributes based on the name and data types. Actually, can we fix it by changing ResolveReferences (having a case to handle generate)?

@davies
Copy link
Contributor Author

davies commented Sep 21, 2015

@yhuai @cloud-fan I had changed to use a special case for Generate in ResolveReferrence, please take another look, thanks!

@SparkQA
Copy link

SparkQA commented Sep 21, 2015

Test build #42772 has finished for PR 8755 at commit 6b02a41.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 22, 2015

Test build #1782 has finished for PR 8755 at commit 6b02a41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

This is a simple fix and it works. However, it still seems hacky to me and I'm thinking about introducing an UnresolvedGenerate. That will require more code changes and I'm not sure if we should do it. Maybe we should merge this one first.

@marmbrus
Copy link
Contributor

I think this seems fine for now. Generate has always been a weird case since its one of the few places new attributes just appear in non-leaf nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to say that ResolveGenerate will create AttributeReferences for output. I will fix it when merging it.

@yhuai
Copy link
Contributor

yhuai commented Sep 22, 2015

LGTM. Merging to branch 1.5 and master.

asfgit pushed a commit that referenced this pull request Sep 22, 2015
The output of Generate should not be resolved as Reference.

Author: Davies Liu <davies@databricks.com>

Closes #8755 from davies/view.

(cherry picked from commit 22d4015)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in 22d4015 Sep 22, 2015
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
The output of Generate should not be resolved as Reference.

Author: Davies Liu <davies@databricks.com>

Closes apache#8755 from davies/view.

(cherry picked from commit 22d4015)
Signed-off-by: Yin Huai <yhuai@databricks.com>
(cherry picked from commit c3112a9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants