Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup of #28695 , to fix the problem completely.

The root cause is that, df("col").as("name") is not a column reference anymore, and should not have the special column metadata. However, this was broken in ba7adc4#diff-ac415c903887e49486ba542a65eec980L1050-L1053

This PR fixes the regression, by strip the special column metadata in Column.name, which is the behavior before #28326 .

Why are the changes needed?

Fix a regression. We shouldn't fail if there is no ambiguous self-join.

Does this PR introduce any user-facing change?

Yes, the query in the test can run now.

How was this patch tested?

updated test

@cloud-fan
Copy link
Contributor Author

cc @HyukjinKwon @xuanyuanking

@HyukjinKwon HyukjinKwon changed the title [SPARK-28344][SQL] don't fail if there is no ambiguous self join [SPARK-28344][SQL][FOLLOW-UP] Do not fail if there is no ambiguous self join Jun 10, 2020
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon
Copy link
Member

Shall we file a new JIRA instead of using SPARK-28344 since RC3 will likely pass and the fixed version conflicts.

Copy link
Member

@xuanyuanking xuanyuanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cloud-fan cloud-fan changed the title [SPARK-28344][SQL][FOLLOW-UP] Do not fail if there is no ambiguous self join [SPARK-31956][SQL] Do not fail if there is no ambiguous self join Jun 10, 2020
@SparkQA
Copy link

SparkQA commented Jun 10, 2020

Test build #123761 has finished for PR 28783 at commit 38c0508.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you for fixing this, @cloud-fan !

@dongjoon-hyun
Copy link
Member

Retest this please.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. I verified Python UT locally.
Merged to master/3.0

dongjoon-hyun pushed a commit that referenced this pull request Jun 10, 2020
### What changes were proposed in this pull request?

This is a followup of #28695 , to fix the problem completely.

The root cause is that, `df("col").as("name")` is not a column reference anymore, and should not have the special column metadata. However, this was broken in ba7adc4#diff-ac415c903887e49486ba542a65eec980L1050-L1053

This PR fixes the regression, by strip the special column metadata in `Column.name`, which is the behavior before #28326 .

### Why are the changes needed?

Fix a regression. We shouldn't fail if there is no ambiguous self-join.

### Does this PR introduce _any_ user-facing change?

Yes, the query in the test can run now.

### How was this patch tested?

updated test

Closes #28783 from cloud-fan/self-join.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c400519)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@SparkQA
Copy link

SparkQA commented Jun 11, 2020

Test build #123778 has finished for PR 28783 at commit 38c0508.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020
### What changes were proposed in this pull request?

This is a followup of apache#28695 , to fix the problem completely.

The root cause is that, `df("col").as("name")` is not a column reference anymore, and should not have the special column metadata. However, this was broken in apache@ba7adc4#diff-ac415c903887e49486ba542a65eec980L1050-L1053

This PR fixes the regression, by strip the special column metadata in `Column.name`, which is the behavior before apache#28326 .

### Why are the changes needed?

Fix a regression. We shouldn't fail if there is no ambiguous self-join.

### Does this PR introduce _any_ user-facing change?

Yes, the query in the test can run now.

### How was this patch tested?

updated test

Closes apache#28783 from cloud-fan/self-join.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c400519)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
rshkv pushed a commit to palantir/spark that referenced this pull request Jan 28, 2021
This is a followup of apache#28695 , to fix the problem completely.

The root cause is that, `df("col").as("name")` is not a column reference anymore, and should not have the special column metadata. However, this was broken in apache@ba7adc4#diff-ac415c903887e49486ba542a65eec980L1050-L1053

This PR fixes the regression, by strip the special column metadata in `Column.name`, which is the behavior before apache#28326 .

Fix a regression. We shouldn't fail if there is no ambiguous self-join.

Yes, the query in the test can run now.

updated test

Closes apache#28783 from cloud-fan/self-join.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants