Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Jul 20, 2023

What changes were proposed in this pull request?

Make rule ExtractWindowExpressions retain the PLAN_ID_TAG

Why are the changes needed?

In #39925, we introduced a new mechanism to resolve expression with specified plan.

However, sometimes the plan ID might be discarded by some analyzer rules, and then some expressions can not be correctly resolved, this issue is the main blocker of PS on Connect.

Does this PR introduce any user-facing change?

yes, a lot of Pandas APIs enabled

How was this patch tested?

Enable UTs

@zhengruifeng
Copy link
Contributor Author

zhengruifeng commented Jul 20, 2023

I have checked with @cloud-fan that we might have to modify the rules one by one.
The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules.

@HyukjinKwon @itholic

@itholic
Copy link
Contributor

itholic commented Jul 20, 2023

The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules.

Great! Could you help creating tickets for remaining 3~4 more rules under SPARK-42497??

@zhengruifeng
Copy link
Contributor Author

zhengruifeng commented Jul 20, 2023

The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules.

Great! Could you help creating tickets for remaining 3~4 more rules under SPARK-42497??

TBH, I am not very sure about which rules to modify for the remaining UTs, it needs further investigation

@itholic
Copy link
Contributor

itholic commented Jul 20, 2023

I got it. Just created single ticket here: SPARK-44492 for addressing undefined remaining tests so we don't miss it.

@HyukjinKwon
Copy link
Member

I am fine with this as a workaround for now but such implementation depending on tags is sort of flaky. The tags are easily lost when you, e.g., copy the expressions IIRC.

@zhengruifeng
Copy link
Contributor Author

I saw all tests in pyspark-pandas-slow-connect successed

image

not sure why it suddenly became
image

let me disable a few UT to see what happens

@zhengruifeng zhengruifeng force-pushed the ps_connect_analyze_window branch 3 times, most recently from 38bc650 to f3efddf Compare July 20, 2023 13:23
@github-actions github-actions bot added the BUILD label Jul 20, 2023
@zhengruifeng zhengruifeng force-pushed the ps_connect_analyze_window branch from 666b2c1 to f3efddf Compare July 20, 2023 15:33
@github-actions github-actions bot removed the BUILD label Jul 20, 2023
@zhengruifeng
Copy link
Contributor Author

zhengruifeng commented Jul 20, 2023

@zhengruifeng zhengruifeng force-pushed the ps_connect_analyze_window branch 2 times, most recently from 2ec3f81 to e22e1d9 Compare July 20, 2023 23:31
Comment on lines 156 to 157
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
other.getTagValue(tag)
.foreach(this.setTagValue(tag, _))
other.getTagValue(tag).foreach(this.setTagValue(tag, _))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we copy all tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, let me have a try

@zhengruifeng zhengruifeng force-pushed the ps_connect_analyze_window branch from e22e1d9 to ff6d479 Compare July 25, 2023 00:23
@zhengruifeng zhengruifeng changed the title [SPARK-43611][SQL][PS][CONNCECT] Make ExtractWindowExpressions retain the PLAN_ID_TAG [SPARK-43611][SQL][PS][CONNCECT] Make ExtractWindowExpressions retain the PLAN_ID_TAG Jul 25, 2023
@zhengruifeng zhengruifeng force-pushed the ps_connect_analyze_window branch from ff6d479 to fc488da Compare July 26, 2023 08:03
@zhengruifeng
Copy link
Contributor Author

cc @cloud-fan all tests passed, would you mind taking another look?

@zhengruifeng
Copy link
Contributor Author

thanks! merged to master

@zhengruifeng zhengruifeng deleted the ps_connect_analyze_window branch July 27, 2023 03:01
@itholic
Copy link
Contributor

itholic commented Jul 27, 2023

Thanks!

zhengruifeng added a commit that referenced this pull request Jul 27, 2023
### What changes were proposed in this pull request?
Enable more tests, they were excluded from #42086 due to the flaky CI issues

### Why are the changes needed?
for test parity

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
enabled tests

Closes #42182 from zhengruifeng/spark_43611_followup.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@HyukjinKwon
Copy link
Member

Should we merge this to 3.5 too?

@zhengruifeng
Copy link
Contributor Author

@HyukjinKwon not sure, maybe not needed?

itholic pushed a commit to itholic/spark that referenced this pull request Aug 1, 2023
…in the `PLAN_ID_TAG`

### What changes were proposed in this pull request?
Make rule `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

### Why are the changes needed?
In apache#39925, we introduced a new mechanism to resolve expression with specified plan.

However, sometimes the plan ID might be discarded by some analyzer rules, and then some expressions can not be correctly resolved, this issue is the main blocker of PS on Connect.

### Does this PR introduce _any_ user-facing change?
yes, a lot of Pandas APIs enabled

### How was this patch tested?
Enable UTs

Closes apache#42086 from zhengruifeng/ps_connect_analyze_window.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng added a commit that referenced this pull request Aug 1, 2023
…wExpressions` & `WidenSetOperationTypes` retain the `PLAN_ID_TAG`

### What changes were proposed in this pull request?

Backport for #42086 and #42230

### Why are the changes needed?

for functionality parity

### Does this PR introduce _any_ user-facing change?

Enabling couple of pandas APIs

### How was this patch tested?

Enabling the existing UTs

Closes #42252 from itholic/SPARK-43611-3.5.

Lead-authored-by: Ruifeng Zheng <ruifengz@apache.org>
Co-authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants