[SPARK-53615][FOLLOWUP][PYTHON][TEST] Fix test case for arrow grouped agg UDF partial consumption #53372

Yicong-Huang · 2025-12-07T06:07:41Z

What changes were proposed in this pull request?

Fix the test case `test_iterator_grouped_agg_partial_consumption` to check count and sum instead of mean when testing partial consumption. Use the same value for every data point so the result does not depend on ordering.

Why are the changes needed?

This fixes test flakiness: all test data points now share the same value, and the count/sum metrics properly validate partial consumption behavior, so the test is robust against ordering variations.
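To illustrate the reasoning, here is a minimal sketch in plain Python, not the actual test code. It assumes batches are plain lists rather than Arrow batches, and `first_batch_count_sum` is a hypothetical stand-in for a UDF that stops after the first batch. It shows that the result depends on batch arrival order when values differ, but not when they are identical. It also shows one plausible reason for moving away from mean: with identical values, the mean is the same for partial and full consumption, while the count is not.

```python
from itertools import permutations
from statistics import mean
from typing import Iterator, List, Set, Tuple


def first_batch_count_sum(batches: Iterator[List[float]]) -> Tuple[int, float]:
    """Hypothetical UDF body: read one batch, then stop consuming."""
    first = next(batches, [])
    return len(first), sum(first)


def results_over_orderings(batches: List[List[float]]) -> Set[Tuple[int, float]]:
    """Collect the result for every possible arrival order of the batches."""
    return {first_batch_count_sum(iter(order)) for order in permutations(batches)}


# Illustrative data only; these are not the values used in the real test.
varied = [[1.0, 2.0], [3.0, 4.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]

# With varied values the outcome changes with batch order; with uniform
# values every ordering yields the same (count, sum) pair.
varied_results = results_over_orderings(varied)
uniform_results = results_over_orderings(uniform)

# With uniform values, mean cannot tell partial from full consumption,
# whereas count can.
all_uniform = [v for batch in uniform for v in batch]
partial_mean = mean(uniform[0])
full_mean = mean(all_uniform)
partial_count = len(uniform[0])
full_count = len(all_uniform)

print(varied_results)
print(uniform_results)
print(partial_mean == full_mean, partial_count == full_count)
```

The trade-off raised in review still applies: identical values make the assertion stable, but they also mean the test no longer observes which rows were consumed.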

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Ran existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

gaogaotiantian · 2025-12-07T18:38:25Z

Did you get a test failure before fixing this? What is the ordering issue? Is it the partitioning that is random, or the order of the items within a particular group?

I'm not sure how Spark is supposed to work, so I don't know the exact expected behavior, but changing every value in the dataset to be the same actually "fixes" the test by "ignoring" the problem. If there is an order the engine should stick to, we lose coverage there: we can't catch it if there is a regression. So could you explain a bit more about why the test is flaky?

HyukjinKwon · 2025-12-07T22:38:36Z

There can be test failures due to a different number of partitions, for example (e.g., when not using the default conf), or due to non-determinism from another source such as hashing. I assume this is one of those cases.

HyukjinKwon · 2025-12-07T22:38:48Z

Merged to master.

… agg UDF partial consumption

### What changes were proposed in this pull request?

Fix test case `test_iterator_grouped_agg_partial_consumption` to use count and sum instead of mean for testing partial consumption. Use the same value for all data points to avoid ordering issues.

### Why are the changes needed?

Fixes test flakiness by ensuring test data points have the same value and using count/sum metrics that properly validate partial consumption behavior, making the test robust against ordering variations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#53372 from Yicong-Huang/SPARK-53615/fix/fix-test-partial-consumption-ordering.

Authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

test: use same data points to avoid potential order issue in test

874efca

github-actions bot added SQL PYTHON labels Dec 7, 2025

HyukjinKwon approved these changes Dec 7, 2025


HyukjinKwon closed this in c7338f6 Dec 7, 2025
