@Yicong-Huang
Contributor

What changes were proposed in this pull request?

Fix test case test_iterator_grouped_agg_partial_consumption to use count and sum instead of mean for testing partial consumption. Use the same value for all data points to avoid ordering issues.

Why are the changes needed?

Fixes test flakiness by ensuring test data points have the same value and using count/sum metrics that properly validate partial consumption behavior, making the test robust against ordering variations.
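The ordering concern can be illustrated with a minimal pure-Python sketch. It is not the actual test code: the helper names (`batches`, `partial_mean`, `partial_count_sum`, `distinct_results`), the batch size, and the data values are all hypothetical, and plain lists stand in for the batches an iterator-style grouped aggregate UDF would receive. It assumes the UDF reads only the first batch and then stops.

```python
from itertools import islice, permutations

def batches(values, batch_size):
    """Yield consecutive fixed-size slices, standing in for incoming batches."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]

def partial_mean(batch_iter, max_batches=1):
    """Hypothetical UDF body: mean over only the first `max_batches` batches."""
    consumed = [v for batch in islice(batch_iter, max_batches) for v in batch]
    return sum(consumed) / len(consumed)

def partial_count_sum(batch_iter, max_batches=1):
    """Hypothetical UDF body: (count, sum) over only the first `max_batches` batches."""
    consumed = [v for batch in islice(batch_iter, max_batches) for v in batch]
    return len(consumed), sum(consumed)

def distinct_results(values, agg, batch_size=2):
    """Run `agg` over every possible row ordering and collect the distinct results."""
    return {agg(batches(list(order), batch_size)) for order in permutations(values)}

# Varied values: the partial mean depends on which rows land in the first batch.
varied = [1.0, 2.0, 3.0, 4.0]
mean_results = distinct_results(varied, partial_mean)

# Identical values: count and sum of the first batch are the same for any ordering.
uniform = [1.0, 1.0, 1.0, 1.0]
count_sum_results = distinct_results(uniform, partial_count_sum)

print(sorted(mean_results))
print(count_sum_results)
```

In this sketch the count still confirms that only part of the group was consumed (2 of 4 rows), which is the behavior under test, while the expected value no longer depends on row order.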

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Ran existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@gaogaotiantian
Contributor

Did you get a test failure before fixing this? What is the ordering issue? Is it the partitioning that is random, or the order of the items within a particular group?

I'm not sure how Spark is supposed to work, so I don't know the exact expected behavior, but making every value in the dataset the same actually "fixes" the test by "ignoring" the problem. If there is an order the engine should stick to, we would lose coverage there: we couldn't catch a regression. So could you explain a bit more about why the test is flaky?

@HyukjinKwon
Member

There can be test failures due to, for example, a different number of partitions (e.g., when not using the default conf) or nondeterminism for another reason, such as hashing. I assume that this is one of those cases.

@HyukjinKwon
Member

Merged to master.

xu20160924 pushed a commit to xu20160924/spark that referenced this pull request Dec 9, 2025
… agg UDF partial consumption

Closes apache#53372 from Yicong-Huang/SPARK-53615/fix/fix-test-partial-consumption-ordering.

Authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>