@bveeramani bveeramani commented Nov 17, 2025

Description

#58234 increased the scale of the map_groups release test from SF 10 to SF 100. Since then, the release test has been consistently failing (see #58312).

To avoid a perpetually broken release test, this PR reverts the scale to SF 10 while we investigate and fix the scalability issue.

Related issues

#58312

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani enabled auto-merge (squash) November 17, 2025 21:31
@github-actions github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Nov 17, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly reverts the scale factor for the map_groups release test from SF 100 to SF 10, which is a necessary step to resolve the consistent test failures. My review includes a suggestion to add a TODO comment to track this temporary change, ensuring it's revisited once the underlying scalability issue is fixed. This will improve the long-term maintainability of the test configuration.

  timeout: 3600
  script: >
-   python groupby_benchmark.py --sf 100 --map-groups --group-by {{columns}}
+   python groupby_benchmark.py --sf 10 --map-groups --group-by {{columns}}
medium

This change is a good temporary fix to unblock the release tests. To improve long-term maintainability, it would be beneficial to add a TODO comment in the code explaining why the scale factor was reduced and referencing the tracking issue (#58312).

Since this script block uses the folded block scalar (>), a comment line added inside it would be folded onto the same line as the command, and the shell would then treat the whole script as a comment. You could change the block scalar to literal style (|) on line 195 so that the comment stays on a separate line.

For example:

  run:
    timeout: 3600
    script: |
      # TODO(#58312): Revert to --sf 100 once the scalability issue is fixed.
      python groupby_benchmark.py --sf 10 --map-groups --group-by {{columns}} \
        --shuffle-strategy {{shuffle_strategy}}

This will ensure the reason for this temporary change is not lost over time.
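
The difference between the two block scalar styles mentioned in this comment can be sketched as follows. This is a minimal illustration, not the actual release test configuration: the keys `folded_script` and `literal_script` are hypothetical, and the command is shortened.

```yaml
# Hypothetical keys, for illustration only; not taken from the release test config.

# Folded style (>): the line break between the two lines becomes a space,
# so the shell receives ONE line that starts with "#" and runs nothing.
folded_script: >
  # TODO(#58312): Revert to --sf 100 once the scalability issue is fixed.
  python groupby_benchmark.py --sf 10 --map-groups

# Literal style (|): line breaks are preserved, so the shell receives a
# comment line followed by a separate command line.
literal_script: |
  # TODO(#58312): Revert to --sf 100 once the scalability issue is fixed.
  python groupby_benchmark.py --sf 10 --map-groups
```

Because literal style keeps every line break, a command that spans several lines needs a trailing backslash on each continued line. Otherwise the shell runs each line as a separate command.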

@bveeramani bveeramani merged commit 83a456e into master Nov 17, 2025
5 of 7 checks passed
@bveeramani bveeramani deleted the decrease-map-groups-scale branch November 17, 2025 21:51
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
bveeramani added a commit that referenced this pull request Nov 25, 2025
#58710)

## Description

#58711 decreased the scale of the
`map_groups` tests from scale-factor 100 to scale-factor 10 because some
of the `map_groups` release tests were failing. However, after more
investigation, I realized that the only variant that doesn't work with
scale-factor 100 is the hash shuffle with autoscaling variant (see
#58734).

This PR re-increases the scale and only disables the cases that fail.


Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
ray-project#58710)
