
Conversation


@alexeykudinkin alexeykudinkin commented Nov 7, 2025


Description

Currently, finalization is scheduled in sequential batches: a batch of N adjacent partitions is finalized at once, in a sliding window over the partition space.

This creates a lensing effect, because:

  1. Adjacent partitions i and i+1 get scheduled onto adjacent aggregators j and j+1 (since membership is determined as j = i % num_aggregators)
  2. Adjacent aggregators have a high likelihood of landing on the same node (since they are scheduled at about the same time, in sequence)

To address this, this change randomly samples the next partitions to finalize, so that partitions are chosen uniformly across the partition space, reducing concurrent finalization of adjacent partitions.
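The effect described above can be sketched in a few lines (a minimal illustration only; the node placement rule and names like num_aggregators are assumptions for the sketch, not the actual Ray Data internals):

```python
import random

num_partitions = 16
num_aggregators = 8
num_nodes = 4  # hypothetical cluster: 2 aggregators per node

def aggregator_of(partition_id: int) -> int:
    # Membership rule from the description: j = i % num_aggregators.
    return partition_id % num_aggregators

def node_of(aggregator_id: int) -> int:
    # Assumed placement: aggregators created in sequence land on
    # consecutive nodes, so adjacent aggregators share a node.
    return aggregator_id * num_nodes // num_aggregators

# Sliding window of adjacent partitions -> only a couple of nodes busy:
sequential_batch = [0, 1, 2, 3]
print(sorted({node_of(aggregator_of(p)) for p in sequential_batch}))  # → [0, 1]

# Uniform random sampling spreads the batch across the partition space:
random.seed(7)
sampled_batch = random.sample(range(num_partitions), 4)
print(sorted({node_of(aggregator_of(p)) for p in sampled_batch}))
```

Under this toy placement, the sequential window of 4 partitions concentrates on 2 of the 4 nodes, while a sampled batch is free to touch any node.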


Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested a review from a team as a code owner November 7, 2025 18:27

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the finalization logic in the hash shuffle operator to randomly sample partitions instead of processing them sequentially. This is a valuable change that should help distribute the finalization load more evenly across the cluster and avoid potential node hotspots. My review includes one suggestion to optimize the new sampling logic for better performance.

Comment on lines +910 to 912
target_partition_ids = random.sample(
list(self._pending_finalization_partition_ids), next_batch_size
)

Severity: medium

The random.sample function can operate directly on sets, so converting self._pending_finalization_partition_ids to a list is unnecessary. Removing the list() conversion will improve performance by avoiding the creation of a new list in each call, which can be expensive if the number of pending partitions is large.

Suggested change
target_partition_ids = random.sample(
list(self._pending_finalization_partition_ids), next_batch_size
)
target_partition_ids = random.sample(
self._pending_finalization_partition_ids, next_batch_size
)
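One caveat on this suggestion: random.sample accepting a set was deprecated in Python 3.9 and removed in Python 3.11, where passing a set raises TypeError, so on modern interpreters the list() conversion is actually required. A quick sketch (pending here is a stand-in for the real _pending_finalization_partition_ids attribute):

```python
import random
import sys

pending = {9, 2, 7, 4, 11}  # stand-in for _pending_finalization_partition_ids

# Since Python 3.11, random.sample() requires a sequence; passing a set
# raises TypeError (set support was deprecated starting in 3.9).
if sys.version_info >= (3, 11):
    try:
        random.sample(pending, 2)
        raise AssertionError("expected TypeError on Python 3.11+")
    except TypeError:
        pass

# Converting to a list first works on every supported version:
chosen = random.sample(list(pending), 2)
print(len(chosen), set(chosen) <= pending)  # → 2 True
```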

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 7, 2025
# and avoid the "sliding lens" effect where we finalize a batch of
# N *adjacent* partitions that may be co-located on the same node:
#
# - Adjacent partitions i and i+1 are handled by adjacent

@iamjustinhsu iamjustinhsu Nov 7, 2025


Wait, is this true? If modulo N = num actors, then partitions i and i+1 must necessarily be on different actors. Oh wait, nvm, I see what you're saying

@alexeykudinkin alexeykudinkin added the go add ONLY when ready to merge, run all tests label Nov 7, 2025
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) November 7, 2025 20:02
@github-actions github-actions bot disabled auto-merge November 7, 2025 20:33
Comment on lines +907 to +908
# - Adjacent aggregators have high likelihood of running on the
# same node (when num aggregators > num nodes)
@iamjustinhsu iamjustinhsu Nov 7, 2025


Is this necessarily true? Your default strategy is SPREAD, and each aggregator is scheduled with the same amount of resources, so aggregators i and i+1 have as much of a chance of landing on the same node as aggregators i and j. Please correct my assumptions if I'm wrong

#
# NOTE: This doesn't affect determinism, since this only impacts order
# of finalization (hence not required to be seeded)
target_partition_ids = random.sample(

So wouldn't a better strategy be to check how much each aggregator actor is currently consuming relative to its node's capacity, and schedule the finalization only if there's remaining capacity?

I just find the randomization strategy harder to reason about in this case.
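For context, the capacity-aware alternative the reviewer describes could look roughly like this (purely illustrative; pick_finalization_batch, node_of, and the load/capacity maps are hypothetical stand-ins, not Ray Data APIs):

```python
# Illustrative sketch of capacity-aware finalization scheduling.
# All names below are hypothetical stand-ins, not Ray Data APIs.

def pick_finalization_batch(pending, batch_size, node_of, load, capacity):
    """Greedily pick partitions whose aggregator's node has spare capacity."""
    batch = []
    projected = dict(load)  # node id -> projected load after scheduling
    for partition_id in pending:
        node = node_of(partition_id)
        if projected.get(node, 0) + 1 <= capacity.get(node, 0):
            batch.append(partition_id)
            projected[node] = projected.get(node, 0) + 1
        if len(batch) == batch_size:
            break
    return batch

def node_of(p):
    # Hypothetical placement: partitions 0-1 -> node 0, 2-3 -> node 1, ...
    return (p % 4) // 2

# Example: 2 nodes, each with capacity for 1 concurrent finalization,
# so only one partition per node makes it into the batch.
batch = pick_finalization_batch(
    pending=[0, 1, 2, 3, 4, 5],
    batch_size=4,
    node_of=node_of,
    load={0: 0, 1: 0},
    capacity={0: 1, 1: 1},
)
print(batch)  # → [0, 2]
```

As the thread notes, doing this well would also need per-partition size metadata, since finalization cost is a function of partition size, not just partition count.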


Also, it's a function of partition size, so ideally, if we could get metadata about the partition before scheduling finalize(), that would be even better.

@alexeykudinkin alexeykudinkin merged commit f4d10b8 into master Nov 7, 2025
6 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/jn-rnd-fix branch November 7, 2025 22:18
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
…tion on a single node (ray-project#58456)

landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…tion on a single node (ray-project#58456)

Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…tion on a single node (ray-project#58456)

ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…tion on a single node (ray-project#58456)


Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests


Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

4 participants