Conversation

@YoussefEssDS
Contributor

Description

This PR will:

  • Batch block resolution in resolve_block_refs() so iter_batches() issues one ray.get() per chunk of block refs instead of one per ref. The chunk size is configurable via the new DataContext.iter_get_block_batch_size knob (a simplified sketch follows below).
  • Add a test that verifies resolve_block_refs() actually batches the ray.get() calls.
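
A minimal sketch of the batching idea, simplified from the PR's approach (the real resolve_block_refs() also records timing stats and handles eager block freeing; batch_size here stands in for the new DataContext knob):

```python
import ray

def resolve_block_refs_batched(block_ref_iter, batch_size=32):
    """Yield resolved blocks, issuing one ray.get() per chunk of refs."""
    pending = []
    for block_ref in block_ref_iter:
        pending.append(block_ref)
        if len(pending) >= batch_size:
            # A single ray.get() resolves the whole chunk of ObjectRefs.
            yield from ray.get(pending)
            pending.clear()
    if pending:
        # Flush whatever is left once the input iterator is exhausted.
        yield from ray.get(pending)
```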

Related issues

Raised by @amogkam in python/ray/data/_internal/block_batching/util.py

Additional information

Simple benchmark available: https://gist.github.com/YoussefEssDS/40de959a42a19334b8dac8bd217c319b

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
@YoussefEssDS YoussefEssDS requested a review from a team as a code owner November 8, 2025 00:22
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces batching for resolve_block_refs to improve the performance of iter_batches by reducing the number of ray.get() calls. The batch size is made configurable through a new DataContext setting. The implementation is sound and includes a good test case to verify the batching behavior. I have one suggestion to improve code conciseness by using yield from.

Comment on lines 78 to 85
for block_ref in block_ref_iter:
    pending.append(block_ref)
    if len(pending) >= batch_size:
        for block in _resolve_pending():
            yield block

for block in _resolve_pending():
    yield block
Contributor

Severity: medium

The logic for yielding blocks from _resolve_pending is duplicated. You can simplify this by using yield from to make the code more concise and avoid repetition.

Suggested change

-for block_ref in block_ref_iter:
-    pending.append(block_ref)
-    if len(pending) >= batch_size:
-        for block in _resolve_pending():
-            yield block
-for block in _resolve_pending():
-    yield block
+for block_ref in block_ref_iter:
+    pending.append(block_ref)
+    if len(pending) >= batch_size:
+        yield from _resolve_pending()
+yield from _resolve_pending()

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Nov 8, 2025
Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
if batch_size is None or current_window_size < num_rows_to_prefetch:
    try:
-       next_ref_bundle = get_next_ref_bundle()
+       next_ref_bundle = next(ref_bundles)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: RefBundle Retrieval Observability Gap

The removal of the get_next_ref_bundle() helper function eliminates tracking of the stats.iter_get_ref_bundles_s timing metric. The direct calls to next(ref_bundles) at lines 371 and 384 no longer wrap the operation with the stats timer, causing a loss of observability for RefBundle retrieval time, which was previously tracked and reported in iteration statistics.
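
A sketch of how the timing wrapper could be restored, assuming stats.iter_get_ref_bundles_s is a Timer exposing a .timer() context manager like Ray Data's other iteration timers (stats and ref_bundles come from the enclosing scope):

```python
# Hypothetical restoration of the removed helper.
def get_next_ref_bundle():
    with stats.iter_get_ref_bundles_s.timer():
        return next(ref_bundles)
```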

YoussefEssDS and others added 2 commits November 11, 2025 15:17
Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>

@cursor cursor bot left a comment

Bug: Phantom Constant Breaks Imports

The constant DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR is removed, but it is still imported and used in actor_pool_map_operator.py and test_operators.py. This will cause an ImportError when those modules try to import the constant from ray.data.context.

python/ray/data/context.py#L217-L221

)
# Enable per node metrics reporting for Ray Data, disabled by default.
DEFAULT_ENABLE_PER_NODE_METRICS = bool(
    int(os.environ.get("RAY_DATA_PER_NODE_METRICS", "0"))
)

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
Contributor

@srinathk10 srinathk10 left a comment

@YoussefEssDS Motivation for the changes looks good. Please address the review comments.

Also, w.r.t. your micro-benchmark, please add the results as a comment here, describing your test setup.

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
self._eager_free = clear_block_after_read and ctx.eager_free
max_get_blocks_batch_size = max(1, (prefetch_batches or 0) + 1)
self._block_get_batch_size = min(
    ctx.iter_get_block_batch_size, max_get_blocks_batch_size
)

Bug: Overly Conservative Batching Limits Performance

The calculation of _block_get_batch_size overly restricts batching by limiting it to prefetch_batches + 1 blocks. With default settings (prefetch_batches=1, iter_get_block_batch_size=32), this results in batching only 2 blocks at a time instead of the configured 32, significantly reducing the performance benefit. The formula max(1, (prefetch_batches or 0) + 1) creates a cap that's too conservative since prefetch_batches measures batches (not blocks), and their relationship varies with block size. This causes the configured iter_get_block_batch_size to be silently overridden in most cases.
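
A quick arithmetic check of the cap with those defaults:

```python
prefetch_batches = 1
iter_get_block_batch_size = 32

# The cap described above: prefetch_batches + 1 blocks, at least 1.
max_get_blocks_batch_size = max(1, (prefetch_batches or 0) + 1)  # -> 2

# The configured knob is silently overridden: 2 blocks per ray.get(), not 32.
block_get_batch_size = min(iter_get_block_batch_size, max_get_blocks_batch_size)  # -> 2
```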

Contributor Author

Relaxing that cap breaks the backpressure tests, as it forces the materialization of more blocks than the configured prefetch size.

@YoussefEssDS
Contributor Author

Hi @srinathk10, thanks for the review. I ran the microbenchmark on a Ryzen 9 7950X / 64 GB RAM machine (Ubuntu 22.04, Python 3.12):

python resolve_block_refs_benchmark.py \
  --num-rows 5_000_000 \
  --num-blocks 512 \
  --batch-size 1024 \
  --prefetch-batches 32 \
  --repetitions 3

Before the batching change: mean 3.82 s (p50 3.83 s, min 3.78 s, max 3.86 s) = 1.31 M rows/s over 4,883 batches.
After the batching change: mean 3.63 s (p50 3.64 s, min 3.61 s, max 3.64 s) = 1.38 M rows/s over 4,883 batches.

Net improvement: ~5% in end-to-end batch iteration throughput with prefetch set to 32. Both runs used the same script and dataset parameters on the same machine.

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
@srinathk10
Contributor

Train release tests: https://buildkite.com/ray-project/release/builds/67245

Contributor

@srinathk10 srinathk10 left a comment

LGTM

-    clear_block_after_read and DataContext.get_current().eager_free
+ctx = DataContext.get_current()
+self._eager_free = clear_block_after_read and ctx.eager_free
+max_get_blocks_batch_size = max(1, (prefetch_batches or 0) + 1)
Contributor


prefetch_batches is the number of batches to prefetch, not blocks.
The actual number of blocks to prefetch is calculated in BlockPrefetcher.
We could add a method to let it report the number of blocks being prefetched; a sketch follows below.
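
A sketch of what such a reporting method might look like (class and attribute names are illustrative, not Ray's actual BlockPrefetcher API):

```python
# Illustrative only: lets the iterator size its ray.get() chunks from the
# number of blocks actually in flight, rather than deriving it from
# prefetch_batches.
class WindowedBlockPrefetcher:
    def __init__(self) -> None:
        self._window = []  # in-flight block ObjectRefs

    def num_blocks_prefetched(self) -> int:
        return len(self._window)
```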

hits += current_hit
misses += current_miss
unknowns += current_unknown
ctx = ray.data.context.DataContext.get_current()
Contributor


Pass in the correct context object; avoid using the global one.
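
A sketch of that refactor; the helper name and signature are hypothetical:

```python
from ray.data.context import DataContext

# Accept the DataContext from the caller instead of reading the global
# singleton via DataContext.get_current() inside the helper.
def record_prefetch_stats(hits: int, misses: int, unknowns: int, ctx: DataContext) -> None:
    total = hits + misses + unknowns
    hit_rate = hits / total if total else 0.0
    # ... report hit_rate through whatever ctx-controlled sink applies ...
```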

Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
Signed-off-by: YoussefEssDS <oyoussefesseddiq@gmail.com>
@YoussefEssDS
Contributor Author

Hi @raulchen, is this what you had in mind? Any further suggestions?

@YoussefEssDS
Contributor Author

Hi @raulchen, PTAL. Thanks!

block_ref_iter: Iterator[ObjectRef[Block]],
stats: Optional[DatasetStats] = None,
max_get_batch_size: Optional[Union[int, Callable[[], int]]] = None,
ctx: Optional["DataContext"] = None,
Contributor


can we make this mandatory?

self._eager_free = (
    clear_block_after_read and DataContext.get_current().eager_free
)
self._ctx = DataContext.get_current()
Contributor


Ideally, this ctx should be passed in from Dataset._context.
But since it's an existing issue, you can leave a TODO here if fixing it requires a massive change.
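
The TODO might read something like this (wording illustrative):

```python
# TODO: pass the DataContext in from Dataset._context instead of reading
# the global singleton; left as-is for now because threading it through
# is a larger, pre-existing refactor.
self._ctx = DataContext.get_current()
```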

@YoussefEssDS
Contributor Author

@raulchen PTAL. Thanks!

@YoussefEssDS
Contributor Author

Hi @raulchen, just bumping this. Can you check whether any further changes are needed? Thanks!

@YoussefEssDS
Contributor Author

Hi @bveeramani, can we get this over the line? It's approved by the reviewers. Thanks!

@bveeramani bveeramani enabled auto-merge (squash) December 15, 2025 17:29
@bveeramani
Member

@YoussefEssDS merged. Thank you for the contribution!

@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Dec 15, 2025
@bveeramani bveeramani merged commit 2a042d4 into ray-project:master Dec 15, 2025
8 checks passed
kriyanshii pushed a commit to kriyanshii/ray that referenced this pull request Dec 16, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
iamjustinhsu pushed a commit to iamjustinhsu/ray that referenced this pull request Jan 8, 2026