Skip to content

Commit

Permalink
[Data] Fix incorrect pending task size if outputs are empty (#47604)
Browse files Browse the repository at this point in the history
If an operator outputs empty blocks, then Ray Data thinks that the
operator has 256 MiB of pending task outputs, even though it should be
0. For example:
```python
import pyarrow as pa
output = pa.Table.from_pydict({"data": [None] * 128})
assert output.nbytes == 0, output.nbytes
```

The reason for the bug is because we check if `average_bytes_per_output`
is truthy rather than if it's not `None`.

https://github.com/ray-project/ray/blob/1f83fb44580e392ba6d39a9e79bbdd8cd5b7d916/python/ray/data/_internal/execution/interfaces/op_runtime_metrics.py#L369-L371
---

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
  • Loading branch information
bveeramani authored Sep 11, 2024
1 parent 74f29fc commit 029ff4d
Showing 1 changed file with 3 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -366,9 +366,9 @@ def obj_store_mem_max_pending_output_per_task(self) -> Optional[float]:
if context._max_num_blocks_in_streaming_gen_buffer is None:
return None

bytes_per_output = (
self.average_bytes_per_output or context.target_max_block_size
)
bytes_per_output = self.average_bytes_per_output
if bytes_per_output is None:
bytes_per_output = context.target_max_block_size

num_pending_outputs = context._max_num_blocks_in_streaming_gen_buffer
if self.average_num_outputs_per_task is not None:
Expand Down

0 comments on commit 029ff4d

Please sign in to comment.