[data] add a comment explaining the bundling behavior for map_batches with default batch_size (ray-project#47433)

When batch_size is not set, input blocks will not be bundled up.
Add a comment explaining this.
See ray-project#29971 and
ray-project#47363 (comment)

Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
raulchen authored and ujjawal-khare committed Oct 12, 2024
1 parent e00e6c1 commit 545d1b3
Showing 1 changed file with 7 additions and 0 deletions.
python/ray/data/dataset.py
@@ -549,6 +549,13 @@ def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
specified ``batch_size`` if ``batch_size`` doesn't evenly divide the
block(s) sent to a given map task.
If ``batch_size`` is set and each input block is smaller than the
``batch_size``, Ray Data will bundle up many blocks as the input for one
task, until their total size is equal to or greater than the given
``batch_size``.
If ``batch_size`` is not set, the bundling will not be performed. Each task
will receive only one input block.

.. seealso::
    :meth:`~Dataset.iter_batches`
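The bundling rule described in the added docstring can be sketched in plain Python. This is a hypothetical illustration of the behavior, not Ray's actual implementation; the function name ``bundle_blocks`` and the use of row counts as block sizes are assumptions for the sake of the example.

```python
def bundle_blocks(block_sizes, batch_size=None):
    """Sketch of the bundling behavior: group per-block sizes into
    per-task bundles.

    With batch_size=None, no bundling happens and each block becomes
    its own bundle. Otherwise, blocks are accumulated into one bundle
    until their total size is >= batch_size. (Hypothetical model, not
    Ray's real code.)
    """
    if batch_size is None:
        # No bundling: one input block per task.
        return [[size] for size in block_sizes]
    bundles, current, total = [], [], 0
    for size in block_sizes:
        current.append(size)
        total += size
        if total >= batch_size:
            # Bundle is large enough to feed one task.
            bundles.append(current)
            current, total = [], 0
    if current:
        # Leftover blocks form a final, possibly smaller bundle.
        bundles.append(current)
    return bundles
```

For example, four blocks of 2 rows each with ``batch_size=5`` would be grouped as ``[[2, 2, 2], [2]]``, while with no ``batch_size`` each block stays separate: ``[[2], [2], [2], [2]]``.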
