[data] add a comment explaining the bundling behavior for map_batches with default batch_size (ray-project#47433)

When batch_size is not set, input blocks will not be bundled up.
Add a comment explaining this.
See ray-project#29971 and
ray-project#47363 (comment)

Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
raulchen authored and ujjawal-khare committed Oct 12, 2024
1 parent e00e6c1 commit 545d1b3
Showing 1 changed file with 7 additions and 0 deletions.
python/ray/data/dataset.py
@@ -549,6 +549,13 @@ def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
specified ``batch_size`` if ``batch_size`` doesn't evenly divide the
block(s) sent to a given map task.
If ``batch_size`` is set and each input block is smaller than the
``batch_size``, Ray Data will bundle up many blocks as the input for one
task, until their total size is equal to or greater than the given
``batch_size``.
If ``batch_size`` is not set, the bundling will not be performed. Each task
will receive only one input block.

.. seealso::
    :meth:`~Dataset.iter_batches`
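The bundling rule described in the added docstring can be sketched in plain Python. This is a hypothetical illustration of the behavior, not Ray's actual implementation; the function name ``bundle_blocks`` and the use of row counts as block sizes are assumptions for the sake of the example.

```python
def bundle_blocks(block_sizes, batch_size=None):
    """Sketch of the bundling behavior: group per-block sizes into
    per-task bundles.

    With batch_size=None, no bundling happens and each block becomes
    its own bundle. Otherwise, blocks are accumulated into one bundle
    until their total size is >= batch_size. (Hypothetical model, not
    Ray's real code.)
    """
    if batch_size is None:
        # No bundling: one input block per task.
        return [[size] for size in block_sizes]
    bundles, current, total = [], [], 0
    for size in block_sizes:
        current.append(size)
        total += size
        if total >= batch_size:
            # Bundle is large enough to feed one task.
            bundles.append(current)
            current, total = [], 0
    if current:
        # Leftover blocks form a final, possibly smaller bundle.
        bundles.append(current)
    return bundles
```

For example, four blocks of 2 rows each with ``batch_size=5`` would be grouped as ``[[2, 2, 2], [2]]``, while with no ``batch_size`` each block stays separate: ``[[2], [2], [2], [2]]``.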
