[Docs][Data] Ordering of rows (ray-project#50986)
## Why are these changes needed?

Adds a section to the docs about how operators affect the ordering of rows.

## Related issue number

Closes ray-project#50890 

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e.,
`git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under the
corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
   - [x] Built docs locally and verified the format and the links

---------

Signed-off-by: jakac <matej.jakimov@gmail.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2 people authored and elimelt committed Mar 9, 2025
1 parent 055a375 commit 892347d
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions doc/source/data/transforming-data.rst
@@ -14,6 +14,7 @@ This guide shows you how to:

* :ref:`Transform rows <transforming_rows>`
* :ref:`Transform batches <transforming_batches>`
* :ref:`Ordering of rows <ordering_of_rows>`
* :ref:`Stateful transforms <stateful_transforms>`
* :ref:`Groupby and transform groups <transforming_groupby>`

@@ -218,6 +219,26 @@ NumPy functions and model inference. However, if your batch size is too large, y
program might run out of memory. If you encounter an out-of-memory error, decrease your
``batch_size``.

.. _ordering_of_rows:

Ordering of rows
================

When transforming data, Ray Data doesn't preserve the order of :ref:`blocks <data_key_concepts>` by default.

If you need the order of blocks to be preserved or deterministic,
use the :meth:`~ray.data.Dataset.sort` method, or set :attr:`ray.data.ExecutionOptions.preserve_order` to ``True``.
Setting this flag may reduce performance on larger clusters, where stragglers are more likely.

.. testcode::

    import ray

    # Get the current DataContext, which holds the execution options.
    ctx = ray.data.DataContext.get_current()

    # By default, this is set to False.
    ctx.execution_options.preserve_order = True
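
The following standard-library sketch is only an analogy and doesn't use or reflect Ray Data's internals. It assumes a hypothetical ``process_block`` function in which block 0 is a deliberately slow straggler. It illustrates why results collected in completion order can differ from submission order, and why collecting in submission order means waiting on the straggler first.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_block(block_id: int) -> int:
    """Hypothetical per-block transform; block 0 is an artificial straggler."""
    time.sleep(0.2 if block_id == 0 else 0.01)
    return block_id


block_ids = list(range(4))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_block, b) for b in block_ids]

    # Completion order: whichever block finishes first is yielded first,
    # so this order depends on timing and isn't guaranteed.
    completion_order = [f.result() for f in as_completed(futures)]

    # Submission order: results follow the order the blocks were submitted,
    # at the cost of waiting for the slow block 0 before anything else.
    submission_order = [f.result() for f in futures]

# Sorting after the fact also yields a deterministic order.
restored = sorted(completion_order)

print("completion order:", completion_order)
print("submission order:", submission_order)
print("sorted afterwards:", restored)
```

In this sketch, reading results in submission order is loosely comparable to enabling ``preserve_order``, and sorting afterwards is loosely comparable to calling ``sort`` on the dataset. Both give a deterministic order, but neither says anything about how Ray Data schedules blocks.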

.. _stateful_transforms:

Stateful Transforms