[RLlib; docs] RLlib docs redo: New API stack Episodes (SingleAgentEpisode). #46985
Conversation
LGTM. We are moving forward here. Very nice docs for the complex episodes. Amazing how simply and clearly this is described @sven1977!!
# Get the very first observation ("reset observation"). Note that a single observation
# is returned here (not a list of size 1 or a batch of size 1).
check(episode.get_observations(0), "obs_0")
Very nice!! With this we can be sure it's the observation at timestep 0.
Awesome diagrams. Makes the lookback buffer crystal clear!
Only thing: Describe the length of the lookback in this case :)
fixed
Awesome!
Really nice! I think this is easy to understand.
:py:class:`~ray.rllib.env.single_agent_episode.SingleAgentEpisode` for single-agent setups
and :py:class:`~ray.rllib.env.multi_agent_episode.MultiAgentEpisode` for multi-agent setups.
The data is translated from this `Episode` format to tensor batches (including a possible move to the GPU)
only immediately before a neural network forward pass.
Maybe we should add here that this happens inside the connector pipeline?
done
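As a rough sketch of what that episode-to-batch translation step looks like conceptually (hypothetical names and a plain numpy stand-in; the real logic lives in RLlib's connector pipelines, not in this code):

```python
import numpy as np

# Hypothetical stand-in for an episode's per-timestep observations.
episode_observations = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

def episode_to_batch(observations):
    """Stack per-timestep observations into one batch array of shape [T, ...].

    A connector pipeline would hand this batch to the model (moving it to
    the GPU first, if needed).
    """
    return np.stack([np.asarray(o, dtype=np.float32) for o in observations])

batch = episode_to_batch(episode_observations)
print(batch.shape)  # (3, 2)
```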
:end-before: rllib-sa-episode-03-end
To illustrate the differences between the data stored in a non-finalized episode vs the same data stored in
vs -> vs. :)
fixed
To illustrate the differences between the data stored in a non-finalized episode vs the same data stored in
a finalized one, take a look at this complex observation example here, showing the exact same observation data in two
episodes (one not finalized the other finalized):
not finalized -> non-finalized?
fixed
**Complex observations in a finalized episode**: The entire observation record is a single (complex) dict matching the
gym environment's observation space. At the leafs of the structure are `np.NDArrays` holding the individual values of the leaf.
Note that these `NDArrays` have an extra batch dim (axis=0), whose length matches the length of the episode stored (here 3).
Either both `np.NDArray` or both `NDArray`, I'd say. What do you think?
fixed
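To illustrate what finalizing does to such a complex (dict) observation, here is a minimal sketch using plain numpy (hypothetical data and a hand-rolled `finalize` helper, not RLlib's actual finalize code):

```python
import numpy as np

# Non-finalized: a list of per-timestep dict observations (episode length 3).
non_finalized = [
    {"pos": [0.0, 0.0], "health": 1.0},
    {"pos": [0.5, 0.1], "health": 0.9},
    {"pos": [1.0, 0.2], "health": 0.8},
]

def finalize(observations):
    """Turn a list of dicts into a single dict of stacked arrays.

    Each leaf becomes an np.ndarray with an extra batch dim (axis=0), whose
    length equals the episode length.
    """
    keys = observations[0].keys()
    return {k: np.stack([np.asarray(o[k]) for o in observations]) for k in keys}

finalized = finalize(non_finalized)
print(finalized["pos"].shape)     # (3, 2)
print(finalized["health"].shape)  # (3,)
```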
Another useful getter argument (besides `fill`) is the `neg_index_as_lookback` boolean argument.
If set to True, negative indices are not interpreted as "from the end", but as
"into the lookback buffer". This allows you to loop over a range of global timesteps
This is a tiny tiny bit ambiguous when the "lookback" is actually not into the "lookback buffer" but just backwards from the considered global timestep. So, precisely it would be "from the global timestep (if needed into the lookback buffer)", wouldn't it?
I think this is fine.
Note that we do not support indexing by global (actual gym episode) timestep yet. It's probably a use case that's never really needed.
So if you have an episode chunk from global timestep 100 to global timestep 200, as a user, you would normally just loop through it (from `local_index` 0 to `local_index` 100) and then, at each `local_index`, say: I want to do 4x framestacking from `local_index - 3` to `local_index + 1`. This means that if `local_index` is 0, 1, or 2, you'd have a negative start index, and for these few cases you want to tell the getter that these negative indices indeed mean "go into the lookback buffer" instead of "count from the end".
In other words: if you want to index into the lookback buffer, you have to provide a negative number, b/c `index=0` is always the first item after(!) the lookback buffer.
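A minimal sketch of this indexing rule, using plain Python lists instead of RLlib's actual episode getter (the `get_obs` helper and the string observations are hypothetical, for illustration only):

```python
# Hypothetical illustration of the `neg_index_as_lookback` semantics.
# 3 observations in the lookback buffer, then 5 observations of the chunk itself.
lookback = ["lb_-3", "lb_-2", "lb_-1"]
chunk = ["obs_0", "obs_1", "obs_2", "obs_3", "obs_4"]

def get_obs(index, neg_index_as_lookback=False):
    """Return one observation; index=0 is the first item AFTER the lookback buffer."""
    if index < 0 and neg_index_as_lookback:
        # Negative index means "reach into the lookback buffer".
        return lookback[index]
    # Otherwise plain Python indexing into the chunk (negative indices
    # "count from the end").
    return chunk[index]

# 4x framestacking at local_index=1 needs indices -2..1; the two negative
# indices reach into the lookback buffer:
stack = [get_obs(i, neg_index_as_lookback=True) for i in range(1 - 3, 1 + 1)]
print(stack)  # ['lb_-2', 'lb_-1', 'obs_0', 'obs_1']
```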
# range. Set to 0 (beginning of lookback buffer) if result is a negative
# index.
- if neg_indices_left_of_zero:
+ if neg_index_as_lookback:
Nice change! This is smoother and sounds more correct :)
…sode). (ray-project#46985) Signed-off-by: Dev <dev.goyal@hinge.co>
RLlib docs redo: New API stack Episodes (SingleAgentEpisode).
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.