Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib; docs] RLlib docs redo: New API stack Episodes (SingleAgentEpisode). #46985

Merged
merged 10 commits into from
Aug 7, 2024

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Aug 6, 2024

RLlib docs redo: New API stack Episodes (SingleAgentEpisode).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Copy link
Collaborator

@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We are moving forward here. Very nice docs for the complex episodes. Amazing how simpel and clear this is described @sven1977!!


# Get the very first observation ("reset observation"). Note that a single observation
# is returned here (not a list of size 1 or a batch of size 1).
check(episode.get_observations(0), "obs_0")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!! With this we can be sure its the observation at timestep 0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome diagrams. Makes the lookback buffer cristal clear!

Only thing: Describe the length of the lookback in this case :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! I think this is well understandable.

:py:class:`~ray.rllib.env.single_agent_episode.SingleAgentEpisode` for single-agent setups
and :py:class:`~ray.rllib.env.multi_agent_episode.MultiAgentEpisode` for multi-agent setups.
The data is translated from this `Episode` format to tensor batches (including a possible move to the GPU)
only immediately before a neural network forward pass.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we add here that this happens inside of the connector pipeline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

:end-before: rllib-sa-episode-03-end


To illustrate the differences between the data stored in a non-finalized episode vs the same data stored in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vs -> vs. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


To illustrate the differences between the data stored in a non-finalized episode vs the same data stored in
a finalized one, take a look at this complex observation example here, showing the exact same observation data in two
episodes (one not finalized the other finalized):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not finalized -> non-finalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


**Complex observations in a finalized episode**: The entire observation record is a single (complex) dict matching the
gym environment's observation space. At the leafs of the structure are `np.NDArrays` holding the individual values of the leaf.
Note that these `NDArrays` have an extra batch dim (axis=0), whose length matches the length of the episode stored (here 3).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either both np.NDArray or both NDArray I'd say. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


Another useful getter argument (besides `fill`) is the `neg_index_as_lookback` boolean argument.
If set to True, negative indices are not interpreted as "from the end", but as
"into the lookback buffer". This allows you to loop over a range of global timesteps
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tiny tiny bit ambiguous when the "lookback" is actually not into the "lookback buffer" but just backwards from the considered global timestep. So, precisely it would be "from the global timestep (if needed into the lookback buffer)", wouldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine.

Note that we do not support indexing by global (actual gym episode) timestep yet. It's probably a usecase that's never really needed.

So if you have an episode chunk from global timestep 100 to global timestep 200, as a user, you would normally just loop through it (from local_index 0 to local_index 100) and then - at each local_index - say: I want to do 4x-framestacking from local_index - 3 to local_index + 1. This means that if local_index is 0 or 1 or 2, you'd have a negative start index and for these few cases you want to tell the getter that these negative indices indeed mean "go into the lookback buffer" instead of "count from the end".

In other words: If you want to index into the lookback buffer, you have to provide a negative number, b/c index=0 is always the first item after(!) the lookback buffer.

# range. Set to 0 (beginning of lookback buffer) if result is a negative
# index.
if neg_indices_left_of_zero:
if neg_index_as_lookback:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change! This is smoother and sounds more correct :)

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) August 7, 2024 10:24
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Aug 7, 2024
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@github-actions github-actions bot disabled auto-merge August 7, 2024 11:25
@sven1977 sven1977 enabled auto-merge (squash) August 7, 2024 12:37
@sven1977 sven1977 merged commit a9a1f0a into ray-project:master Aug 7, 2024
6 checks passed
dev-goyal pushed a commit to dev-goyal/ray that referenced this pull request Aug 8, 2024
@sven1977 sven1977 deleted the docs_redo_episodes branch August 12, 2024 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants