[RLlib; docs] RLlib docs redo: New API stack Episodes (SingleAgentEpisode). #46985
Conversation
LGTM. We are moving forward here. Very nice docs for the complex episodes. Amazing how simply and clearly this is described @sven1977!!
# Get the very first observation ("reset observation"). Note that a single observation
# is returned here (not a list of size 1 or a batch of size 1).
check(episode.get_observations(0), "obs_0")
Very nice!! With this we can be sure it's the observation at timestep 0.
Awesome diagrams. Makes the lookback buffer crystal clear!
Only thing: Describe the length of the lookback in this case :)
fixed
Awesome!
Really nice! I think this is easy to understand.
:py:class:`~ray.rllib.env.single_agent_episode.SingleAgentEpisode` for single-agent setups
and :py:class:`~ray.rllib.env.multi_agent_episode.MultiAgentEpisode` for multi-agent setups.
The data is translated from this `Episode` format to tensor batches (including a possible move to the GPU)
only immediately before a neural network forward pass.
Maybe we should add here that this happens inside the connector pipeline?
done
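As a rough sketch of what that episode-to-batch translation step looks like conceptually (hypothetical names and a plain numpy stand-in; the real logic lives in RLlib's connector pipelines, not in this code):

```python
import numpy as np

# Hypothetical stand-in for an episode's per-timestep observations.
episode_observations = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

def episode_to_batch(observations):
    """Stack per-timestep observations into one batch array of shape [T, ...].

    A connector pipeline would hand this batch to the model (moving it to
    the GPU first, if needed).
    """
    return np.stack([np.asarray(o, dtype=np.float32) for o in observations])

batch = episode_to_batch(episode_observations)
print(batch.shape)  # (3, 2)
```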
:end-before: rllib-sa-episode-03-end
To illustrate the differences between the data stored in a non-finalized episode vs the same data stored in
vs -> vs. :)
fixed
To illustrate the differences between the data stored in a non-finalized episode vs the same data stored in
a finalized one, take a look at this complex observation example here, showing the exact same observation data in two
episodes (one not finalized the other finalized):
not finalized -> non-finalized?
fixed
**Complex observations in a finalized episode**: The entire observation record is a single (complex) dict matching the
gym environment's observation space. At the leafs of the structure are `np.NDArrays` holding the individual values of the leaf.
Note that these `NDArrays` have an extra batch dim (axis=0), whose length matches the length of the episode stored (here 3).
Either both `np.NDArray` or both `NDArray`, I'd say. What do you think?
fixed
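To illustrate what finalizing does to such a complex (dict) observation, here is a minimal sketch using plain numpy (hypothetical data and a hand-rolled `finalize` helper, not RLlib's actual finalize code):

```python
import numpy as np

# Non-finalized: a list of per-timestep dict observations (episode length 3).
non_finalized = [
    {"pos": [0.0, 0.0], "health": 1.0},
    {"pos": [0.5, 0.1], "health": 0.9},
    {"pos": [1.0, 0.2], "health": 0.8},
]

def finalize(observations):
    """Turn a list of dicts into a single dict of stacked arrays.

    Each leaf becomes an np.ndarray with an extra batch dim (axis=0), whose
    length equals the episode length.
    """
    keys = observations[0].keys()
    return {k: np.stack([np.asarray(o[k]) for o in observations]) for k in keys}

finalized = finalize(non_finalized)
print(finalized["pos"].shape)     # (3, 2)
print(finalized["health"].shape)  # (3,)
```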
Another useful getter argument (besides `fill`) is the `neg_index_as_lookback` boolean argument.
If set to True, negative indices are not interpreted as "from the end", but as
"into the lookback buffer". This allows you to loop over a range of global timesteps
This is a tiny tiny bit ambiguous when the "lookback" is actually not into the "lookback buffer" but just backwards from the considered global timestep. So, precisely it would be "from the global timestep (if needed into the lookback buffer)", wouldn't it?
I think this is fine.
Note that we do not support indexing by global (actual gym episode) timestep yet. It's probably a use case that's never really needed.
So if you have an episode chunk from global timestep 100 to global timestep 200, as a user, you would normally just loop through it (from `local_index` 0 to `local_index` 100) and then, at each `local_index`, say: I want to do 4x framestacking from `local_index - 3` to `local_index + 1`. This means that if `local_index` is 0, 1, or 2, you'd have a negative start index, and for these few cases you want to tell the getter that these negative indices indeed mean "go into the lookback buffer" instead of "count from the end".
In other words: if you want to index into the lookback buffer, you have to provide a negative number, b/c `index=0` is always the first item after(!) the lookback buffer.
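A minimal sketch of this indexing rule, using plain Python lists instead of RLlib's actual episode getter (the `get_obs` helper and the string observations are hypothetical, for illustration only):

```python
# Hypothetical illustration of the `neg_index_as_lookback` semantics.
# 3 observations in the lookback buffer, then 5 observations of the chunk itself.
lookback = ["lb_-3", "lb_-2", "lb_-1"]
chunk = ["obs_0", "obs_1", "obs_2", "obs_3", "obs_4"]

def get_obs(index, neg_index_as_lookback=False):
    """Return one observation; index=0 is the first item AFTER the lookback buffer."""
    if index < 0 and neg_index_as_lookback:
        # Negative index means "reach into the lookback buffer".
        return lookback[index]
    # Otherwise plain Python indexing into the chunk (negative indices
    # "count from the end").
    return chunk[index]

# 4x framestacking at local_index=1 needs indices -2..1; the two negative
# indices reach into the lookback buffer:
stack = [get_obs(i, neg_index_as_lookback=True) for i in range(1 - 3, 1 + 1)]
print(stack)  # ['lb_-2', 'lb_-1', 'obs_0', 'obs_1']
```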
# range. Set to 0 (beginning of lookback buffer) if result is a negative
# index.
- if neg_indices_left_of_zero:
+ if neg_index_as_lookback:
Nice change! This is smoother and sounds more correct :)
…sode). (ray-project#46985) Signed-off-by: Dev <dev.goyal@hinge.co>
RLlib docs redo: New API stack Episodes (SingleAgentEpisode).
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.