-
Notifications
You must be signed in to change notification settings - Fork 7k
[rllib] Prevent double calling connectors for MultiAgentEnvRunner's completed episodes when sampling a fixed number of episodes
#58931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rllib] Prevent double calling connectors for MultiAgentEnvRunner's completed episodes when sampling a fixed number of episodes
#58931
Conversation
… final episode when sampling a fixed number of episodes Signed-off-by: Mark Towers <mark@anyscale.com>
Signed-off-by: Mark Towers <mark@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively resolves an issue where a connector callback was being invoked twice for the final episode in a batch. The fix is well-implemented by conditionally skipping the on_episode_created callback for the transient, newly created episode that replaces the final completed one. The added test case, which uses a custom EpisodeTracker connector, is a great way to verify the fix and ensure the callback is only triggered once per completed episode. I have one minor suggestion to improve the test code's style.
Signed-off-by: Mark Towers <mark@anyscale.com>
| env_index, | ||
| episodes, | ||
| call_on_episode_created=(eps != num_episodes), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We create a new episode here, but whenever we call MultiAgentEnvRunner._sample() with num_episodes, we enter the if-clause in l. 272, recreating the episode, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain about this detail as well.
In short, because of moving the if statement, then a new episode is created however never used because the finished episodes are completed. This is true, even if you sample again (for episodes) as all the environments are reset and you lose the episode. Therefore, to avoid the user thinking that a new episode was created when its never used, stepped into, etc I thought a callback might confuse users.
However I've just realised that if you sample N episodes, then M timesteps, this episode will be used. But I believe this would be the only case when that happens and I don't think there is any testing about env-runner behaviour for mixing sampling timesteps and episodes
The more I think about it, the more I think I should remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhhhh ok.
I think we need to spend more time on this but this PR appears to definitely be an improvement over what we have.
… completed episodes when sampling a fixed number of episodes (ray-project#58931) ## Description The `MultiAgentEnvRunner` would previously call the callback twice for the final episode of a batch (when sampling a fixed number of episodes). This PR fixes this problem ensuring that the callback only happens once for finished episode ## Related issues Closes ray-project#55452 --------- Signed-off-by: Mark Towers <mark@anyscale.com> Co-authored-by: Mark Towers <mark@anyscale.com>
Description
The
MultiAgentEnvRunnerwould previously call the callback twice for the final episode of a batch (when sampling a fixed number of episodes). This PR fixes this problem ensuring that the callback only happens once for finished episodeRelated issues
Closes #55452