Skip to content

Conversation

@pseudo-rnd-thoughts
Copy link
Member

Description

The MultiAgentEnvRunner would previously call the callback twice for the final episode of a batch (when sampling a fixed number of episodes). This PR fixes this problem ensuring that the callback only happens once for finished episode

Related issues

Closes #55452

Mark Towers added 2 commits November 23, 2025 13:38
… final episode when sampling a fixed number of episodes

Signed-off-by: Mark Towers <mark@anyscale.com>
Signed-off-by: Mark Towers <mark@anyscale.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves an issue where a connector callback was being invoked twice for the final episode in a batch. The fix is well-implemented by conditionally skipping the on_episode_created callback for the transient, newly created episode that replaces the final completed one. The added test case, which uses a custom EpisodeTracker connector, is a great way to verify the fix and ensure the callback is only triggered once per completed episode. I have one minor suggestion to improve the test code's style.

@ray-gardener ray-gardener bot added the rllib RLlib related issues label Nov 24, 2025
Signed-off-by: Mark Towers <mark@anyscale.com>
@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go add ONLY when ready to merge, run all tests label Nov 24, 2025
env_index,
episodes,
call_on_episode_created=(eps != num_episodes),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create a new episode here, but whenever we call MultiAgentEnvRunner._sample() with num_episodes, we enter the if-clause in l. 272, recreating the episode, right?

Copy link
Member Author

@pseudo-rnd-thoughts pseudo-rnd-thoughts Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncertain about this detail as well.
In short, because of moving the if statement, then a new episode is created however never used because the finished episodes are completed. This is true, even if you sample again (for episodes) as all the environments are reset and you lose the episode. Therefore, to avoid the user thinking that a new episode was created when its never used, stepped into, etc I thought a callback might confuse users.

However I've just realised that if you sample N episodes, then M timesteps, this episode will be used. But I believe this would be the only case when that happens and I don't think there is any testing about env-runner behaviour for mixing sampling timesteps and episodes

The more I think about it, the more I think I should remove this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhhh ok.
I think we need to spend more time on this but this PR appears to definitely be an improvement over what we have.

@ArturNiederfahrenhorst ArturNiederfahrenhorst merged commit 5458c75 into ray-project:master Nov 25, 2025
7 checks passed
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
… completed episodes when sampling a fixed number of episodes (ray-project#58931)

## Description
The `MultiAgentEnvRunner` would previously call the callback twice for
the final episode of a batch (when sampling a fixed number of episodes).
This PR fixes this problem ensuring that the callback only happens once
for finished episode

## Related issues
Closes ray-project#55452

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests rllib RLlib related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Rllib] MultiAgentEnvRunner in episodes mode calls connectors one time too many.

2 participants