[rllib] Prevent double calling connectors for `MultiAgentEnvRunner`'s completed episodes when sampling a fixed number of episodes #58931

pseudo-rnd-thoughts · 2025-11-24T10:37:02Z

Description

The `MultiAgentEnvRunner` would previously call the callback twice for the final episode of a batch (when sampling a fixed number of episodes). This PR fixes the problem, ensuring that the callback happens only once for each finished episode.

Related issues

Closes #55452


gemini-code-assist

Code Review

This pull request effectively resolves an issue where a connector callback was being invoked twice for the final episode in a batch. The fix is well-implemented by conditionally skipping the on_episode_created callback for the transient, newly created episode that replaces the final completed one. The added test case, which uses a custom EpisodeTracker connector, is a great way to verify the fix and ensure the callback is only triggered once per completed episode. I have one minor suggestion to improve the test code's style.
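The check described in this review can be illustrated with a small, self-contained sketch. It is not the test added in this PR and does not use RLlib's connector API; `ToyEpisode` and `ToyEpisodeTracker` are hypothetical names made up for illustration. The idea is simply to count how often each finished episode is handed to a connector-like callable and to flag any episode seen more than once.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for an episode; not an RLlib class."""

    id_: int
    is_done: bool = False


class ToyEpisodeTracker:
    """Hypothetical connector-like callable that counts finished episodes."""

    def __init__(self):
        # Maps episode id -> number of times it was seen while done.
        self.done_calls = Counter()

    def __call__(self, episodes):
        for episode in episodes:
            if episode.is_done:
                self.done_calls[episode.id_] += 1
        return episodes

    def duplicates(self):
        """Return the episodes that were processed more than once."""
        return {ep_id: n for ep_id, n in self.done_calls.items() if n > 1}


tracker = ToyEpisodeTracker()
finished = [ToyEpisode(id_=i, is_done=True) for i in range(3)]

# Expected behaviour: every finished episode is processed exactly once.
for episode in finished:
    tracker([episode])
clean_run = tracker.duplicates()

# Simulate the reported symptom: the final episode is processed a second time.
tracker([finished[-1]])
buggy_run = tracker.duplicates()
```

In this toy setup `clean_run` is empty, while `buggy_run` reports the final episode with a count of two, which is the kind of condition a regression test for this fix would assert against.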

rllib/env/tests/test_multi_agent_env_runner.py


ArturNiederfahrenhorst · 2025-11-25T13:41:18Z

rllib/env/multi_agent_env_runner.py

+                        env_index,
+                        episodes,
+                        call_on_episode_created=(eps != num_episodes),
+                    )


We create a new episode here, but whenever we call MultiAgentEnvRunner._sample() with num_episodes, we enter the if-clause in l. 272, recreating the episode, right?

I'm uncertain about this detail as well.
In short, because the if statement was moved, a new episode is created but never used, since the episodes being sampled have all finished. This is true even if you sample again (by episodes), because all the environments are reset and that episode is lost. I thought a callback for an episode that is never used or stepped into might confuse users into believing a new episode had started.

However, I've just realised that if you sample N episodes and then M timesteps, this episode will be used. I believe this is the only case where that happens, and I don't think there is any testing of env-runner behaviour when mixing timestep and episode sampling.

The more I think about it, the more I think I should remove this.

Mhhhh ok.
I think we need to spend more time on this but this PR appears to definitely be an improvement over what we have.
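The gating discussed in this thread can be sketched with a toy single-environment loop. This is a hypothetical model, not the code of `MultiAgentEnvRunner._sample()`: only the argument name `call_on_episode_created` and the condition `eps != num_episodes` come from the quoted diff, while `sample_episodes`, `new_episode`, and `episode_len` are assumptions made for illustration.

```python
def sample_episodes(num_episodes, episode_len=3):
    """Toy model of sampling a fixed number of episodes from one env.

    Returns (finished_ids, created_callback_ids, trailing_id).
    Hypothetical sketch; not RLlib's implementation.
    """
    created_log = []  # ids for which the "episode created" callback fired
    finished = []     # ids of episodes that ran to completion
    next_id = 0

    def new_episode(call_on_episode_created=True):
        nonlocal next_id
        ep_id = next_id
        next_id += 1
        if call_on_episode_created:
            created_log.append(ep_id)
        return ep_id

    current = new_episode()
    steps = 0
    eps = 0
    while eps < num_episodes:
        steps += 1
        if steps == episode_len:
            finished.append(current)
            eps += 1
            steps = 0
            # A replacement episode is always created, but the callback is
            # skipped for the one created after the last requested episode.
            current = new_episode(
                call_on_episode_created=(eps != num_episodes)
            )
    return finished, created_log, current


finished_ids, created_ids, trailing_id = sample_episodes(num_episodes=2)
```

In this model the trailing episode exists but never triggers the callback. That matches the concern raised above: if a later call continued from that episode (for example, when sampling by timesteps next), it would be stepped without ever having been announced as created.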


Mark Towers added 2 commits November 23, 2025 13:38

[rllib] Prevent double calling connectors for MultiAgentEnvRunner's…

bf890c1

… final episode when sampling a fixed number of episodes Signed-off-by: Mark Towers <mark@anyscale.com>

Add test and run pre-commit

6f947ad

Signed-off-by: Mark Towers <mark@anyscale.com>

pseudo-rnd-thoughts requested a review from a team as a code owner November 24, 2025 10:37

pseudo-rnd-thoughts mentioned this pull request Nov 24, 2025

[Rllib] MultiAgentEnvRunner in episodes mode calls connectors one time too many. #55452

Closed

gemini-code-assist bot reviewed Nov 24, 2025

rllib/env/tests/test_multi_agent_env_runner.py (outdated, resolved)

ray-gardener bot added the `rllib` label (RLlib related issues) Nov 24, 2025

Cursor code review - assertEquals to assertEqual

a6368a2

Signed-off-by: Mark Towers <mark@anyscale.com>

pseudo-rnd-thoughts added the `go` label (add ONLY when ready to merge, run all tests) Nov 24, 2025

ArturNiederfahrenhorst reviewed Nov 25, 2025


ArturNiederfahrenhorst approved these changes Nov 25, 2025


ArturNiederfahrenhorst merged commit 5458c75 into ray-project:master Nov 25, 2025
7 checks passed
