
[Rllib] MultiAgentEnvRunner in episodes mode calls connectors one time too many. #55452

@dennismalmgren

Description


What happened + What you expected to happen

When running a `MultiAgentEnvRunner` in episode-collection mode (i.e. when sampling with `num_episodes`), there is an early-out termination of a for-loop in `_sample`:

```python
# Also early-out if we reach the number of episodes within this
# for-loop.
if eps == num_episodes:
    break
```

This skips the creation of a new episode, but it also means that `done_episodes_to_run_env_to_module` and `episodes` now overlap: the episode that triggered the break is still in `episodes`. Normally the two lists are disjoint, because each finished episode's slot in `episodes` has already been refilled with a newly created episode. The overlap is a problem because, slightly later in the same call, the env-to-module connectors are called twice, first with `done_episodes_to_run_env_to_module`, then with `episodes`:

```python
if done_episodes_to_run_env_to_module:
    # Run the env-to-module connector pipeline for all done episodes.
    # Note, this is needed to postprocess last-step data, e.g. if the
    # user uses a connector that one-hot encodes observations.
    # Note, this pipeline run is not timed as the number of episodes
    # can differ from `num_envs_per_env_runner` and would bias time
    # measurements.
    self._env_to_module(
        episodes=done_episodes_to_run_env_to_module,
        explore=explore,
        rl_module=self.module,
        shared_data=shared_data,
        metrics=None,
    )
self._cached_to_module = self._env_to_module(
    episodes=episodes,
    explore=explore,
    rl_module=self.module,
    shared_data=shared_data,
    metrics=self.metrics,
    metrics_prefix_key=(ENV_TO_MODULE_CONNECTOR,),
)
```

As a result, the env-to-module connectors run one time too many on that done episode and have to deal with last-step data they have already processed.
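Until the runner avoids the overlap, a custom connector whose last-step post-processing is not idempotent could guard itself by remembering which finished episodes it has already handled. The sketch below is hypothetical: `LastStepDeduplicator` is not an RLlib API, and it assumes episode objects expose a unique `id_` and a boolean `is_done` (verify both against your RLlib version). It only suits side effects applied to the episode itself; a connector that must emit one batch entry per episode in `episodes` should not drop episodes this way.

```python
from collections import namedtuple


class LastStepDeduplicator:
    """Hypothetical helper (not an RLlib API) for a custom connector that
    post-processes the final step of finished episodes.

    Assumes episode objects expose a unique `id_` and a boolean `is_done`.
    """

    def __init__(self):
        self._finalized_ids = set()

    def select_new(self, episodes):
        """Return the episodes that still need processing in this run."""
        fresh = []
        for ep in episodes:
            if not ep.is_done:
                # Ongoing episodes are always processed.
                fresh.append(ep)
            elif ep.id_ not in self._finalized_ids:
                # First time this finished episode is seen: finalize it once.
                self._finalized_ids.add(ep.id_)
                fresh.append(ep)
            # Otherwise it was already finalized in an earlier pipeline run: skip.
        return fresh


# Minimal stand-in with only the two attributes the helper relies on.
Ep = namedtuple("Ep", ["id_", "is_done"])

dedup = LastStepDeduplicator()
ongoing, finished = Ep("a", False), Ep("b", True)
print(dedup.select_new([finished]))           # -> [Ep(id_='b', is_done=True)]
print(dedup.select_new([ongoing, finished]))  # -> [Ep(id_='a', is_done=False)]
```

The id set grows with the number of finished episodes seen; whether and when it is safe to clear it depends on how the runner carries episodes between `_sample` calls.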

Versions / Dependencies

We use `ray[rllib]==2.44.1`, but the issue appears to still be present in the current code on GitHub.

Reproduction script

See description above.
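The bookkeeping can also be modeled outside RLlib. The sketch below is a simplified, hypothetical model of the episode-slot handling in `_sample`; `FakeEpisode`, `CountingConnector`, and `sample_step` are illustrative names, not RLlib APIs. It assumes two envs that both finish an episode on the same step and `num_episodes=2`, so the early-out fires on the second env before its slot is refilled.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEpisode:
    """Hypothetical stand-in for an RLlib episode (not the RLlib class)."""
    id_: int
    done: bool = False


@dataclass
class CountingConnector:
    """Hypothetical env-to-module connector: counts how often it is handed
    each finished episode, i.e. how often it would post-process a last step."""
    last_step_calls: dict = field(default_factory=dict)

    def __call__(self, episodes):
        for ep in episodes:
            if ep.done:
                self.last_step_calls[ep.id_] = self.last_step_calls.get(ep.id_, 0) + 1


def sample_step(episodes, num_episodes, eps, connector, next_id):
    """Simplified model of one env step in `_sample`.

    Assumption: every env in `episodes` finishes its episode on this step.
    """
    done_episodes_to_run_env_to_module = []
    for env_index in range(len(episodes)):
        episodes[env_index].done = True
        done_episodes_to_run_env_to_module.append(episodes[env_index])
        eps += 1
        # Mirrors the early-out in `_sample`: stop before refilling the slot.
        if eps == num_episodes:
            break
        # Normal path: the finished episode's slot gets a fresh episode.
        episodes[env_index] = FakeEpisode(id_=next_id)
        next_id += 1

    # Same call order as in `_sample`: done episodes first, then `episodes`.
    if done_episodes_to_run_env_to_module:
        connector(done_episodes_to_run_env_to_module)
    connector(episodes)
    return eps, done_episodes_to_run_env_to_module


# Two envs, both finishing on this step, with num_episodes=2.
episodes = [FakeEpisode(id_=0), FakeEpisode(id_=1)]
connector = CountingConnector()
eps, done = sample_step(episodes, num_episodes=2, eps=0, connector=connector, next_id=2)
overlap = [ep.id_ for ep in done if any(ep is e for e in episodes)]
print(overlap)                    # -> [1]
print(connector.last_step_calls)  # -> {0: 1, 1: 2}
```

With a large `num_episodes` the break never fires, both slots are refilled, and each finished episode reaches the connector exactly once, which is the disjoint case.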

Issue Severity

None

Labels

P1, bug, community-backlog, rllib, rllib-envrunners, stability
