[RLlib] New API stack: (Multi)RLModule overhaul vol 03 (Introduce generic `_forward` to further simplify the user experience). #47889
Conversation
LGTM. Some nits in the docstrings.
action_dist_class_exploration = (
    self.module[module_id].unwrapped().get_exploration_action_dist_cls()
)
action_dist_class_train = module.get_train_action_dist_cls()
Don't we need to use `unwrapped` in case DDP is used?
Great question! DDP already wraps this method to use the `unwrapped` underlying RLModule, so this is ok here.
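For context, the delegation being discussed can be sketched as follows. This is a hypothetical illustration, not RLlib's actual DDP wrapper; the class name `DDPWrapperSketch` is made up, and only `unwrapped()` and the getter mirror the API mentioned in the thread.

```python
# Hypothetical sketch: a DDP-style wrapper that forwards this method to the
# wrapped RLModule itself, so call sites don't need an explicit unwrapped().
class DDPWrapperSketch:
    def __init__(self, rl_module):
        self._rl_module = rl_module  # the actual RLModule being wrapped

    def unwrapped(self):
        # Return the underlying (non-DDP) RLModule.
        return self._rl_module

    def get_exploration_action_dist_cls(self):
        # The wrapper delegates to the unwrapped module, which is why the
        # reviewed code is safe without calling unwrapped() at the call site.
        return self.unwrapped().get_exploration_action_dist_cls()
```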
@@ -91,12 +89,14 @@ def possibly_masked_mean(data_):

        # Compute a value function loss.
        if config.use_critic:
-           value_fn_out = fwd_out[Columns.VF_PREDS]
+           value_fn_out = module.compute_values(
I wonder if this again causes problems in the DDP case. I remember similar problems with CQL and SAC when not running everything in `forward_train`, but I guess the problem there was that `forward_train` was run multiple times. So, my guess: it works here.
Yeah, good point, I think you are right. Let's see what the tests say ...
encoder_outs = self.encoder(batch)
output[Columns.FEATURES] = encoder_outs[ENCODER_OUT][CRITIC]
Imo, `features` is a misleading term here, as features are usually the inputs to a neural network (or model in general). `embeddings` might fit better.
You are right! Changed everywhere to `Columns.EMBEDDINGS` and argument name: `compute_values(self, batch, embedding=None)`.
    batch: Dict[str, Any],
    features: Optional[Any] = None,
) -> TensorType:
    if features is None:
Why not put `features` into `batch` instead of passing it in as an extra argument?
Good question. This would mean that we would have to change the batch (add a new key to it) during the update procedure, which might clash when we have to (torch-)compile this operation. We had the same problem with the tf static graph.
Also, design-wise, I think it's cleaner not to change the batch after it comes out of a connector pipeline. Separation of concerns: Only connector pipelines are ever allowed to write to a batch:
connector -> train_batch # <- read-only from here on
fwd_out = rl_module.forward_train(train_batch)
losses = rl_module.compute_losses(train_batch, fwd_out)
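The design above can be sketched in a toy form. This is an illustration only, not RLlib's implementation: the method and argument names (`compute_values`, `embeddings`) follow the discussion, while the encoder and value-head bodies are made-up stand-ins. The point is that the batch stays read-only and precomputed embeddings travel as an optional argument.

```python
from typing import Any, Dict, List, Optional


class ValueBranchSketch:
    def _encode(self, batch: Dict[str, Any]) -> List[float]:
        # Stand-in for the encoder forward pass.
        return [2.0 * x for x in batch["obs"]]

    def compute_values(
        self,
        batch: Dict[str, Any],
        embeddings: Optional[List[float]] = None,
    ) -> List[float]:
        # If no embeddings were handed in (e.g. reused from forward_train),
        # run the encoder here. The batch itself is never mutated.
        if embeddings is None:
            embeddings = self._encode(batch)
        # Stand-in for the value head.
        return [e + 1.0 for e in embeddings]
```

Passing `embeddings` explicitly lets the loss reuse the train-forward's encoder output without writing a new key into the connector-produced batch.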
batch: The batch of multi-agent data (i.e. mapping from module ids to
    individual modules' batches).

def items(self) -> ItemsView[ModuleID, RLModule]:
    """Returns a keys view over the module IDs in this MultiRLModule."""
"keys" -> "items"
) -> Union[Dict[str, Any], Dict[ModuleID, Dict[str, Any]]]:
    """Runs the forward_exploration pass.

def values(self) -> ValuesView[ModuleID]:
    """Returns a keys view over the module IDs in this MultiRLModule."""
"keys" -> "values"
Great catch! Fixed for `values()` as well.
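The corrected docstrings can be illustrated with a minimal, dict-backed sketch (the internal attribute name `_rl_modules` is assumed here for illustration):

```python
from typing import Dict, ItemsView, KeysView, ValuesView


class MultiRLModuleViewsSketch:
    def __init__(self, rl_modules: Dict[str, object]):
        self._rl_modules = rl_modules

    def keys(self) -> KeysView:
        """Returns a keys view over the module IDs in this MultiRLModule."""
        return self._rl_modules.keys()

    def items(self) -> ItemsView:
        """Returns an items view over (ModuleID, RLModule) pairs."""
        return self._rl_modules.items()

    def values(self) -> ValuesView:
        """Returns a values view over the RLModules in this MultiRLModule."""
        return self._rl_modules.values()
```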
By default, RLlib assumes that the module is non-recurrent if the initial
state is an empty dict and recurrent otherwise.
This behavior can be overridden by implementing this method.
Note that RLlib's distribution classes all implement the `Distribution`
Very nice! This makes it clear why!
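The default described in that docstring reduces to a one-line check. A hedged sketch (the method names `get_initial_state` and `is_stateful` are taken from the discussion; the class bodies are stand-ins):

```python
class RecurrenceSketch:
    def get_initial_state(self) -> dict:
        # Non-recurrent modules return an empty dict by default.
        return {}

    def is_stateful(self) -> bool:
        # Default heuristic: recurrent iff the initial state is non-empty.
        # Subclasses can override this method to change the behavior.
        return bool(self.get_initial_state())


class LSTMLikeSketch(RecurrenceSketch):
    def get_initial_state(self) -> dict:
        # A recurrent module publishes a non-empty initial state.
        return {"h": [0.0], "c": [0.0]}
```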
values = self._values(features).squeeze(-1)

# Same logic as _forward, but also return features to be used by value function
# branch during training.
features, state_outs = self._compute_features_and_state_outs(batch)
As before, in my opinion "features" is a misleading name, as it is usually used for the inputs of a neural network.
Fixed everywhere. Great catch and suggestion! Makes things much clearer.
…odule_do_over_bc_default_module_03_common_forward
…eric `_forward` to further simplify the user experience). (ray-project#47889) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
    self.module[module_id].unwrapped().get_exploration_action_dist_cls()
)
action_dist_class_train = module.get_train_action_dist_cls()
action_dist_class_exploration = module.get_exploration_action_dist_cls()
Hey, I have a question here: shouldn't the `exploration` or `inference` dist be used, in a similar fashion to the `GetActions` connector's logic? This affects the KL-loss calculation, which might end up using a different distribution class (`exploration_dist`) than the one used for the surrogate loss (`inference_dist`). It is somewhat of an edge case, since the two are actually the same as per TorchRLModule, but users sub-classing it would be unaware.
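The edge case can be made concrete with a toy sketch. All class names and the returned strings are stand-ins, not RLlib's actual distribution classes: by default both getters resolve to the same class, so a subclass that overrides only one of them silently changes which distribution a KL term uses.

```python
class DistDefaultsSketch:
    train_dist_cls = "CategoricalLike"  # stand-in for a distribution class

    def get_train_action_dist_cls(self):
        return self.train_dist_cls

    def get_exploration_action_dist_cls(self):
        # Default: identical to the train dist class ...
        return self.get_train_action_dist_cls()


class CustomExplorationSketch(DistDefaultsSketch):
    # ... but a subclass can diverge, so a KL loss computed with the
    # exploration class no longer matches a surrogate loss using the
    # train class.
    def get_exploration_action_dist_cls(self):
        return "SquashedGaussianLike"
```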
New API stack: (Multi)RLModule overhaul vol 03 (Introduce generic `_forward` to further simplify the user experience).
- New generic `_forward` method to be used by RLModule subclasses (by default, all `_forward_[inference|exploration|train]` call this).
- Override `_forward_[inference|exploration|train]` to individualize behavior for the different algo phases.

Why are these changes needed?
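The dispatch pattern named in the PR title can be sketched as follows. This is a simplified illustration, not RLlib's actual implementation; only the method names `_forward` and `_forward_[inference|exploration|train]` come from the PR.

```python
class ForwardDispatchSketch:
    def _forward(self, batch, **kwargs):
        # Implement the common model pass once, here.
        return {"out": batch["obs"]}

    # By default, all three phase-specific methods call the generic _forward;
    # subclasses override them only to individualize a particular phase.
    def _forward_inference(self, batch, **kwargs):
        return self._forward(batch, **kwargs)

    def _forward_exploration(self, batch, **kwargs):
        return self._forward(batch, **kwargs)

    def _forward_train(self, batch, **kwargs):
        return self._forward(batch, **kwargs)
```

With this layout, a user who only needs one behavior implements `_forward` once, and overrides a phase-specific method only when e.g. exploration should differ from inference.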
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.