[RLlib] Chaining Models in RLModules #31469
Conversation
I really like the direction this is headed. I took a high-level look for 15 minutes, so don't take this as my detailed review yet, but here is my feedback:
- I really like that the setup of RLModules comes down to calling a bunch of `config.build(framework=...)` calls while the forward calls are also kept minimal and simple. So if we end up with a general yet flexible abstraction for configs and encoders / trunks, the currently proposed API for the RLModule is super optimal in my head.
- I think `from_model_config()` has a lot of components that will recur across many RLModules, e.g. how to parse `model_config` into an encoder config and trunk configs. I think we need an extensible abstraction layer for these kinds of things. That is what I have in mind for the catalog:
```python
def from_model_config(...):
    catalog = Catalog.from_model_config(...)
    encoder_config = catalog.build_encoder_config()
    # a utility method that returns action_space.n or 2/1 * action_space.shape[0]
    action_dim = get_action_dim(action_space, free_log_std)
    pi_config = catalog.build_trunk_config(out_dim=action_dim)
    vf_config = catalog.build_trunk_config(out_dim=1)
    config_ = PPOModuleConfig(
        observation_space=observation_space,
        action_space=action_space,
        shared_encoder_config=encoder_config,
        pi_config=pi_config,
        vf_config=vf_config,
        free_log_std=free_log_std,
    )
    module = PPOTorchRLModule(config_)
    return module
```
A couple of things to notice here:
- Catalog does not build neural networks; it just returns pre-defined neural network configs, which can be constructed at run time via a simple `.build(framework)` API.
- I think Catalog should still be a deep module with a generic API here, such as `build_encoder()` and `build_trunk()`, that has simple interfaces. There is a trade-off between making the interface of these APIs simple vs. how deep they'll become. We need to find a sweet spot here.
- I really love that the spec checking is delegated to the sub-components now. It will make them much cleaner and easier to build.
- Sub-modules should not inherit from `base_model.Model`. That class was an early version of modules with spec-checking capability; I think we have a better version with the decorators in place now. For Encoders specifically, we can create a base interface class with a `get_initial_state()` abstract API: `FCEncoder(Encoder)` will contain an FCNet inside and override `get_initial_state()` to return an empty dict, while `LSTMEncoder(Encoder)` will contain an LSTMCell and override `get_initial_state()` to return the `h, c` values (see the sketch below). For Trunks I think we can also create a base class with some simple interface APIs to standardize trunks as well.
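A minimal sketch of what such an encoder interface could look like, assuming a torch backend; the class names and shapes below are illustrative placeholders, not the actual RLlib API:

```python
import abc

import torch
import torch.nn as nn


class Encoder(abc.ABC):
    """Base interface sketch: every encoder exposes its initial (recurrent) state."""

    @abc.abstractmethod
    def get_initial_state(self) -> dict:
        ...


class FCEncoder(Encoder, nn.Module):
    """Stateless encoder wrapping a small fully connected net."""

    def __init__(self, in_dim: int, out_dim: int):
        nn.Module.__init__(self)
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def get_initial_state(self) -> dict:
        # No recurrent state.
        return {}

    def forward(self, x):
        return self.net(x)


class LSTMEncoder(Encoder, nn.Module):
    """Recurrent encoder wrapping an LSTMCell; initial state carries h and c."""

    def __init__(self, in_dim: int, hidden_dim: int):
        nn.Module.__init__(self)
        self.hidden_dim = hidden_dim
        self.cell = nn.LSTMCell(in_dim, hidden_dim)

    def get_initial_state(self) -> dict:
        # Unbatched zero state; callers expand this to the batch size they need.
        return {
            "h": torch.zeros(self.hidden_dim),
            "c": torch.zeros(self.hidden_dim),
        }

    def forward(self, x, state):
        h, c = self.cell(x, (state["h"], state["c"]))
        return h, {"h": h, "c": c}
```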
Leaving comments discussed offline.
rllib/models/experimental/base.py (outdated)
```python
# If no checking is needed, we can simply return an empty spec.
return SpecDict()

@check_input_specs("input_spec", filter=True, cache=True)
```
Discussed offline: rename `forward` -> `_forward()` and update so users are not exposed to spec checking.
I think it's fair to simply wrap `Model.forward` in the constructor to circumvent this. Torch users will attempt to override `forward` in any case.
I've made it now so that we detect when `forward` is not wrapped and "autowrap" it in that case.
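A rough sketch of that autowrap idea; `check_specs` and the `__is_spec_checked__` flag below are placeholders standing in for RLlib's actual spec-checking decorators, not their real names:

```python
import functools


def check_specs(fn):
    """Placeholder for RLlib's spec-checking decorators (assumption)."""

    @functools.wraps(fn)
    def wrapper(self, batch):
        # ... validate `batch` against self.input_spec here ...
        out = fn(self, batch)
        # ... validate `out` against self.output_spec here ...
        return out

    wrapper.__is_spec_checked__ = True
    return wrapper


class Model:
    def __init__(self):
        # If a subclass overrode forward() without the decorator, wrap ("autowrap")
        # the bound method at construction time so spec checking still happens.
        if not getattr(type(self).forward, "__is_spec_checked__", False):
            self.forward = check_specs(type(self).forward).__get__(self)

    def forward(self, batch):
        raise NotImplementedError
```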
```python
raise NotImplementedError


class Model:
```
This class abstracts two things:
- spec checking on the forward method
- unifying the forward call between torch and tf

The expectation is that the RLModule / model builder will only work with the assumptions of this API definition.
I've updated it a little. The tf and torch model classes are now unified so that an RLModule can simply call them.
Model only defines the minimal `input_spec`, `output_spec`, and `get_initial_state` interface. It's pretty shallow now, but I think we can leave it here for the moment because models might need other things soon, possibly a name plus a sequence number for a richer repr.
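Roughly, that minimal interface could look like the following sketch (`SpecDict` is the spec container referenced above; this is not the final API):

```python
import abc


class Model(abc.ABC):
    """Sketch of the minimal interface described above."""

    @property
    @abc.abstractmethod
    def input_spec(self) -> "SpecDict":
        """Spec that incoming batches are checked against."""

    @property
    @abc.abstractmethod
    def output_spec(self) -> "SpecDict":
        """Spec that forward outputs are checked against."""

    @abc.abstractmethod
    def get_initial_state(self) -> dict:
        """Initial recurrent state; stateless models return an empty dict."""
```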
```python
@ExperimentalAPI
@dataclass
class ModelConfig(abc.ABC):
```
Just one comment: this will most likely end up being a very shallow module that actually adds to the complexity rather than reducing it. Maybe we end up removing it later, once we see more examples of extending this class.
My concern is mostly about the build method. The dataclass itself is fine.
We can make this class totally framework agnostic and only require the caller to pass in the same object to different framework specific model constructors.
As discussed offline yesterday, we'll keep the build method because it abstracts the class to be built. Any model_config-inferring code will simply return a config that can be built. We'd otherwise have to return a class (which we don't want because it's not framework agnostic) to resolve this issue.
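For illustration, a framework-agnostic config whose `build()` returns a framework-specific model could look roughly like this; `MLPConfig` and its fields are made-up names for the sketch, not the PR's actual classes:

```python
import abc
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModelConfig(abc.ABC):
    """Framework-agnostic config sketch: callers only ever hold configs."""

    input_dim: int = 4
    output_dim: int = 2

    @abc.abstractmethod
    def build(self, framework: str = "torch"):
        """Return a framework-specific model built from this config."""


@dataclass
class MLPConfig(ModelConfig):
    hidden_dims: Tuple[int, ...] = (256, 256)

    def build(self, framework: str = "torch"):
        if framework == "torch":
            import torch.nn as nn

            dims = (self.input_dim, *self.hidden_dims, self.output_dim)
            layers = []
            for i in range(len(dims) - 1):
                layers.append(nn.Linear(dims[i], dims[i + 1]))
                if i < len(dims) - 2:
                    layers.append(nn.ReLU())
            return nn.Sequential(*layers)
        raise NotImplementedError(f"No sketch for framework={framework}")
```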
```python
)


class TfMLPModel(Model, tf.Module):
```
Should this be `tf.Model` or `tf.keras.Model`?
For consistency, we should stick to one.
It should be `tf.Module`. `tf.keras.Model` is an extension of `tf.Module`, and if we ever run into a situation where we need its features, we can simply start inheriting from it. But today we don't need it.
https://www.tensorflow.org/api_docs/python/tf/keras/Model
vs
https://www.tensorflow.org/api_docs/python/tf/Module
"A module is a named container for tf.Variables, other tf.Modules and functions which apply to user input"
That's all we want. `keras.Model` has much richer features and all sorts of stuff that we don't necessarily want to guarantee.
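To illustrate the distinction, a bare `tf.Module` container might look like the sketch below: it just tracks variables and applies layers, with none of keras' `fit()`/`compile()` machinery. The `TfMLP` shape here is illustrative, not the PR's exact class:

```python
import tensorflow as tf


class TfMLP(tf.Module):
    """Sketch: a plain tf.Module that just holds layers and applies them."""

    def __init__(self, hidden_dims, output_dim, name=None):
        super().__init__(name=name)
        self._layers = [
            tf.keras.layers.Dense(dim, activation="relu") for dim in hidden_dims
        ]
        self._layers.append(tf.keras.layers.Dense(output_dim, activation=None))

    def __call__(self, inputs):
        out = inputs
        for layer in self._layers:
            out = layer(out)
        return out


# Usage: variables are tracked by the module, but there is no fit()/compile() API.
mlp = TfMLP(hidden_dims=(32, 32), output_dim=2)
logits = mlp(tf.random.normal([4, 8]))
```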
```python
)


class TfMLPModel(Model, tf.Module):
```
This may have been meant to be the `TfModel` base class?
Yep, "typo"
```python
raise NotImplementedError


class TfMLP(tf.Module):
```
Subclass `TfModel`?
```python
) -> PPOModuleConfig:
    """Get a PPOModuleConfig that we would expect from the catalog otherwise.

    Args:
        env: Environment for which we build the model later
        lstm: If True, build recurrent pi encoder
        shared_encoder: If True, build a shared encoder for pi and vf, where pi
            encoder and vf encoder will be identity. If False, the shared encoder
            will be identity.
```
I'll reintroduce this in an upcoming PR with the ActorCriticEncoder.
```python
)


def get_expected_model_config_tf(
```
I've unified these into one, since model configs are planned to be framework agnostic.
```
@@ -343,6 +265,9 @@ def test_forward_train(self):
            for param in module.parameters():
                self.assertIsNotNone(param.grad)
        else:
            batch = tree.map_structure(
                lambda x: tf.convert_to_tensor(x, dtype=tf.float32), batch
            )
```
This conversion is needed because tf does not accept numpy arrays here.
```python
    super().__init__()
    self.config = config
    self.setup()

def setup(self) -> None:
    assert self.config.pi_config, "pi_config must be provided."
    assert self.config.vf_config, "vf_config must be provided."
    self.shared_encoder = self.config.shared_encoder_config.build()
    self.encoder = self.config.encoder_config.build(framework="tf")
```
From here on, the encoder will encapsulate the concept of shared/non-shared layers.
```python
# Shared encoder
encoder_out = self.encoder(batch)
if STATE_OUT in encoder_out:
    output[STATE_OUT] = encoder_out[STATE_OUT]
```
We'll generally expect a state here in the future and hand that over, even if it's empty.
I'll update this when Encoders are updated in a follow-up PR.
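For example, the conditional above could turn into an unconditional hand-over along these lines (a sketch only; the `STATE_OUT` constant value and the helper name are assumptions for illustration):

```python
from typing import Any, Dict

STATE_OUT = "state_out"  # Assumed constant name for illustration.


def forward_through_encoder(encoder, batch: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch: always propagate the encoder state, even when it is empty."""
    output: Dict[str, Any] = {}
    encoder_out = encoder(batch)
    # Instead of `if STATE_OUT in encoder_out`, hand the state over unconditionally.
    output[STATE_OUT] = encoder_out.get(STATE_OUT, {})
    return output
```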
```python
    self.config = config

@abc.abstractmethod
def get_initial_state(self):
```
I think at some point we should change this to simply be a property "initial_state".
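For instance, the accessor could become a read-only property (a sketch of the proposed change, not existing code):

```python
import abc


class Model(abc.ABC):
    @property
    @abc.abstractmethod
    def initial_state(self) -> dict:
        """Replaces get_initial_state(); stateless models return an empty dict."""
```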
```python
@check_input_specs("input_spec", cache=True)
@check_output_specs("output_spec", cache=True)
@abc.abstractmethod
```
I'm spec checking here even though this method is abstract, because `Model` also serves as an example of how forward should look.
@kouroshHakha I've addressed all of your remarks and, as far as I can see, the PR looks clean now. Could you please have a closer look?
LGTM. I only have one major concern regarding the consistency between tf and torch, and also a cleanup comment on the rl_module folder. There is also a nit :) feel free to ignore it.
```
@@ -40,7 +40,7 @@ def build(self):


@dataclass
class FCConfig(EncoderConfig):
```
Shouldn't we just move the encoder.py files in rl_module entirely to the experimental folder, to clean up the rl_module folder?
Interesting. I thought I'd moved them in the process of writing the new files under .../experimental, but they were obviously not deleted.
The two encoder files in the .../rl_module folder are not even in use anymore. Many thanks for realizing that there's something off here 😃
```python
@dataclass
class PPOModuleConfig(RLModuleConfig):
    """Configuration for the PPO module.
class PPOModuleConfig(RLModuleConfig):  # TODO (Artur): Move to Torch-unspecific file
```
nit: non-torch-specific or torch-agnostic :)
Changed 🙂
```python
encoder_out = self.shared_encoder(obs)
action_logits = self.pi(encoder_out)
vf = self.vf(encoder_out)
encoder_out = self.encoder(batch)
```
Can you make sure we take care of STATE_OUT here as well, similar to torch? TF and torch should be maximally consistent going forward.
Done! 🙂
@gjoliver Let's merge?
Why are these changes needed?
After sketching Solution 2 (this PR) and Solution 1, we have decided to go with this PR and pursue this solution further.
With this PR, we introduce a hierarchy of models that is meant to be generated by the ModelCatalog.
This PR also removes the vf_encoder and pi_encoder to divide the PPORLModule into a shared encoder and vf/pi.
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.