
[RLlib] Allow MARLModule customization from algorithm config #32473

Merged

38 commits merged into ray-project:master on Feb 14, 2023

Conversation

@kouroshHakha (Contributor) commented Feb 12, 2023

Why are these changes needed?

The intent is to enable algorithm-level customization of RLModules through RLModuleSpecs, giving users maximum flexibility in constructing RLModules (including MARLModules with shared encoders).

Think of allowing users to do this:

class MyCustomRLModuleSpec(SingleRLModuleSpec):
    def build(self):
        # custom build method
        ...

config = config.rl_module(rl_module_spec=MyCustomRLModuleSpec())
algo = config.build()

or

class MyCustomMARLModuleSpec(MultiAgentRLModuleSpec):
    def build(self):
        shared_encoder = ...
        # custom build that wires the shared encoder into each module

config = config.rl_module(rl_module_spec=MyCustomMARLModuleSpec())
algo = config.build()
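To make the shared-encoder case concrete, here is a minimal, self-contained sketch of what such a custom build() would roughly do. It uses plain PyTorch with hypothetical sizes and class names, not the real RLModule interfaces:

import torch
import torch.nn as nn

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 64, 2  # assumed sizes, illustration only


class AgentModule(nn.Module):
    """Stand-in for a per-agent RLModule that reuses a shared encoder."""

    def __init__(self, shared_encoder: nn.Module):
        super().__init__()
        self.encoder = shared_encoder  # referenced, not copied
        self.head = nn.Linear(HIDDEN_DIM, NUM_ACTIONS)  # per-agent head

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))


def build_marl_module(module_ids):
    """Roughly what a custom MultiAgentRLModuleSpec.build() would do."""
    shared_encoder = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU())
    # Every agent module holds a reference to the same encoder instance,
    # so the encoder weights are trained jointly by all agents.
    return {mid: AgentModule(shared_encoder) for mid in module_ids}


marl_module = build_marl_module(["policy_1", "policy_2"])
assert marl_module["policy_1"].encoder is marl_module["policy_2"].encoder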

To achieve this, the PR does the following:

  1. It adds policy_ids to the policy objects so that each policy can construct the whole MARLModule, index the relevant module name within its own variable scope, and assign that local module to self.model (see the sketch after this list). The reason is that we want to let users pass in a complicated MARLModule, possibly with shared encoders, that is only instantiable via MARLModuleSpecs.
  2. The creation of RLModuleSpecs is moved into rollout_workers, since rollout_worker already contains the multi-agent policy_dict and policy-map construction logic that lets us build MARLModuleSpecs accordingly. Written from scratch, these specs would be created at the algorithm level and passed to all actors (rollout_workers or trainers); but rollout_worker already does all of that (and also has to work with policies), so for now we leverage it by creating a marl_module_spec inside the local_worker and keeping a reference in the Algorithm class to pass on to trainer_runner.
  3. build_policy_map did a lot of things besides constructing the policy_map. I broke the method into multiple modular stages that should behave identically to the original. The only difference is that when the RLModule API is enabled, the marl_module_spec is constructed by intercepting the policy_specs during the build_policy_map operation.
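As a schematic of step (1), the indexing works as in the toy sketch below. The spec and module objects are minimal stand-ins, not the actual Policy code:

class ToyMARLModuleSpec:
    """Stand-in for a MultiAgentRLModuleSpec; build() returns all modules."""

    def build(self):
        # A real spec could wire shared encoders between these modules.
        return {
            "policy_1": "module_for_policy_1",
            "policy_2": "module_for_policy_2",
        }


class ToyPolicy:
    """Schematic of step (1); the real RLlib Policy does far more."""

    def __init__(self, policy_id, marl_module_spec):
        self.policy_id = policy_id
        # Build the whole MARLModule, then keep only this policy's sub-module.
        marl_module = marl_module_spec.build()
        self.model = marl_module[self.policy_id]


policy = ToyPolicy("policy_1", ToyMARLModuleSpec())
assert policy.model == "module_for_policy_1"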

The main changes are in:

rollout_worker.py
algorithm.py
policy.py
algorithm_config.py

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@gjoliver (Member) left a comment:
I'm starting to feel like mixing up policy and rl_module stuff may not be the best idea; it makes both complicated.
I hope this is temporary and just to accomplish some intermediate goals.

if module_spec.model_config is None:
-    module_spec.model_config = self.model
+    module_spec.model_config = policy_spec.config.get("model", {})
Member:
Is this block temporary? It's just bridging between the RLModule and Policy worlds, right?
If so, can we add a TODO/Note?

Contributor Author:

Yes, it's temporary until Policy co-exists with the RLModule / Learner API. Once we rewrite the sampler / rollout workers to drop Policy, we won't need this method anymore; instead of creating policy_dicts, we will directly create marl_module_specs.
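A rough sketch of that end state, using a local stand-in dataclass rather than the real single-agent spec class (field and helper names are assumptions):

from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModuleSpec:
    """Stand-in for RLlib's single-agent module spec."""

    observation_space: Any
    action_space: Any
    model_config: Dict[str, Any] = field(default_factory=dict)


def policy_dict_to_module_specs(policy_dict):
    # Intended end state: map each PolicySpec straight to a module spec,
    # bypassing Policy construction entirely.
    return {
        pid: ModuleSpec(
            observation_space=spec.observation_space,
            action_space=spec.action_space,
            model_config=spec.config.get("model", {}),
        )
        for pid, spec in policy_dict.items()
    }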

@kouroshHakha (Contributor Author) commented Feb 13, 2023:

Added a TODO/Note.
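The bridged block with the note might read roughly like this (a sketch, not the committed code):

# TODO (Kourosh): Temporary bridge between the Policy and RLModule worlds.
# Remove once rollout workers construct MARLModuleSpecs directly and no
# longer go through Policy objects.
if module_spec.model_config is None:
    module_spec.model_config = policy_spec.config.get("model", {})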

elif fw == "tf":
assert isinstance(rl_module, DiscreteBCTFModule)

def test_bc_algorithm_w_custom_marl_module(self):
Member:
Just get rid of the test for now?

Contributor Author:

I'm going to fill out the test, since it's relevant to this PR: it basically tests whether the PR was effective.
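The filled-out test presumably looks something like the sketch below. The spec, config class, environment, and the encoder attribute are assumptions for illustration, not the actual test code:

def test_bc_algorithm_w_custom_marl_module(self):
    # Sketch: verify that a custom MARL module spec passed through the
    # algorithm config actually ends up backing each policy's model.
    spec = MyCustomMARLModuleSpec(...)  # hypothetical shared-encoder spec
    config = (
        BCConfigTest()  # hypothetical BC test config class
        .environment(MultiAgentCartPole)  # hypothetical multi-agent env
        .rl_module(rl_module_spec=spec)
    )
    algo = config.build()
    worker = algo.workers.local_worker()
    m1 = worker.get_policy("policy_1").model
    m2 = worker.get_policy("policy_2").model
    # If the custom spec worked, both policies share one encoder instance.
    self.assertIs(m1.encoder, m2.encoder)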

@kouroshHakha (Contributor Author):

> I'm starting to feel like mixing up policy and rl_module stuff may not be the best idea; it makes both complicated. I hope this is temporary and just to accomplish some intermediate goals.

I know of no better way to make the transition happen. Otherwise I would have to change all 100k lines of the RLlib code base at once :)

@@ -3886,7 +3886,7 @@ py_test(
 py_test(
     name = "examples/rl_trainer/multi_agent_cartpole_ppo_torch_multi_gpu",
     main = "examples/rl_trainer/multi_agent_cartpole_ppo.py",
     tags = ["team:rllib", "exclusive", "examples", "multi-gpu"],
Contributor Author:
These changes activated some tests that had been silently filtered out.

@@ -334,7 +334,11 @@ def set_weights(self, weights) -> None:
     if self.is_local:
         self._trainer.set_weights(weights)
     else:
-        self._worker_manager.foreach_actor(lambda w: w.set_weights(weights))
+        results_or_errors = self._worker_manager.foreach_actor(
Contributor Author:
This is added so that if set_weights() throws an error, we catch it; the issue surfaced during this PR.
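The pattern is roughly the following. It assumes the actor manager's per-actor result objects expose ok and get(), which is how RLlib's fault-tolerant actor manager reports outcomes; treat the exact names as assumptions:

results_or_errors = self._worker_manager.foreach_actor(
    lambda w: w.set_weights(weights)
)
for result_or_error in results_or_errors:
    if not result_or_error.ok:
        # get() hands back the remote exception object; re-raise it
        # locally instead of silently dropping the failure.
        raise result_or_error.get()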

@kouroshHakha added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Feb 14, 2023
@gjoliver merged commit a447cbb into ray-project:master on Feb 14, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ject#32473)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>