
[RLlib] Allow MARLModule customization from algorithm config #32473

Merged

38 commits merged into ray-project:master on Feb 14, 2023

Conversation

@kouroshHakha (Contributor) commented Feb 12, 2023

Why are these changes needed?

The intent is to enable algorithm-level customization of RLModules through RLModuleSpecs, giving users maximum flexibility in constructing RLModules (including MARLModules with shared encoders).

Think of allowing users to do this:

class MyCustomRLModuleSpec(SingleRLModuleSpec):
    def build(self):
        # custom build method
        ...

config = config.rl_module(rl_module_spec=MyCustomRLModuleSpec())
algo = config.build()

or

class MyCustomMARLModuleSpec(MultiAgentRLModuleSpec):
    def build(self):
        shared_encoder = ...
        # custom build that wires the shared encoder into each module

config = config.rl_module(rl_module_spec=MyCustomMARLModuleSpec())
algo = config.build()
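To make the shared-encoder case concrete, here is a minimal, self-contained sketch of what such a custom build() would roughly do. It uses plain PyTorch with hypothetical sizes and class names, not the real RLModule interfaces:

import torch
import torch.nn as nn

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 64, 2  # assumed sizes, illustration only


class AgentModule(nn.Module):
    """Stand-in for a per-agent RLModule that reuses a shared encoder."""

    def __init__(self, shared_encoder: nn.Module):
        super().__init__()
        self.encoder = shared_encoder  # referenced, not copied
        self.head = nn.Linear(HIDDEN_DIM, NUM_ACTIONS)  # per-agent head

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))


def build_marl_module(module_ids):
    """Roughly what a custom MultiAgentRLModuleSpec.build() would do."""
    shared_encoder = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU())
    # Every agent module holds a reference to the same encoder instance,
    # so the encoder weights are trained jointly by all agents.
    return {mid: AgentModule(shared_encoder) for mid in module_ids}


marl_module = build_marl_module(["policy_1", "policy_2"])
assert marl_module["policy_1"].encoder is marl_module["policy_2"].encoder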

To achieve this, the PR does the following:

  1. It adds policy_ids to the policy objects so that each policy can construct the whole MARLModule, index the relevant module name within its own variable scope, and assign that local module to self.model (see the sketch after this list). The reason is that we want to let users pass in a complicated MARLModule, possibly with shared encoders, that is only instantiable via MARLModuleSpecs.
  2. The creation of RLModuleSpecs is moved into rollout_workers, since rollout_worker already contains the multi-agent policy_dict and policy-map construction logic that lets us build MARLModuleSpecs accordingly. Written from scratch, these specs would be created at the algorithm level and passed to all actors (rollout_workers or trainers); but rollout_worker already does all of that (and also has to work with policies), so for now we leverage it by creating a marl_module_spec inside the local_worker and keeping a reference in the Algorithm class to pass on to trainer_runner.
  3. build_policy_map did a lot of things besides constructing the policy_map. I broke the method into multiple modular stages that should behave identically to the original. The only difference is that when the RLModule API is enabled, the marl_module_spec is constructed by intercepting the policy_specs during the build_policy_map operation.
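As a schematic of step (1), the indexing works as in the toy sketch below. The spec and module objects are minimal stand-ins, not the actual Policy code:

class ToyMARLModuleSpec:
    """Stand-in for a MultiAgentRLModuleSpec; build() returns all modules."""

    def build(self):
        # A real spec could wire shared encoders between these modules.
        return {
            "policy_1": "module_for_policy_1",
            "policy_2": "module_for_policy_2",
        }


class ToyPolicy:
    """Schematic of step (1); the real RLlib Policy does far more."""

    def __init__(self, policy_id, marl_module_spec):
        self.policy_id = policy_id
        # Build the whole MARLModule, then keep only this policy's sub-module.
        marl_module = marl_module_spec.build()
        self.model = marl_module[self.policy_id]


policy = ToyPolicy("policy_1", ToyMARLModuleSpec())
assert policy.model == "module_for_policy_1"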

The main changes are in:

rollout_worker.py
algorithm.py
policy.py
algorithm_config.py

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@gjoliver (Member) left a comment:
I'm starting to feel like mixing up policy and rl_module stuff may not be the best idea; it makes both complicated.
I hope this is temporary and just to accomplish some intermediate goals.

if module_spec.model_config is None:
-    module_spec.model_config = self.model
+    module_spec.model_config = policy_spec.config.get("model", {})
Member:
Is this block temporary? It's just bridging between the RLModule and Policy worlds, right?
If so, can we add a TODO/Note?

Contributor Author:

Yes, it's temporary until Policy co-exists with the RLModule / Learner API. Once we rewrite the sampler / rollout workers to drop Policy, we won't need this method anymore; instead of creating policy_dicts, we will directly create marl_module_specs.
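A rough sketch of that end state, using a local stand-in dataclass rather than the real single-agent spec class (field and helper names are assumptions):

from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModuleSpec:
    """Stand-in for RLlib's single-agent module spec."""

    observation_space: Any
    action_space: Any
    model_config: Dict[str, Any] = field(default_factory=dict)


def policy_dict_to_module_specs(policy_dict):
    # Intended end state: map each PolicySpec straight to a module spec,
    # bypassing Policy construction entirely.
    return {
        pid: ModuleSpec(
            observation_space=spec.observation_space,
            action_space=spec.action_space,
            model_config=spec.config.get("model", {}),
        )
        for pid, spec in policy_dict.items()
    }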

@kouroshHakha (Contributor Author) commented Feb 13, 2023:

Added a TODO/Note.
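The bridged block with the note might read roughly like this (a sketch, not the committed code):

# TODO (Kourosh): Temporary bridge between the Policy and RLModule worlds.
# Remove once rollout workers construct MARLModuleSpecs directly and no
# longer go through Policy objects.
if module_spec.model_config is None:
    module_spec.model_config = policy_spec.config.get("model", {})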

elif fw == "tf":
assert isinstance(rl_module, DiscreteBCTFModule)

def test_bc_algorithm_w_custom_marl_module(self):
Member:
Just get rid of the test for now?

Contributor Author:

I'm going to fill out the test, since it's relevant to this PR: it basically tests whether the PR was effective.
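The filled-out test presumably looks something like the sketch below. The spec, config class, environment, and the encoder attribute are assumptions for illustration, not the actual test code:

def test_bc_algorithm_w_custom_marl_module(self):
    # Sketch: verify that a custom MARL module spec passed through the
    # algorithm config actually ends up backing each policy's model.
    spec = MyCustomMARLModuleSpec(...)  # hypothetical shared-encoder spec
    config = (
        BCConfigTest()  # hypothetical BC test config class
        .environment(MultiAgentCartPole)  # hypothetical multi-agent env
        .rl_module(rl_module_spec=spec)
    )
    algo = config.build()
    worker = algo.workers.local_worker()
    m1 = worker.get_policy("policy_1").model
    m2 = worker.get_policy("policy_2").model
    # If the custom spec worked, both policies share one encoder instance.
    self.assertIs(m1.encoder, m2.encoder)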

@kouroshHakha (Contributor Author):

> I'm starting to feel like mixing up policy and rl_module stuff may not be the best idea; it makes both complicated. I hope this is temporary and just to accomplish some intermediate goals.

I know of no better way to make the transition happen. Otherwise I would have to change all 100k lines of the RLlib code base at once :)

@@ -3886,7 +3886,7 @@ py_test(
 py_test(
     name = "examples/rl_trainer/multi_agent_cartpole_ppo_torch_multi_gpu",
     main = "examples/rl_trainer/multi_agent_cartpole_ppo.py",
     tags = ["team:rllib", "exclusive", "examples", "multi-gpu"],
Contributor Author:
These changes activated some tests that had been silently filtered out.

@@ -334,7 +334,11 @@ def set_weights(self, weights) -> None:
     if self.is_local:
         self._trainer.set_weights(weights)
     else:
-        self._worker_manager.foreach_actor(lambda w: w.set_weights(weights))
+        results_or_errors = self._worker_manager.foreach_actor(
Contributor Author:
This is added so that if set_weights() throws an error, we catch it; the issue surfaced during this PR.
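The pattern is roughly the following. It assumes the actor manager's per-actor result objects expose ok and get(), which is how RLlib's fault-tolerant actor manager reports outcomes; treat the exact names as assumptions:

results_or_errors = self._worker_manager.foreach_actor(
    lambda w: w.set_weights(weights)
)
for result_or_error in results_or_errors:
    if not result_or_error.ok:
        # get() hands back the remote exception object; re-raise it
        # locally instead of silently dropping the failure.
        raise result_or_error.get()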

@kouroshHakha added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Feb 14, 2023
@gjoliver merged commit a447cbb into ray-project:master on Feb 14, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ject#32473)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>