[RLlib] Allow MARLModule customization from algorithm config #32473
Conversation
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
I'm starting to feel like mixing up Policy and RLModule concerns may not be the best idea; it makes both more complicated.
I hope this is temporary and just there to accomplish some intermediate goals.
```diff
 if module_spec.model_config is None:
-    module_spec.model_config = self.model
+    module_spec.model_config = policy_spec.config.get("model", {})
```
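The fallback in that diff can be sketched in isolation. `ModuleSpec` and `PolicySpec` below are hypothetical stand-ins for the RLlib spec objects, not the real classes:

```python
# Hypothetical stand-ins for the RLlib spec objects referenced in the diff.
class ModuleSpec:
    def __init__(self, model_config=None):
        self.model_config = model_config

class PolicySpec:
    def __init__(self, config=None):
        self.config = config or {}

def bridge_model_config(module_spec, policy_spec):
    # If the RLModule spec carries no model config, fall back to the
    # legacy Policy config's "model" entry (empty dict if absent).
    if module_spec.model_config is None:
        module_spec.model_config = policy_spec.config.get("model", {})
    return module_spec

spec = bridge_model_config(
    ModuleSpec(), PolicySpec({"model": {"fcnet_hiddens": [64, 64]}})
)
print(spec.model_config)  # {'fcnet_hiddens': [64, 64]}
```

An explicitly-set `model_config` on the module spec wins; the Policy config is only consulted as a fallback.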
Is this block temporary? It's just bridging between the RLModule and Policy worlds, right?
If so, can we add a TODO/Note?
Yes, it's temporary until Policy co-exists with the RLModule / Learner API. Once we re-write the sampler / rollout workers to drop Policy, we won't need this method anymore; instead of creating policy_dicts we will directly create marl_module_specs.
Added a TODO/Note.
```python
elif fw == "tf":
    assert isinstance(rl_module, DiscreteBCTFModule)

def test_bc_algorithm_w_custom_marl_module(self):
```
Just get rid of the test for now?
I'm going to fill out the test since it's relevant to this PR: it basically tests whether this PR was effective.
I know of no better way to make the transition happen. Otherwise I'd have to change all 100k lines of the RLlib code-base at once :)
```diff
@@ -3886,7 +3886,7 @@ py_test(
 py_test(
     name = "examples/rl_trainer/multi_agent_cartpole_ppo_torch_multi_gpu",
     main = "examples/rl_trainer/multi_agent_cartpole_ppo.py",
     tags = ["team:rllib", "exclusive", "examples", "multi-gpu"],
```
These activated some tests that were silently filtered out.
```diff
@@ -334,7 +334,11 @@ def set_weights(self, weights) -> None:
         if self.is_local:
             self._trainer.set_weights(weights)
         else:
-            self._worker_manager.foreach_actor(lambda w: w.set_weights(weights))
+            results_or_errors = self._worker_manager.foreach_actor(
```
This was added so that if set_weights() throws an error on a remote worker, we catch it instead of silently dropping it. The issue surfaced during this PR.
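The pattern being described here (map a call over remote actors, collect either results or exceptions, and surface any captured error) can be sketched in plain Python. This is a hedged mock of the idea, not the actual `_worker_manager` implementation:

```python
def foreach_actor(actors, fn):
    # Apply fn to each actor, capturing exceptions instead of losing them.
    results_or_errors = []
    for actor in actors:
        try:
            results_or_errors.append(fn(actor))
        except Exception as e:  # capture, don't swallow
            results_or_errors.append(e)
    return results_or_errors

def set_weights_on_all(actors, weights):
    results_or_errors = foreach_actor(actors, lambda w: w.set_weights(weights))
    # Re-raise the first captured error rather than ignoring it.
    for r in results_or_errors:
        if isinstance(r, Exception):
            raise r
    return results_or_errors

# Toy workers for illustration (hypothetical names).
class GoodWorker:
    def set_weights(self, weights):
        self.weights = weights
        return "ok"

class BadWorker:
    def set_weights(self, weights):
        raise ValueError("bad weights")
```

With only `GoodWorker`s the call returns the per-actor results; with a `BadWorker` in the pool the error propagates to the caller instead of vanishing inside the fire-and-forget `foreach_actor`.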
…ject#32473) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Why are these changes needed?
The intent is to allow algorithm-level customization of RLModules using RLModuleSpecs, for maximum flexibility in constructing RLModules (including MARLModules with shared encoders).
Think of allowing users to do this:
or
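The spec-based customization described above can be sketched with hypothetical stand-in classes. `SingleAgentSpec`, `MultiAgentSpec`, and `MyTorchModule` below are illustrative mocks; the real RLlib spec APIs may differ:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for RLlib's RLModule spec classes.
@dataclass
class SingleAgentSpec:
    module_class: type
    model_config: dict = field(default_factory=dict)

@dataclass
class MultiAgentSpec:
    # module_id -> SingleAgentSpec
    module_specs: dict
    # Config for an encoder shared across all sub-modules.
    shared_encoder_config: dict = field(default_factory=dict)

class MyTorchModule:  # placeholder for a user-defined RLModule
    pass

# A single customized module...
single = SingleAgentSpec(MyTorchModule, {"fcnet_hiddens": [256, 256]})

# ...or a multi-agent module whose sub-modules share one encoder.
marl = MultiAgentSpec(
    module_specs={
        "agent_0": single,
        "agent_1": SingleAgentSpec(MyTorchModule),
    },
    shared_encoder_config={"embed_dim": 128},
)
print(marl.module_specs["agent_0"].model_config)  # {'fcnet_hiddens': [256, 256]}
```

The point is that the algorithm config consumes a spec object describing how to build the (MA)RLModule, rather than the module instance itself.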
To achieve this, the PR does a couple of things:
The main changes are in:
- rollout_worker.py
- algorithm.py
- policy.py
- algorithm_config.py
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.