Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] New API stack: (Multi)RLModule overhaul vol 04 (deprecate RLModuleConfig; cleanups, DefaultModelConfig dataclass). #47908

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Oct 4, 2024

New API stack: (Multi)RLModule overhaul vol 04:

  • deprecate RLModuleConfig and MultiRLModuleConfig in favor of using only specs (individual c'tor args) to construct/build RLModules
  • Cleaned up some attribute naming for better clarity, e.g. MultiRLModule.module_specs -> rl_module_specs.
  • Introduced DefaultModelConfig class to replace the old stack's MODEL_CONFIG dict. This dataclass will be used when training with RLlib's default models. When training with any custom RLModule, users should simply pass in a dict with arbitrary key/value pairs into the RLModuleSpec.
  • Renamed RLModule.model_config_dict into simply model_config. (<- now we can see, why it's important to deprecate RLModuleConfig as it is very confusing to have 2 different config classes/attributes around).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
…odule_do_over_bc_default_module_04_refactor_rl_module_and_multi_rl_module
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…odule_do_over_bc_default_module_04_refactor_rl_module_and_multi_rl_module

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#	rllib/algorithms/ppo/tests/test_ppo_rl_module.py
#	rllib/algorithms/ppo/torch/ppo_torch_rl_module.py
#	rllib/core/rl_module/multi_rl_module.py
#	rllib/core/rl_module/rl_module.py
#	rllib/tuned_examples/dqn/multi_agent_cartpole_dqn.py
…odule_do_over_bc_default_module_04_refactor_rl_module_and_multi_rl_module

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#	rllib/algorithms/ppo/tests/test_ppo_rl_module.py
#	rllib/algorithms/ppo/torch/ppo_torch_rl_module.py
#	rllib/core/rl_module/multi_rl_module.py
#	rllib/core/rl_module/rl_module.py
#	rllib/tuned_examples/dqn/multi_agent_cartpole_dqn.py
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 requested review from maxpumperla and a team as code owners October 6, 2024 20:53
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) October 7, 2024 08:48
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 7, 2024
@@ -54,18 +54,18 @@
from ray.rllib.core.testing.torch.bc_module import DiscreteBCTorchModule

spec = MultiRLModuleSpec(
module_specs={
rl_module_specs={
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed: rl_module_specs for more clarity

@@ -40,7 +40,7 @@
module_class=DiscreteBCTorchModule,
observation_space=env.observation_space,
action_space=env.action_space,
model_config_dict={"fcnet_hiddens": [64]},
model_config={"fcnet_hiddens": [64]},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed: model_config for more clarity (now that the redundant (Multi)RLModuleConfig are gone).

input_dim = self.config.observation_space.shape[0]
hidden_dim = self.config.model_config_dict["fcnet_hiddens"][0]
output_dim = self.config.action_space.n
input_dim = self.observation_space.shape[0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the old notation: self.config.... still works in case users have old RLModule classes that use this attribute, but it is no longer advertized.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should set a date until users should switch to be able to deprecate cleanly.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@github-actions github-actions bot disabled auto-merge October 7, 2024 11:37
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Copy link
Collaborator

@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some general questions and some nit here and there.

hidden_dim = self.config.model_config_dict["fcnet_hiddens"][0]
output_dim = self.config.action_space.n
input_dim = self.observation_space.shape[0]
hidden_dim = self.model_config["fcnet_hiddens"][0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice!

configure the `RLModule` in the new stack and the `ModelV2` in the old
stack.

Returns:
A dictionary with the model configuration.
"""
return self._model_config_auto_includes | self._model_config_dict
return self._model_config_auto_includes | (
self._model_config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically self._model_config could be empty, couldn't it? This case was covered before by the MODEL_DEFAULTS from self._model_config_auto_includes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! self._model_config could be empty, BUT then the user would have to catch such empty dicts in their own custom RLModule class' setup method. If the user uses a default RLModule from one of the algos, they have to provide the DefaultModelConfig dataclass, which is always fully defined.

I'm 100% with you that we have to come up with a unified solution for all our configs (across RLlib, but even better across Ray itself). Right now, it's a mix of dicts, dataclasses, and custom config classes (e.g. AlgorithmConfig).

@@ -26,8 +26,8 @@ class BCCatalog(Catalog):
Any custom head can be built by overriding the `build_pi_head()` method.
Alternatively, the `PiHeadConfig` can be overridden to build a custom
policy head during runtime. To change solely the network architecture,
`model_config_dict["post_fcnet_hiddens"]` and
`model_config_dict["post_fcnet_activation"]` can be used.
`model_config_dict["head_fcnet_hiddens"]` and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like this naming convention. This makes it much clearer where to pass in which configs.

@@ -13,54 +12,41 @@
torch, nn = try_import_torch()


class MARWILTorchRLModule(TorchRLModule, MARWILRLModule):
framework: str = "torch"
class MARWILTorchRLModule(TorchRLModule):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we can simply derive from PPOTorchRLModule, can't we. Should be all the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, removed MARWILTorchRLModule entirely (similar to IMPALA using PPOTorchRLModule).

input_dim = self.config.observation_space.shape[0]
hidden_dim = self.config.model_config_dict["fcnet_hiddens"][0]
output_dim = self.config.action_space.n
input_dim = self.observation_space.shape[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should set a date until users should switch to be able to deprecate cleanly.

hidden_layer_activation=self.config.model_config_dict[
"post_fcnet_activation"
],
hidden_layer_dims=self.model_config["post_fcnet_hiddens"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

hidden_layer_activation=self.config.model_config_dict[
"post_fcnet_activation"
],
hidden_layer_dims=self.model_config["post_fcnet_hiddens"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -50,268 +50,69 @@
# fmt: off
# __sphinx_doc_begin__
MODEL_DEFAULTS: ModelConfigDict = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultModelConfig.to_dict()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it as-is (will be gone, soon, anyways). Some names/keys have changed, so this would be a nightmare.

@@ -69,7 +69,16 @@

# Represents the model config sub-dict of the algo config that is passed to
# the model catalog.
ModelConfigDict = dict
ModelConfigDict = dict # @OldAPIStack
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it. Here it is defined ...

# Each inner list has the format: [num_output_filters, kernel, stride], where kernel
# and stride may be single ints (width and height are the same) or 2-tuples (int, int)
# for width and height (different values).
ConvFilterSpec = List[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Can be tuple.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…odule_do_over_bc_default_module_04_refactor_rl_module_and_multi_rl_module
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) October 9, 2024 15:00
@sven1977 sven1977 merged commit a62d4bf into ray-project:master Oct 9, 2024
6 checks passed
@sven1977 sven1977 deleted the rl_module_do_over_bc_default_module_04_refactor_rl_module_and_multi_rl_module branch October 9, 2024 17:31
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…oduleConfig; cleanups, DefaultModelConfig dataclass). (ray-project#47908)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
# Experimental flag.
# If True, user specified no preprocessor to be created
# (via config._disable_preprocessor_api=True). If True, observations
# will arrive in model as they are returned by the env.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I @sven1977, I know that this pull request has been merged and that this is the old API, but why have these comments (from this key and below) been removed? Thank you in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants