[RLlib] Cleanup examples folder 23: Curiosity (inverse dynamics model based) RLModule example. #46841
Conversation
LGTM. Awesome example. I added some comments where I had questions. Furthermore, if we decide to offer exploration as a built-in feature again, we might need wrappers for Learners and modules.
- # Prepend a NEXT_OBS from episodes to train batch connector piece (right
- # after the observation default piece).
+ # Prepend the "add-NEXT_OBS-from-episodes-to-train-batch" connector piece (right
+ # after the corresponding "add-OBS-..." default piece).
Not in this PR, but later we might also want a `remove` method for the connector pipeline. We can of course always override `build_learner_pipeline` in the config, but that means defining the complete pipeline instead of only the single pieces that need to be removed or replaced.
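To make the mechanics concrete, here is a minimal sketch of how such a piece gets wired in, assuming the `learner_connector` argument of `AlgorithmConfig.training()` and the connector class named in the diff comment above (both taken from the PR context, not guaranteed API):

```python
from ray.rllib.connectors.learner import (
    AddNextObservationsFromEpisodesToTrainBatch,  # class name from the comment above
)

# `config` is the example's PPO config instance. Custom learner-connector
# pieces get prepended to the default pipeline, so this piece runs right
# after the default "add-OBS-..." piece mentioned in the diff.
config.training(
    learner_connector=lambda obs_space, act_space: (
        AddNextObservationsFromEpisodesToTrainBatch()
    ),
)
```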
@@ -59,6 +59,9 @@ class Columns:
     ADVANTAGES = "advantages"
     VALUE_TARGETS = "value_targets"
+
+    # Intrinsic rewards (learning with curiosity).
+    INTRINSIC_REWARDS = "intrinsic_rewards"
Nice! Having this makes things less ugly :)
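A quick sketch of what the new column is for (hedged: where exactly intrinsic and extrinsic rewards get blended is up to the custom curiosity Learner, and `icm_out` and `curiosity_intrinsic_reward_coeff` are assumed names):

```python
from ray.rllib.core.columns import Columns

# Inside the curiosity Learner: mix the ICM's intrinsic rewards (published
# under the new column by the ICM's forward pass) into the env's extrinsic
# rewards before the loss/advantage computation.
batch[Columns.REWARDS] = (
    batch[Columns.REWARDS]
    + config.curiosity_intrinsic_reward_coeff * icm_out[Columns.INTRINSIC_REWARDS]
)
```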
@@ -886,7 +883,7 @@ def compute_loss_for_module(
         self,
         *,
         module_id: ModuleID,
-        config: Optional["AlgorithmConfig"] = None,
+        config: "AlgorithmConfig",
Do we now always need to provide a config? I think for most algorithms this is not needed, because `self.config` should be available.
I felt like this is the better solution for users, for two reasons:
- Users normally override `compute_loss_for_module`, so now they do NOT have to implement the logic for the case where `config` is None.
- Users do NOT normally override the more top-level `compute_loss`, so we can easily provide each module's individual config through our base implementations.

In other words, if we had left this arg optional, every user writing a custom loss function would have had to implement the (not too well known) logic for getting the module's individual config.
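A short sketch of what a user override looks like under the new, non-optional signature (placeholder loss body; the import path matches RLlib's PPO torch Learner):

```python
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner


class MyPPOTorchLearner(PPOTorchLearner):
    def compute_loss_for_module(self, *, module_id, config, batch, fwd_out):
        # `config` is now always this module's own AlgorithmConfig; the old
        # `config = config or self.config` fallback is no longer needed.
        loss = super().compute_loss_for_module(
            module_id=module_id, config=config, batch=batch, fwd_out=fwd_out
        )
        # ... add custom loss terms here ...
        return loss
```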
@@ -163,13 +163,6 @@ def restore_from_path(self, *args, **kwargs):
     def get_metadata(self, *args, **kwargs):
         self.unwrapped().get_metadata(*args, **kwargs)

-    # TODO (sven): Figure out a better way to avoid having to method-spam this wrapper
Ah nice. This was still there.
    },
)
# Use our custom `curiosity` method to set up the ICM and our PPO/ICM-Learner.
.curiosity(
Nice!
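For readers skimming the thread: `curiosity()` is not a built-in config method. A hedged sketch of the pattern (method and attribute names assumed to mirror the example):

```python
from ray.rllib.algorithms.ppo import PPOConfig


class PPOConfigWithCuriosity(PPOConfig):
    # A custom, chainable config method, analogous to the built-in
    # `.training()`, `.environment()`, etc.
    def curiosity(self, *, intrinsic_reward_coeff=0.05, feature_dim=288):
        self.curiosity_intrinsic_reward_coeff = intrinsic_reward_coeff
        self.curiosity_feature_dim = feature_dim
        return self
```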
# `model_config_dict` property:
cfg = self.config.model_config_dict

feature_dim = cfg.get("feature_dim", 288)
:D How did you come up with this number? Is it in the paper?
yes :D
layers.append(nn.Linear(in_size, out_size))
if cfg.get("feature_net_activation"):
    layers.append(
        get_activation_fn(cfg["feature_net_activation"], "torch")()
Nice use of our helper function!
    in_size = out_size
# Last feature layer of n nodes (feature dimension).
layers.append(nn.Linear(in_size, feature_dim))
self._feature_net = nn.Sequential(*layers)
Dumb question: did you also want to show with this example how to build a Torch module from scratch? We could simply use our predefined MLPs.
Good point, but yes, I wanted to show a more bare-bones approach w/o using too many RLlib-related utilities. Using only torch makes this RLModule much more readable for new users.
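For comparison, the predefined-MLP route the reviewer mentions would look roughly like this (field names quoted from memory of RLlib's catalog configs and possibly version-dependent; treat the whole block as an approximation):

```python
from ray.rllib.core.models.configs import MLPHeadConfig

# Roughly equivalent encoder built from RLlib's predefined MLP config.
feature_net = MLPHeadConfig(
    input_dims=[obs_dim],          # e.g. observation_space.shape[0]
    hidden_layer_dims=[256, 256],
    hidden_layer_activation="relu",
    output_layer_dim=feature_dim,  # e.g. 288
).build(framework="torch")
```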
layers = []
dense_layers = cfg.get("feature_net_hiddens", (256, 256))
# `in_size` is the observation space (assume a simple Box(1D)).
in_size = self.config.observation_space.shape[0]
This works only for 1D spaces, doesn't it?
Currently, yes. If users need e.g. image-obs support, they can now easily write their own ICMs. I didn't want to make the example ICM too complicated.
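Pulling the snippets from the last few threads together, the feature net is built like this (re-assembled from the diff hunks above; the surrounding setup method and exact indentation are assumed):

```python
import torch.nn as nn
from ray.rllib.models.utils import get_activation_fn

# Inside the ICM RLModule's setup: a plain-torch MLP encoder phi(obs).
cfg = self.config.model_config_dict
feature_dim = cfg.get("feature_dim", 288)

layers = []
dense_layers = cfg.get("feature_net_hiddens", (256, 256))
# `in_size` is the observation space size (assumes a simple 1D Box).
in_size = self.config.observation_space.shape[0]
for out_size in dense_layers:
    layers.append(nn.Linear(in_size, out_size))
    if cfg.get("feature_net_activation"):
        layers.append(get_activation_fn(cfg["feature_net_activation"], "torch")())
    in_size = out_size
# Last feature layer of n nodes (the feature dimension).
layers.append(nn.Linear(in_size, feature_dim))
self._feature_net = nn.Sequential(*layers)
```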
# Forward loss term: Predicted phi - given phi and action - vs actually observed
# phi (square-root of L2 norm). Note that this is the intrinsic reward that
# will be used and the mean of this is the forward net loss.
forward_l2_norm_sqrt = 0.5 * torch.sum(
This looks to me like half the MSE loss - is this intended? For simplification, we could use `torch.nn.MSELoss` with `reduction="sum"` here - and, if it should be an L2 loss, a sqrt on top of it.
Yeah, correct. It's simple MSE loss. I'm always afraid of using these baked-in loss functions b/c they have no transparency and might do things differently from what my written-out code does :D
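For reference, the equivalence being discussed (tensor names and shapes below are placeholders: `phi_pred` stands for the forward net's predicted next features, `phi_next` for the encoder's actual next features):

```python
import torch

phi_pred = torch.randn(32, 288)  # forward net's predicted next features
phi_next = torch.randn(32, 288)  # encoder's actual next features

# Written-out version from the example: half the summed squared error per
# transition; this per-timestep vector doubles as the intrinsic reward.
forward_l2_norm_sqrt = 0.5 * torch.sum(torch.square(phi_pred - phi_next), dim=-1)

# Same value via the baked-in loss: element-wise MSE, summed over the
# feature axis, then halved.
mse = torch.nn.functional.mse_loss(phi_pred, phi_next, reduction="none")
assert torch.allclose(forward_l2_norm_sqrt, 0.5 * mse.sum(dim=-1))
```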
Great point. I think this example is a first step toward making something like "curiosity" pluggable into any algorithm. This same example would work for DQN with only a handful of lines changed (ICM RLModule: unchanged; PPOTorchLearnerWCuriosity: flip over to a DQNTorchLearnerWCuriosity without really having to change any code; config: stays the same except for swapping PPOConfig for DQNConfig). A sketch of that idea follows below.
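A hedged sketch of that pluggability idea (the mixin and its method are hypothetical; the Learner class names follow the comment above, and the import paths may differ across Ray versions):

```python
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner
# Placeholder import; the DQN torch Learner's exact module path may differ:
from ray.rllib.algorithms.dqn.torch.dqn_torch_learner import DQNTorchLearner


class CuriosityLearnerMixin:
    """Algorithm-agnostic ICM wiring (intrinsic-reward computation etc.)."""

    def _add_intrinsic_rewards(self, batch):
        ...  # run the ICM and blend intrinsic into extrinsic rewards


# Only the base class flips between algorithms; the ICM code is shared:
class PPOTorchLearnerWCuriosity(CuriosityLearnerMixin, PPOTorchLearner):
    pass


class DQNTorchLearnerWCuriosity(CuriosityLearnerMixin, DQNTorchLearner):
    pass
```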
Cleanup examples folder 23: Curiosity (inverse dynamics model based) RLModule example.
Why are these changes needed?

Related issue number

Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I have added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.