[RLlib] Cleanup examples folder 23: Curiosity (inverse dynamics model based) RLModule example. #46841
Changes from all commits: 32faa1b, 31e6230, 5ae97ed, f3348fc, 00f5085, 4b64182, ede1744, d143ea2, 471f726, 0acf005, f99182a, a7d8338, 18c48c9, 82ff703, 09a1c95, 7a74894, 6333405, 5afe5cd
```diff
@@ -59,6 +59,9 @@ class Columns:
     ADVANTAGES = "advantages"
     VALUE_TARGETS = "value_targets"

+    # Intrinsic rewards (learning with curiosity).
+    INTRINSIC_REWARDS = "intrinsic_rewards"
+
     # Loss mask. If provided in a train batch, a Learner's compute_loss_for_module
     # method should respect the False-set value in here and mask out the respective
     # items from the loss.
```

Review comment (on `INTRINSIC_REWARDS`): Nice! Having this makes things less ugly :)
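As a rough, hedged sketch of how such a column could be consumed (this is not the connector code added by the PR; the key names come from the diff above, while the `intrinsic_reward_coeff` weighting is purely illustrative), intrinsic rewards can simply be added, weighted, to the extrinsic rewards of a train batch:

```python
import numpy as np

# Column names as they appear in the Columns class above.
REWARDS = "rewards"
INTRINSIC_REWARDS = "intrinsic_rewards"


def combine_rewards(batch: dict, intrinsic_reward_coeff: float = 0.05) -> np.ndarray:
    """Returns extrinsic rewards plus weighted intrinsic (curiosity) rewards."""
    extrinsic = np.asarray(batch[REWARDS], dtype=np.float32)
    intrinsic = np.asarray(
        batch.get(INTRINSIC_REWARDS, np.zeros_like(extrinsic)), dtype=np.float32
    )
    return extrinsic + intrinsic_reward_coeff * intrinsic


# Toy batch: sparse extrinsic reward, dense intrinsic (curiosity) signal.
batch = {REWARDS: [1.0, 0.0, 0.0], INTRINSIC_REWARDS: [0.1, 0.4, 0.2]}
print(combine_rewards(batch))  # [1.005 0.02  0.01 ]
```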
```diff
@@ -829,11 +829,8 @@ def should_module_be_updated(self, module_id, multi_agent_batch=None):

     @OverrideToImplementCustomLogic
     def compute_loss(
-        self,
-        *,
-        fwd_out: Dict[str, Any],
-        batch: Dict[str, Any],
-    ) -> Union[TensorType, Dict[str, Any]]:
+        self, *, fwd_out: Dict[str, Any], batch: Dict[str, Any]
+    ) -> Dict[str, Any]:
         """Computes the loss for the module being optimized.

         This method must be overridden by multiagent-specific algorithm learners to
```
```diff
@@ -886,7 +883,7 @@ def compute_loss_for_module(
         self,
         *,
         module_id: ModuleID,
-        config: Optional["AlgorithmConfig"] = None,
+        config: "AlgorithmConfig",
         batch: Dict[str, Any],
         fwd_out: Dict[str, TensorType],
     ) -> TensorType:
```

Review comment (on the `config` argument): Do we now always need to provide a config? I think for most algorithms this is not needed because ...

Author reply: I felt like this is the better solution for users. 2 reasons: ... In other words, if we had left this arg optional, every user writing a custom loss function would have had to implement some (not too well-known) logic on how to get the module's individual config.
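To make the author's point concrete, here is a hedged sketch (not code from this PR) of a custom torch Learner whose loss override reads a per-module setting straight off the `config` that is now always passed in. The `MyTorchLearner` name and the `"my_loss_coeff"` key are made up, the import paths assume Ray's new API stack layout, and it assumes the config exposes a `learner_config_dict`:

```python
from typing import Any, Dict

import torch

from ray.rllib.core.columns import Columns
from ray.rllib.core.learner.torch.torch_learner import TorchLearner
from ray.rllib.utils.typing import ModuleID, TensorType


class MyTorchLearner(TorchLearner):
    def compute_loss_for_module(
        self,
        *,
        module_id: ModuleID,
        config,  # The module's own AlgorithmConfig; no longer Optional.
        batch: Dict[str, Any],
        fwd_out: Dict[str, TensorType],
    ) -> TensorType:
        # Per-module hyperparameters come straight from `config`; no need to dig
        # them out of the (possibly multi-agent) Learner's own config.
        coeff = config.learner_config_dict.get("my_loss_coeff", 1.0)
        # Toy loss: L2 penalty on the action distribution inputs.
        return coeff * torch.mean(fwd_out[Columns.ACTION_DIST_INPUTS] ** 2)
```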
```diff
@@ -163,13 +163,6 @@ def restore_from_path(self, *args, **kwargs):
     def get_metadata(self, *args, **kwargs):
         self.unwrapped().get_metadata(*args, **kwargs)

-    # TODO (sven): Figure out a better way to avoid having to method-spam this wrapper
-    # class, whenever we add a new API to any wrapped RLModule here. We could try
-    # auto generating the wrapper methods, but this will bring its own challenge
-    # (e.g. recursive calls due to __getattr__ checks, etc..).
-    def _compute_values(self, *args, **kwargs):
-        return self.unwrapped()._compute_values(*args, **kwargs)
-
     @override(RLModule)
     def unwrapped(self) -> "RLModule":
         return self.module
```

Review comment (on the removed TODO): Ah nice. This was still there.
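The removed TODO hints at why auto-generating these forwarding methods is not trivial. Below is a minimal, self-contained sketch (not RLlib's actual wrapper) of the `__getattr__`-based alternative and its recursion pitfall:

```python
class AutoForwardingWrapper:
    """Toy wrapper that forwards any unknown attribute to the wrapped module.

    This is the "auto generating the wrapper methods" idea from the removed TODO.
    The guard on "module" matters: if the wrapped module is not set yet (e.g. an
    attribute access during unpickling, before __init__ has run), looking up
    `self.module` would re-enter __getattr__ and recurse forever -- exactly the
    kind of pitfall the TODO warns about.
    """

    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "module":
            raise AttributeError(name)
        return getattr(self.module, name)
```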
Review comment: Not in this PR, but later we might also want to have a `remove` method for the connector pipeline. We can of course always override `build_learner_pipeline` in the config, but that means defining the complete pipeline instead of only the single parts that need to be removed or replaced.
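A hedged sketch of what such a `remove` method could look like on a simplified pipeline (the `ConnectorPipeline` and `ConnectorPiece` classes below are stand-ins, not RLlib's actual connector classes, and no such method exists in this PR):

```python
from typing import List, Type


class ConnectorPiece:
    """Stand-in for a single connector piece."""


class ConnectorPipeline:
    """Stand-in pipeline holding an ordered list of connector pieces."""

    def __init__(self, pieces: List[ConnectorPiece]):
        self.pieces = list(pieces)

    def remove(self, piece_type: Type[ConnectorPiece]) -> None:
        """Drops all pieces of the given type, leaving the rest of the pipeline intact.

        This is the targeted edit asked for above: remove or replace a single
        piece without having to redefine the whole pipeline.
        """
        self.pieces = [p for p in self.pieces if not isinstance(p, piece_type)]
```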