Return deterministic actions #5597
Please see CR feedback.
@@ -66,22 +67,31 @@ def __init__(
# During training, clipping is done in TorchPolicy, but we need to clip before ONNX
# export as well.
self._clip_action_on_export = not tanh_squash
self.deterministic = deterministic
Unless it's going to be used outside of the ActionModel class, refactor to make it self._deterministic.
Co-authored-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com>
Looks great! Some nit changes on the docs. I also believe the settings test for the YAML is not set up right, and I wonder how it's passing CI.
@@ -32,6 +32,7 @@ def __init__(
action_spec: ActionSpec,
conditional_sigma: bool = False,
tanh_squash: bool = False,
deterministic: bool = False,
please update the docstring
@@ -541,6 +542,7 @@ def test_default_settings():
test1_settings = run_options.behaviors["test1"]
assert test1_settings.max_steps == 2
assert test1_settings.network_settings.hidden_units == 2000
assert not test1_settings.network_settings.deterministic
nit: can we use == True just for readability?
agent_action1 = action_model._sample_action(dists)
agent_action2 = action_model._sample_action(dists)
agent_action3 = action_model._sample_action(dists)
assert torch.equal(agent_action1.continuous_tensor, agent_action2.continuous_tensor)
some tests on discrete actions would be great!
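A discrete-action counterpart to the continuous check above might look roughly like the following. This is only a sketch in plain Python: `deterministic_discrete_action` is a hypothetical stand-in, not part of ActionModel, and it assumes that deterministic selection means taking the highest-logit index in each discrete branch.

```python
from typing import List


def deterministic_discrete_action(branch_logits: List[List[float]]) -> List[int]:
    """Hypothetical helper: pick the highest-logit index in every discrete branch."""
    return [
        max(range(len(logits)), key=lambda i: logits[i])
        for logits in branch_logits
    ]


def test_deterministic_discrete_actions_repeat() -> None:
    # Two discrete branches with three and two choices respectively.
    branch_logits = [[0.2, 1.5, -0.3], [2.0, 0.1]]
    action1 = deterministic_discrete_action(branch_logits)
    action2 = deterministic_discrete_action(branch_logits)
    action3 = deterministic_discrete_action(branch_logits)
    # Repeated calls on the same inputs must agree exactly.
    assert action1 == action2 == action3
    # The chosen index is the argmax of each branch.
    assert action1 == [1, 0]


test_deterministic_discrete_actions_repeat()
```

A real test would presumably call `action_model._sample_action(dists)` and compare the discrete outputs across calls, in the same way the continuous tensors are compared.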
@@ -541,6 +542,7 @@ def test_default_settings():
test1_settings = run_options.behaviors["test1"]
assert test1_settings.max_steps == 2
assert test1_settings.network_settings.hidden_units == 2000
assert not test1_settings.network_settings.deterministic
IIUC this test is wrong. Above we set deterministic: true in yaml, so it should be assert test1_settings.network_settings.deterministic == True, right?
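To make the concern concrete, here is a minimal sketch of how a setting parsed from YAML should surface in the assertion. `NetworkSettingsSketch` and `network_settings_from_dict` are hypothetical stand-ins for the real settings classes, and the `parsed` dict stands in for the result of loading the YAML; the default of 128 hidden units is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NetworkSettingsSketch:
    """Hypothetical stand-in for the network settings; deterministic defaults to False."""
    hidden_units: int = 128
    deterministic: bool = False


def network_settings_from_dict(raw: Dict[str, Any]) -> NetworkSettingsSketch:
    """Build settings from an already-parsed mapping, e.g. the result of loading YAML."""
    return NetworkSettingsSketch(**raw)


# Stand-in for a parsed YAML block containing
# "hidden_units: 2000" and "deterministic: true".
parsed = {"hidden_units": 2000, "deterministic": True}
settings = network_settings_from_dict(parsed)

assert settings.hidden_units == 2000
# If the YAML sets deterministic: true, the assertion should expect True.
assert settings.deterministic is True
# With the key omitted, the default applies and "assert not ..." would be correct.
assert NetworkSettingsSketch().deterministic is False
```

If `assert not ...deterministic` passes in CI, that would suggest the YAML value never reaches the settings object for `test1`, which seems worth checking.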
Co-authored-by: Maryam Honari <honari.m94@gmail.com>
…y-Technologies/ml-agents into develop-staging-determinstic-action
Proposed change(s)
This PR adds the ability to retrieve actions deterministically based on the input to the model. It adds a new run-options configuration setting as well as a new CLI flag, --deterministic.
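As a rough illustration of what "deterministic" can mean here, the sketch below contrasts sampling with a fixed choice. It assumes that the deterministic path returns the Gaussian mean for continuous actions and the highest-logit index for discrete actions; the function names are hypothetical and the code uses plain Python rather than the project's torch distributions.

```python
import math
import random
from typing import List


def select_continuous(mean: float, std: float, deterministic: bool,
                      rng: random.Random) -> float:
    """Return the mean when deterministic, otherwise draw a Gaussian sample."""
    if deterministic:
        return mean
    return rng.gauss(mean, std)


def select_discrete(logits: List[float], deterministic: bool,
                    rng: random.Random) -> int:
    """Return the argmax index when deterministic, otherwise sample from softmax(logits)."""
    if deterministic:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
# The deterministic path ignores the random source entirely.
assert select_continuous(0.5, 1.0, True, rng) == 0.5
assert select_discrete([0.1, 2.0, -1.0], True, rng) == 1
```

With `deterministic=False` the same inputs can yield different actions from call to call, which is the behavior the new option is meant to switch off.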
Types of change(s)
Checklist
Other comments