
Add networksettings to reward providers #4982

Merged: 9 commits merged into fix-gail on Feb 22, 2021
Conversation

andrewcoh
Contributor

Proposed change(s)

Describe the changes made in this PR.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@andrewcoh andrewcoh changed the base branch from master to fix-gail February 19, 2021 20:01
@andrewcoh andrewcoh requested review from ervteng and vincentpierre and removed request for ervteng February 19, 2021 20:01
@@ -199,28 +199,37 @@ def structure(d: Mapping, t: type) -> Any:
enum_key = RewardSignalType(key)
t = enum_key.to_settings()
d_final[enum_key] = strict_to_cls(val, t)
if "encoding_size" in val:
andrewcoh (Contributor Author):
Backward compatible with old configs
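The backward-compatibility branch in the diff above can be sketched roughly as follows. This is a minimal sketch, not the actual ML-Agents code; in particular, folding `encoding_size` into `network_settings.hidden_units` is an assumed mapping for illustration.

```python
from typing import Any, Dict

def upgrade_reward_signal_config(val: Dict[str, Any]) -> Dict[str, Any]:
    """Fold a legacy flat `encoding_size` field into the new nested
    `network_settings` block so old YAML configs keep loading."""
    val = dict(val)  # work on a copy; don't mutate the caller's config
    if "encoding_size" in val:
        encoding_size = val.pop("encoding_size")
        # Assumed mapping: the old encoding_size becomes the hidden layer width.
        val.setdefault("network_settings", {}).setdefault(
            "hidden_units", encoding_size
        )
    return val
```

New-style configs that already specify `network_settings` pass through unchanged, so the upgrade only fires for old-style YAMLs.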

Contributor:

Can you add a comment around this code so we will remember it?

@ervteng
Contributor

ervteng commented Feb 19, 2021

cc @hvpeteet, @sini - this will introduce a (backwards-compatible) change for the YAMLs - just checking to make sure it won't cause any issues.
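For context, the YAML change being discussed moves reward-signal blocks from a flat `encoding_size` field to a nested `network_settings` block. A hedged illustration of the two shapes (field values and the `hidden_units` mapping are examples, not the library's defaults):

```yaml
# Old style (still accepted for backward compatibility):
reward_signals:
  gail:
    strength: 1.0
    encoding_size: 128

# New style:
reward_signals:
  gail:
    strength: 1.0
    network_settings:
      hidden_units: 128
```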

@andrewcoh andrewcoh requested a review from hvpeteet February 19, 2021 20:16
@andrewcoh andrewcoh mentioned this pull request Feb 19, 2021
@vincentpierre vincentpierre (Contributor) left a comment:

Need to edit the documentation before merging.

@@ -183,7 +183,7 @@ def to_settings(self) -> type:
class RewardSignalSettings:
gamma: float = 0.99
strength: float = 1.0
normalize: bool = False
network_settings: NetworkSettings = attr.ib(factory=NetworkSettings)
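For readers unfamiliar with the `attr.ib(factory=...)` pattern in the diff above: it gives each settings instance its own fresh `NetworkSettings` object rather than a shared mutable default. A rough stdlib equivalent using `dataclasses` (the `NetworkSettings` field defaults here are illustrative assumptions, not ML-Agents' actual values):

```python
from dataclasses import dataclass, field

@dataclass
class NetworkSettings:
    # Illustrative defaults; not necessarily ML-Agents' actual values.
    hidden_units: int = 128
    num_layers: int = 2

@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0
    # default_factory mirrors attr.ib(factory=NetworkSettings): each
    # instance gets its own NetworkSettings, never a shared default object.
    network_settings: NetworkSettings = field(default_factory=NetworkSettings)
```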
Contributor:
I would make this one optional and, if it is None, use the Policy's network settings rather than our own defaults. How does that sound?
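The alternative suggested here could look roughly like this sketch, with hypothetical names: `None` means "not set in the YAML" and falls back to the policy's settings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NetworkSettings:
    hidden_units: int = 128
    num_layers: int = 2

def resolve_network_settings(
    reward_settings: Optional[NetworkSettings],
    policy_settings: NetworkSettings,
) -> NetworkSettings:
    """Use the reward provider's own settings when given;
    otherwise inherit the policy's network settings."""
    return reward_settings if reward_settings is not None else policy_settings
```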

Contributor Author:
I think I'd prefer to use our defaults, since it's possible the policy has significantly more capacity than is needed, e.g. the Crawler policy (3 layers / 512 hidden units) vs. what we use for the discriminator (2 layers / 128 hidden units). That said, I also realize this enables users to specify memory, which we probably want to explicitly prevent in the reward providers. cc @ervteng

Contributor:
Not opposed to either route; they each have their own pros and cons. Either way, as long as it's documented it should be fine.
Is getting the Policy settings super ugly?

Contributor Author:
I'm not sure how future-proof it is for multi-agent scenarios, where we could have different policies to select from. Additionally, we currently create reward signals in optimizer/torch_optimizer.py, and in the future I think it will be necessary to remove the policy from the optimizer (also for multi-agent), in which case this would need to be addressed by either keeping the policy around or moving the creation of the reward provider. My vote is for default network settings.

@hvpeteet hvpeteet (Contributor) left a comment:

Looks good to me; we don't use these fields for anything cloud-specific.

@andrewcoh andrewcoh merged commit 5fcbbc4 into fix-gail Feb 22, 2021
@delete-merged-branch delete-merged-branch bot deleted the fix-gail-networksettings branch February 22, 2021 21:21
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 23, 2022