Track histogram of environment reward #4878

Merged
merged 8 commits into from
Feb 3, 2021

Conversation

awjuliani
Contributor

@awjuliani awjuliani commented Jan 21, 2021

Proposed change(s)

Tracks the episodic reward as a histogram in addition to a scalar in tensorboard. This allows the multi-modality of rewards to be more apparent to end-users.

Below is an example of such a graph generated from a PushBlock training run.

[Screenshot: reward histogram from a PushBlock training run]

Here is the learning curve from FoodCollector:

[Screenshot: learning curve from FoodCollector]

I'm not sure my current solution is optimal given how StatsSummary and the StatsWriters are currently designed. Open to feedback.
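
To illustrate why a histogram can show what a scalar hides, here is a minimal sketch in plain Python. The reward values, the `reward_histogram` helper, and the bin width are all made up for illustration and are not taken from this PR or from a real training run.

```python
from collections import Counter
from statistics import fmean


def reward_histogram(rewards, bin_width=1.0):
    """Count rewards per bin; each key is the lower edge of a bin."""
    counts = Counter()
    for r in rewards:
        # Floor division maps every reward onto the lower edge of its bin.
        counts[(r // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical episodic rewards: half the episodes fail (near 0),
# half succeed (near 5).
rewards = [0.1, 0.2, 0.3, 4.8, 4.9, 5.1]

mean_reward = fmean(rewards)           # roughly 2.57, a value no episode achieved
histogram = reward_histogram(rewards)  # two clusters with an empty gap between them
```

The scalar mean lands in the gap between the two clusters, while the binned counts make both modes visible.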

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@awjuliani awjuliani marked this pull request as ready for review January 22, 2021 18:54
@awjuliani awjuliani changed the title [WIP] Track histogram of environment reward Track histogram of environment reward Jan 22, 2021

@property
def num(self):
    return len(self.full_dist)
Contributor

StatsSummary.empty().num will return 1 now.

Contributor Author

My first thought is to use an empty list for full_dist when calling empty(). That means the other stats will return NaN, which is technically more correct than returning 0, though it may break something else. Do you have a different preference?

Contributor

My only other thought is to do e.g. np.mean(self.full_dist) if self.full_dist else 0.0, which is basically the old behavior, and the same for np.std (np.sum is safe).

Contributor

But I think this is fine too.
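
The two options discussed in this thread can be sketched together as follows. This is only an illustration, not the merged code: it keeps just the full_dist field, and it uses the standard library's `statistics.fmean` and `statistics.pstdev` in place of np.mean and np.std so that it runs without NumPy (pstdev is the population standard deviation, which matches np.std's default).

```python
from statistics import fmean, pstdev
from typing import List, NamedTuple


class StatsSummary(NamedTuple):
    # Sketch only: the real class carries more than this one field.
    full_dist: List[float]

    @staticmethod
    def empty() -> "StatsSummary":
        # An empty list makes num report 0 rather than 1.
        return StatsSummary(full_dist=[])

    @property
    def num(self) -> int:
        return len(self.full_dist)

    @property
    def mean(self) -> float:
        # Guarded so an empty summary yields 0.0 instead of NaN or an error.
        return fmean(self.full_dist) if self.full_dist else 0.0

    @property
    def std(self) -> float:
        return pstdev(self.full_dist) if self.full_dist else 0.0

    @property
    def sum(self) -> float:
        # Summing an empty list is already safe and returns 0.
        return sum(self.full_dist)
```

With this shape, StatsSummary.empty().num is 0 and the mean and std of an empty summary fall back to 0.0, which is the "old behavior" variant; dropping the guards would give the NaN variant instead when using NumPy.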

@chriselion
Contributor

Can you make it so that user stats can use this too? I think it's a pretty small change; you'd just need to:

  • Add a new StatsAggregationMethod enum value (say, HISTOGRAM) in Python and C#
  • Make sure Environment/Cumulative Reward uses the new value
  • Check StatsSummary.aggregation_method in TensorBoard writer instead of hard-coding the key
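
A rough Python-side sketch of these three steps might look like the following. Everything here is an assumption for illustration: the enum's numeric values, the two-field StatsSummary, the `RecordingWriter` stand-in (a hypothetical class that only mimics the add_scalar/add_histogram call shape of a TensorBoard writer), the `write_stats` helper, the "_hist" tag suffix, and the "Custom/My Stat" key.

```python
from enum import Enum
from typing import Dict, List, NamedTuple, Tuple


class StatsAggregationMethod(Enum):
    # Numeric values are illustrative assumptions.
    AVERAGE = 0
    MOST_RECENT = 1
    SUM = 2
    HISTOGRAM = 3


class StatsSummary(NamedTuple):
    full_dist: List[float]
    aggregation_method: StatsAggregationMethod


class RecordingWriter:
    """Hypothetical stand-in for a TensorBoard writer; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, int]] = []

    def add_scalar(self, tag, value, step):
        self.calls.append(("scalar", tag, step))

    def add_histogram(self, tag, values, step):
        self.calls.append(("histogram", tag, step))


def scalar_for(summary: StatsSummary) -> float:
    """Pick the scalar to report based on the aggregation method."""
    if summary.aggregation_method is StatsAggregationMethod.SUM:
        return sum(summary.full_dist)
    if summary.aggregation_method is StatsAggregationMethod.MOST_RECENT:
        return summary.full_dist[-1]
    return sum(summary.full_dist) / len(summary.full_dist)


def write_stats(writer, values: Dict[str, StatsSummary], step: int) -> None:
    for key, summary in values.items():
        if not summary.full_dist:
            continue  # nothing to report for this key
        writer.add_scalar(key, scalar_for(summary), step)
        # Dispatch on the aggregation method rather than hard-coding the key.
        if summary.aggregation_method is StatsAggregationMethod.HISTOGRAM:
            writer.add_histogram(f"{key}_hist", summary.full_dist, step)


writer = RecordingWriter()
write_stats(
    writer,
    {
        "Environment/Cumulative Reward": StatsSummary(
            [0.0, 1.0], StatsAggregationMethod.HISTOGRAM
        ),
        "Custom/My Stat": StatsSummary([0.5], StatsAggregationMethod.AVERAGE),
    },
    step=100,
)
```

Because the writer checks aggregation_method, any user stat reported with HISTOGRAM would get a histogram too, not only the cumulative reward.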

@awjuliani
Contributor Author

Can you make it so that user stats can use this too? I think it's a pretty small change; you'd just need to:

  • Add a new StatsAggregationMethod enum value (say, HISTOGRAM) in Python and C#
  • Make sure Environment/Cumulative Reward uses the new value
  • Check StatsSummary.aggregation_method in TensorBoard writer instead of hard-coding the key

Made these changes.

Contributor

@ervteng ervteng left a comment

Nice, this is actually super powerful for other things besides the reward too 🚢 🇮🇹

@awjuliani awjuliani merged commit 1098600 into master Feb 3, 2021
@delete-merged-branch delete-merged-branch bot deleted the reward-dist branch February 3, 2021 21:53
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2022

3 participants