Track histogram of environment reward #4878
Conversation
@property
def num(self):
    return len(self.full_dist)
StatsSummary.empty().num will return 1 now.
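The issue above can be illustrated with a reduced sketch of the class; this is a hypothetical reconstruction, assuming empty() seeds full_dist with a single placeholder zero (names and defaults are assumptions, not the PR's exact code):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatsSummary:
    # full_dist holds every raw value recorded for the stat.
    full_dist: List[float]

    @staticmethod
    def empty() -> "StatsSummary":
        # Assumed: empty() seeds the distribution with a single 0.0,
        # which is what produces the surprising count below.
        return StatsSummary(full_dist=[0.0])

    @property
    def num(self) -> int:
        return len(self.full_dist)


# With a one-element placeholder distribution, an "empty" summary
# reports a count of 1 instead of 0.
print(StatsSummary.empty().num)  # 1
```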
My first thought is to use an empty list for full_dist when calling empty(). That means the other stats will return NaN, which is technically more correct than returning 0, though it may break something else. Do you have a different preference?
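For reference, a quick demonstration of the NumPy behavior being discussed: mean and std of an empty sequence yield NaN (with a RuntimeWarning), while sum of an empty sequence is 0.0.

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # np.mean/np.std on an empty sequence emit a RuntimeWarning
    # before returning NaN; suppress it for this demonstration.
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.mean([]))  # nan
    print(np.std([]))   # nan

# np.sum is safe on empty input.
print(np.sum([]))  # 0.0
```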
Only other thought is to do e.g. np.mean(self.full_dist) if self.full_dist else 0.0, which is basically the old behavior, and the same for np.std (np.sum is safe).
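A minimal sketch of that guarded-property suggestion, assuming the same reduced StatsSummary shape as above (the class layout is an assumption for illustration):

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class StatsSummary:
    full_dist: List[float] = field(default_factory=list)

    @property
    def mean(self) -> float:
        # Guard against an empty distribution, preserving the old
        # 0.0-on-empty behavior instead of returning NaN.
        return float(np.mean(self.full_dist)) if self.full_dist else 0.0

    @property
    def std(self) -> float:
        return float(np.std(self.full_dist)) if self.full_dist else 0.0

    @property
    def sum(self) -> float:
        # np.sum([]) is already 0.0, so no guard is needed.
        return float(np.sum(self.full_dist))


empty = StatsSummary()
print(empty.mean, empty.std, empty.sum)  # 0.0 0.0 0.0
```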
But I think this is fine too.
Can you make it so that user stats can use this too? I think it's pretty small, you'd just need to
Made these changes.
Nice, this is actually super powerful for other things besides the reward too 🚢 🇮🇹
Proposed change(s)
Tracks the episodic reward as a histogram in addition to a scalar in tensorboard. This allows the multi-modality of rewards to be more apparent to end-users.
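The multi-modality argument can be sketched with np.histogram over a batch of episodic rewards (synthetic data for illustration only; the PR itself writes the distribution to TensorBoard via the stats writers):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bimodal episodic rewards: some episodes fail (~0 reward),
# others succeed (~1 reward). The scalar mean of ~0.5 hides this structure.
rewards = np.concatenate([
    rng.normal(0.0, 0.05, size=50),
    rng.normal(1.0, 0.05, size=50),
])
counts, edges = np.histogram(rewards, bins=10)
# The histogram exposes two separated modes that the mean alone would not.
print(counts)
```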
Below is an example of such a graph generated from a PushBlock training run.
Here is the learning curve from FoodCollector:
Not sure that my current solution is optimal given how the StatsSummary and StatsWriters are currently designed. Open to feedback.
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments