Track histogram of environment reward #4878
Conversation
@property
def num(self):
    return len(self.full_dist)
StatsSummary.empty().num will return 1 now.
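The issue above can be illustrated with a reduced sketch of the class; this is a hypothetical reconstruction, assuming empty() seeds full_dist with a single placeholder zero (names and defaults are assumptions, not the PR's exact code):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatsSummary:
    # full_dist holds every raw value recorded for the stat.
    full_dist: List[float]

    @staticmethod
    def empty() -> "StatsSummary":
        # Assumed: empty() seeds the distribution with a single 0.0,
        # which is what produces the surprising count below.
        return StatsSummary(full_dist=[0.0])

    @property
    def num(self) -> int:
        return len(self.full_dist)


# With a one-element placeholder distribution, an "empty" summary
# reports a count of 1 instead of 0.
print(StatsSummary.empty().num)  # 1
```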
My first thought is to use an empty list for full_dist when calling empty(). That means the other stats will return NaN, which is technically more correct than returning 0, though it may break something else. Do you have a different preference?
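For reference, a quick demonstration of the NumPy behavior being discussed: mean and std of an empty sequence yield NaN (with a RuntimeWarning), while sum of an empty sequence is 0.0.

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # np.mean/np.std on an empty sequence emit a RuntimeWarning
    # before returning NaN; suppress it for this demonstration.
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.mean([]))  # nan
    print(np.std([]))   # nan

# np.sum is safe on empty input.
print(np.sum([]))  # 0.0
```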
Only other thought is to do e.g. np.mean(self.full_dist) if self.full_dist else 0.0, which is basically the old behavior, and the same for np.std (np.sum is safe).
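A minimal sketch of that guarded-property suggestion, assuming the same reduced StatsSummary shape as above (the class layout is an assumption for illustration):

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class StatsSummary:
    full_dist: List[float] = field(default_factory=list)

    @property
    def mean(self) -> float:
        # Guard against an empty distribution, preserving the old
        # 0.0-on-empty behavior instead of returning NaN.
        return float(np.mean(self.full_dist)) if self.full_dist else 0.0

    @property
    def std(self) -> float:
        return float(np.std(self.full_dist)) if self.full_dist else 0.0

    @property
    def sum(self) -> float:
        # np.sum([]) is already 0.0, so no guard is needed.
        return float(np.sum(self.full_dist))


empty = StatsSummary()
print(empty.mean, empty.std, empty.sum)  # 0.0 0.0 0.0
```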
But I think this is fine too.
Can you make it so that user stats can use this too? I think it's pretty small, you'd just need to
Made these changes.
Nice, this is actually super powerful for other things besides the reward too 🚢 🇮🇹
Proposed change(s)
Tracks the episodic reward as a histogram in addition to a scalar in tensorboard. This allows the multi-modality of rewards to be more apparent to end-users.
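The multi-modality argument can be sketched with np.histogram over a batch of episodic rewards (synthetic data for illustration only; the PR itself writes the distribution to TensorBoard via the stats writers):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bimodal episodic rewards: some episodes fail (~0 reward),
# others succeed (~1 reward). The scalar mean of ~0.5 hides this structure.
rewards = np.concatenate([
    rng.normal(0.0, 0.05, size=50),
    rng.normal(1.0, 0.05, size=50),
])
counts, edges = np.histogram(rewards, bins=10)
# The histogram exposes two separated modes that the mean alone would not.
print(counts)
```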
Below is an example of such a graph generated from a PushBlock training run.
Here is the learning curve from FoodCollector:
Not sure that my current solution is optimal given how the StatsSummary and StatsWriters are currently designed. Open to feedback.
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments