Allow grouping of multiple runs #376
Interesting suggestion. You're correct that we don't currently have any machinery to do this. To get you unblocked, it shouldn't be too hard to manually create aggregate runs: read in the data from each of your run files, and emit summaries that contain the mean/median/whatever of your individual runs, saving these to a brand new run. A somewhat related, but certainly distinct, request is in #300. This isn't currently high on our priorities, but we'd be open to reviewing PRs for it!
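A minimal sketch of that manual workaround, assuming TF2-style summary writing; the log directories and the "loss" tag below are hypothetical, and every run is assumed to log the same tag at the same steps:

```python
# Sketch only, not an official TensorBoard feature. Assumes TF2 and
# hypothetical log directories that all log the same scalar tag.
import numpy as np
import tensorflow as tf
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dirs = ["logs/trial_0", "logs/trial_1", "logs/trial_2"]  # hypothetical
tag = "loss"  # hypothetical scalar tag shared by all runs

# Read the (step, value) series from each run's event files.
runs = []
for run_dir in run_dirs:
    acc = EventAccumulator(run_dir)
    acc.Reload()
    runs.append({event.step: event.value for event in acc.Scalars(tag)})

# Write the per-step mean out as a brand-new run.
steps = sorted(set.intersection(*(set(r) for r in runs)))
writer = tf.summary.create_file_writer("logs/aggregate_mean")
with writer.as_default():
    for step in steps:
        tf.summary.scalar(tag, np.mean([r[step] for r in runs]), step=step)
writer.flush()
```

Pointing TensorBoard at the parent directory would then show the aggregate as just another run.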
I also think this feature would be very useful for TensorBoard. Usually you have to run each setting about 10 times to make sure that it is working, and averaging over runs is also the best way to report the results of each method in papers.
A paper that uses visualizations like this: https://arxiv.org/pdf/1709.06560.pdf I'd like to help if no one else is working on it yet. One thing to sort out is how to let the user indicate what part of the run name should be averaged over. For example, suppose runs are named with several hyperparameter fields plus a trial index (one possible naming scheme is sketched below).
How should the user say that they'd like averaging over the trial field? I think in addition to the run selector, we'd need a field for run aggregation. Then, we could use a distribution view to show the mean and standard error at each step. This wouldn't make sense to show in a relative or wall-time view, though.
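One possible shape for that aggregation field, sketched with a hypothetical "key=value,key=value" run-naming scheme (the names and the parsing rule are illustrative, not a proposed TensorBoard API): group runs by every field except the one being averaged over.

```python
# Illustrative only: hypothetical run names and a toy grouping rule,
# not an actual TensorBoard mechanism.
from collections import defaultdict

run_names = ["lr=0.01,trial=0", "lr=0.01,trial=1",
             "lr=0.1,trial=0", "lr=0.1,trial=1"]
average_over = "trial"  # the field the user asked to aggregate away

groups = defaultdict(list)
for name in run_names:
    fields = dict(part.split("=") for part in name.split(","))
    fields.pop(average_over, None)
    # Runs that agree on every remaining field land in the same group.
    groups[tuple(sorted(fields.items()))].append(name)

for key, members in groups.items():
    print(dict(key), "->", members)
# {'lr': '0.01'} -> ['lr=0.01,trial=0', 'lr=0.01,trial=1']
# {'lr': '0.1'}  -> ['lr=0.1,trial=0', 'lr=0.1,trial=1']
```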
The upcoming custom scalars dashboard may help here. Feedback very much appreciated. Also, if you want to contribute, there are still many features to build.
@chihuahua custom-scalars looks great! Any chance the TensorBoard team will be able to work on what's being requested here? Averaging over several runs would be a popular feature, at least among RL researchers, and would aid the fair reporting of results.
Also wondering if there are any updates on this. Tools that make it easier to show bounds on results would be incredibly beneficial to the research community, and would increase accountability by encouraging researchers to do this regularly.
For quite some time I've been using a custom implementation to solve this problem: https://github.com/Spenhouet/tensorboard-aggregator This tool aggregates multiple TensorBoard runs by the max, min, mean, median, and standard deviation of all scalars. The resulting aggregations are saved as new TensorBoard summaries.
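For a sense of what those aggregations amount to, here is a small sketch (not tensorboard-aggregator's actual implementation) of the per-step statistics, assuming the runs' scalar values have already been aligned into a hypothetical [num_runs, num_steps] array:

```python
# Sketch of the per-step statistics, not tensorboard-aggregator's code.
# "values" is a hypothetical step-aligned array: one row per run.
import numpy as np

values = np.array([[1.0, 0.8, 0.6],
                   [1.2, 0.9, 0.5],
                   [0.9, 0.7, 0.7]])

aggregates = {
    "max": values.max(axis=0),
    "min": values.min(axis=0),
    "mean": values.mean(axis=0),
    "median": np.median(values, axis=0),
    "std": values.std(axis=0),
}
# Each of these arrays could then be written back as its own summary run.
```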
Is there a way to group multiple runs and display, for example, the mean/median of their various success metrics? When conducting an experiment and trying to show that, e.g., model A performs better than model B on average, being able to run 5 instances of model A and 5 instances of model B and plot them as just two curves instead of 10 would be a handy tool. Based on some Googling, there seem to be other people who would like this as well, and I can't find an option to do anything like it currently.