Allow grouping of multiple runs #376
Interesting suggestion. You're correct that we don't currently have any machinery to do this. To get you unblocked, it shouldn't be too hard to manually create aggregate runs: read in the data from each of your run files, and emit summaries that contain the mean/median/whatever of your individual runs, saving these to a brand new run. A somewhat related, but certainly distinct, request is in #300. This isn't currently high on our priorities, but we'd be open to reviewing PRs for it!
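A minimal sketch of that manual workaround, assuming TF2-style summary writing; the log directories and the "loss" tag below are hypothetical, and every run is assumed to log the same tag at the same steps:

```python
# Sketch only, not an official TensorBoard feature. Assumes TF2 and
# hypothetical log directories that all log the same scalar tag.
import numpy as np
import tensorflow as tf
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dirs = ["logs/trial_0", "logs/trial_1", "logs/trial_2"]  # hypothetical
tag = "loss"  # hypothetical scalar tag shared by all runs

# Read the (step, value) series from each run's event files.
runs = []
for run_dir in run_dirs:
    acc = EventAccumulator(run_dir)
    acc.Reload()
    runs.append({event.step: event.value for event in acc.Scalars(tag)})

# Write the per-step mean out as a brand-new run.
steps = sorted(set.intersection(*(set(r) for r in runs)))
writer = tf.summary.create_file_writer("logs/aggregate_mean")
with writer.as_default():
    for step in steps:
        tf.summary.scalar(tag, np.mean([r[step] for r in runs]), step=step)
writer.flush()
```

Pointing TensorBoard at the parent directory would then show the aggregate as just another run.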
I also think this feature would be very useful for TensorBoard. Usually you have to run each setting about 10 times to make sure that it is working, and averaging over runs is also the best way to report the results of each method in papers.
A paper that uses visualizations like this: https://arxiv.org/pdf/1709.06560.pdf I'd like to help if no one else is working on it yet. One thing to sort out is how to let the user indicate what part of the run name should be averaged over. For example, suppose runs are named with several hyperparameter fields plus a trial index (one possible naming scheme is sketched below).
How should the user say that they'd like averaging over the trial field? I think in addition to the run selector, we'd need a field for run aggregation. Then, we could use a distribution view to show the mean and standard error at each step. This wouldn't make sense to show in a relative or wall-time view, though.
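One possible shape for that aggregation field, sketched with a hypothetical "key=value,key=value" run-naming scheme (the names and the parsing rule are illustrative, not a proposed TensorBoard API): group runs by every field except the one being averaged over.

```python
# Illustrative only: hypothetical run names and a toy grouping rule,
# not an actual TensorBoard mechanism.
from collections import defaultdict

run_names = ["lr=0.01,trial=0", "lr=0.01,trial=1",
             "lr=0.1,trial=0", "lr=0.1,trial=1"]
average_over = "trial"  # the field the user asked to aggregate away

groups = defaultdict(list)
for name in run_names:
    fields = dict(part.split("=") for part in name.split(","))
    fields.pop(average_over, None)
    # Runs that agree on every remaining field land in the same group.
    groups[tuple(sorted(fields.items()))].append(name)

for key, members in groups.items():
    print(dict(key), "->", members)
# {'lr': '0.01'} -> ['lr=0.01,trial=0', 'lr=0.01,trial=1']
# {'lr': '0.1'}  -> ['lr=0.1,trial=0', 'lr=0.1,trial=1']
```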
The upcoming custom scalars dashboard may help here. Feedback very much appreciated. Also, if you want to contribute, there are still many features to build.
@chihuahua custom-scalars looks great! Any chance the TensorBoard team will be able to work on what's being requested here? Averaging over several runs would be a popular feature, at least among RL researchers, and would aid the fair reporting of results.
Also wondering if there are any updates on this. Tools that make it easier to show bounds on results would be incredibly beneficial to the research community, and would increase accountability by encouraging researchers to do this regularly.
For quite some time I've been using a custom implementation to solve this problem: https://github.com/Spenhouet/tensorboard-aggregator This tool aggregates multiple TensorBoard runs by the max, min, mean, median, and standard deviation of all scalars. The resulting aggregations are saved as new TensorBoard summaries.
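For a sense of what those aggregations amount to, here is a small sketch (not tensorboard-aggregator's actual implementation) of the per-step statistics, assuming the runs' scalar values have already been aligned into a hypothetical [num_runs, num_steps] array:

```python
# Sketch of the per-step statistics, not tensorboard-aggregator's code.
# "values" is a hypothetical step-aligned array: one row per run.
import numpy as np

values = np.array([[1.0, 0.8, 0.6],
                   [1.2, 0.9, 0.5],
                   [0.9, 0.7, 0.7]])

aggregates = {
    "max": values.max(axis=0),
    "min": values.min(axis=0),
    "mean": values.mean(axis=0),
    "median": np.median(values, axis=0),
    "std": values.std(axis=0),
}
# Each of these arrays could then be written back as its own summary run.
```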
Is there a way to group multiple runs and display, for example, the mean/median of their various success metrics? When conducting an experiment and trying to show that, e.g., model A performs better than model B on average, being able to run 5 instances of model A and 5 instances of model B and plot them as just two curves instead of 10 would be a handy tool. Based on some Googling, there seem to be other people who would like this as well, and I can't find an option to do anything like it currently.