Metrics of experiments with different tech implementation #34

Closed
elleobrien opened this issue Mar 26, 2020 · 9 comments
Labels
discussion (Waiting for team decision), ui/ux (User interface/experience)

Comments

@elleobrien
Contributor

elleobrien commented Mar 26, 2020

This is a discussion point, not really an issue. I'm thinking about how metrics are displayed:

[Screenshot: metrics report comparing two experiments, with diffs shown for both max_depth and accuracy]

I definitely want to know that I'm comparing two experiments in which a hyperparameter of my model (here, the maximum depth of a random forest classifier, max_depth) changed. But whereas it makes sense to present a "diff" for the accuracy metric, I'm not so sure it matters to present a diff for the hyperparameters. It's not a number we're trying to optimize (unlike accuracy diffs), and visually it makes the display more cluttered.

I might suggest having a separate table for comparing hyperparameters that doesn't present diffs, just a side-by-side comparison, and then a table for comparing the output metrics, where I do care about the diff. Would this be challenging to implement? Maybe each distinct metric file gets its own table? And then somewhere in project preferences a user could specify whether they want diffs.

Another way of thinking about this is that if I had a spreadsheet of experiments I was trying to compare, I would lay it out this way:

experiment id  parameterA  parameterB  parameterC  accuracy
1bac226        24          5           140         0.899
f90k153        24          2           140         0.9111

And then perhaps highlight the row containing the best experiment (assuming that we can specify somewhere whether higher or lower is better for the metric). If you want the diff explicitly calculated, maybe put it in its own field below the table.
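
For illustration only, a rough mock-up of how that could render in a report, with the best row marked and the diff pulled out into its own field below the table (the marker and layout here are just my assumptions, not an existing CML feature):

experiment id  parameterA  parameterB  parameterC  accuracy
1bac226        24          5           140         0.899
f90k153 *      24          2           140         0.9111

accuracy diff (f90k153 vs. 1bac226): +0.0121    (* best experiment, assuming higher accuracy is better)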

@dmpetrov
Member

@andronovhopf really great feedback!

Most of it is already under development in core DVC.

The spreadsheet of experiments is another great idea. We should think about that.

Re the spreadsheet... what would be your criteria for including an experiment in the table? How many of these would you expect to see there?

@shcheklein
Member

Diffs with the requested option to avoid deltas - exactly as you asked :) - iterative/dvc#3528 (in progress)

@andronovhopf how and where do you specify the max_depth parameter? Is train.json actually a file with hyperparameters in your case? Could you share both JSON files, please? :)

@elleobrien
Contributor Author

elleobrien commented Mar 27, 2020

@shcheklein yes train.json is a file containing hyperparameters, and that's where max_depth is specified. I just invited you and @dmpetrov to the repository; the metric files are here.
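
For illustration, the files look roughly like this (the values here are only a sketch; the exact contents are in the repo):

metrics/train.json (hyperparameters):  {"max_depth": 24}
metrics/eval.json (output metric):     {"accuracy": 0.899}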

@dmpetrov, re: spreadsheet. Two ways of selecting experiments to display in a table come to mind:

  • If I'm doing a PR, compare the PR to master. So only two experiments.
  • A view of all commits on a branch compared. So as many experiments as there are commits (assuming CI ran after each commit); a rough sketch of this is at the end of this comment.

Any other ideas?
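
Purely as a sketch of the second option (assuming plain git plus the metric files above; this is not an existing CML feature):

branch=$(git rev-parse --abbrev-ref HEAD)       # remember the current branch
for rev in $(git rev-list master..HEAD); do     # one experiment per commit on the branch
    git checkout -q "$rev"
    echo "$rev $(cat metrics/eval.json)"        # collect this commit's metric as one table row
done
git checkout -q "$branch"                       # return to the original branch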

@dmpetrov
Member

@andronovhopf did you run it like dvc run -M metrics/train.json -M metrics/eval.json ... and write all the params and metrics separately?

I like both ways. If we do that:

  • The current experiment and the baseline are a must-have.
  • It is convenient to see everything from the current branch up to master. However, some limit is required due to CI-report size limitations - something like 10 or 30.

@dmpetrov
Member

> @andronovhopf did you run it like dvc run -M metrics/train.json -M metrics/eval.json ... and write all the params and metrics separately?

Oh, I see that in the repo: https://github.com/andronovhopf/cml_scratch

@elleobrien
Contributor Author

@dmpetrov the pipeline has two stages (train.dvc and eval.dvc) and each stage writes a metric file. And yep!
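
For reference, a rough sketch of what the two stages could look like (stage file names are taken from above; the dependencies, outputs, and scripts are just assumptions):

dvc run -f train.dvc -d train.py -o model.pkl -M metrics/train.json python train.py
dvc run -f eval.dvc -d eval.py -d model.pkl -M metrics/eval.json python eval.py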

@elleobrien
Contributor Author

elleobrien commented Mar 27, 2020

Another observation: my project has two branches; on master I am running a random forest classifier and on DNN a deep neural network. When I look at the report for the last commit on DNN, it looks like this:

[Screenshot: metrics report for the last commit on the DNN branch, compared against master]

Now, because the hyperparameters I'm collecting are not the same as on master (epochs & neurons vs. max_depth), comparing metrics from train.json doesn't make a lot of sense.

Also, I know we are planning to do this eventually, but here's a case where being able to compare two commits on the same branch, instead of the heads of two branches, would be great (as an additional option, not a replacement), since I want to test a few different numbers of neurons/epochs in the neural network.

@DavidGOrtega
Contributor

@andronovhopf nice observation. We had that discussion as well: every branch might be a different implementation of the same problem to be solved, like a DNN vs. a random forest here.

You can set up a different baseline, and a baseline can be a specific commit SHA. For example, you can set your baseline to HEAD~1 to compare your experiment with the previous one. That's also why the top-five list came into place: to give fast access to experiments on the same branch.

In my personal experience, to solve your problem in your DNN branch, change the baseline to master/dnn (supposing it's called that way) and work with branches of that branch to adjust new parameters.
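
For example, plain DVC already lets you point a metrics diff at an explicit baseline revision (the branch name below is the hypothetical one from above):

dvc metrics diff HEAD~1             # compare the workspace against the previous commit
dvc metrics diff master/dnn HEAD    # compare the branch head against the master/dnn baseline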

@DavidGOrtega changed the title from "Thinking about metric display" to "Metrics of epxeriments with different tech implementation" on May 12, 2020
@DavidGOrtega added the "enhancement (New feature or request)" label on May 12, 2020
@0x2b3bfa0 changed the title from "Metrics of epxeriments with different tech implementation" to "Metrics of experiments with different tech implementation" on Jul 2, 2021
@0x2b3bfa0 added the "discussion (Waiting for team decision)" and "ui/ux (User interface/experience)" labels and removed the "enhancement (New feature or request)" label on Jul 2, 2021
@DavidGOrtega
Contributor

Closing - this is not relevant anymore; it belongs to the CML-DVC incarnation of CML.
