Add experimental queue experiments from csv command #1120
Conversation
Force-pushed from 832c39c to 01ed540
This looks like a very cool feature. CSV is not only simple but also a really fast way of entering and editing values. I think I'll use this often in the future.
Cool stuff, @mattseddon! This is one of the pain points ML engineers and data scientists have. I think it's fine to merge this; eventually we'll need some UI/UX on top of it to run batch experiments. The usual cases are grid search (right from the table you would specify start, end, and step for a param or set of params, and it would generate a "grid" of experiments) and random search, etc. We'll need to support those right from the table. CSV/TSV/JSON support sounds great to me anyway. I think maybe even DVC should support this. cc @dberenbaum @daavoo, what do you think?
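A rough sketch of the grid idea described above, purely for illustration (the `GridSpec` type and all names here are assumptions, not extension or DVC code):

```typescript
// Hypothetical: expand start/end/step ranges into every parameter
// combination, as in the "grid" of experiments described above.
type GridSpec = { param: string; start: number; end: number; step: number }

const expand = ({ param, start, end, step }: GridSpec): [string, number][] => {
  const values: [string, number][] = []
  // naive accumulation; real code would guard against floating-point drift
  for (let v = start; v <= end; v += step) {
    values.push([param, v])
  }
  return values
}

// Cartesian product of the per-parameter value lists
const grid = (specs: GridSpec[]): [string, number][][] =>
  specs.map(expand).reduce<[string, number][][]>(
    (combos, values) => combos.flatMap(combo => values.map(v => [...combo, v])),
    [[]]
  )

// grid([
//   { param: 'epochs', start: 1, end: 3, step: 1 },
//   { param: 'seed', start: 0, end: 1, step: 1 }
// ]) yields all six (epochs, seed) combinations to queue.
```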
For DVC, I have mixed feelings. On the one hand, it could make sense as a language-agnostic format for defining experiments. On the other hand, without a proper UI for filling in the file (unlike VSCode, which could build a UI on top), I don't see how this format would be more user-friendly than having an Experiments API in Python or even writing a bash/python/X script (similar to https://dvc.org/blog/hyperparam-tuning).

A little bit off-topic (maybe this should be moved to a separate discussion): there are quite a few methods for hyperparameter tuning which are more efficient (and effective) than grid/manual/random search. If people are not using those, it's because they lack a happy path (i.e. existing libraries might be hard to set up, or it's tedious to adapt the training code to work with them). That's why many are working on building UI/UX for hyperparameter tuning and/or integrations with other tools. Here are two examples of Bayesian search: https://docs.wandb.ai/guides/sweeps/quickstart and https://docs.valohai.com/howto/tasks/bayesian/

I would carefully consider how much time to invest in building tools around grid/manual/random search. In my opinion (I might be biased from the DVCLive perspective), the happy path for hyperparameter tuning in DVC should focus on providing a Python API to queue/launch experiments and working on integrations with known libraries (iterative/dvclive#118).
})
})

export const waitForLock = async (cwd: string): Promise<void> => {
[F] This is very much a temporary solution to the complicated problem of queuing experiments firing a data update on every run. We could implement something along the lines of #948 (comment), but we would need to (see the sketch after this list):
- add some kind of mechanism that sends all events to the data update queue
- queue the experiment(s)
- stop sending data update events to the queue
- run the queue
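A minimal sketch of those four steps, assuming hypothetical helpers (`bufferDataUpdateEvents`, `releaseDataUpdateEvents`, and `execDvc` are illustrative names, not the extension's actual API):

```typescript
// Hypothetical sketch of the steps listed above.
declare function bufferDataUpdateEvents(): void // divert events into the data update queue
declare function releaseDataUpdateEvents(): void // resume normal event delivery
declare function execDvc(cwd: string, args: string[]): Promise<string>

const queueAndRunExperiments = async (
  cwd: string,
  paramArgs: string[][] // e.g. [['-S', 'lr=0.0001', '-S', 'weight_decay=0.02']]
): Promise<void> => {
  bufferDataUpdateEvents() // 1. send all events to the data update queue
  for (const args of paramArgs) {
    // 2. queue each experiment
    await execDvc(cwd, ['exp', 'run', '--queue', ...args])
  }
  releaseDataUpdateEvents() // 3. stop sending data update events to the queue
  await execDvc(cwd, ['exp', 'run', '--run-all']) // 4. run the queue
}
```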
lr,weight_decay
0.0001,0.02
0.00075,0.01
0.0005,
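As a reference for how rows in a file like the one above could map to the CLI, here is a rough sketch (hand-rolled CSV splitting for brevity; `csvToQueueArgs` is an illustrative name, not the extension's implementation):

```typescript
// Hypothetical: turn each CSV row into `-S param=value` pairs for
// `dvc exp run --queue`, skipping empty cells (the trailing `0.0005,`
// row only overrides lr).
const csvToQueueArgs = (csv: string): string[][] => {
  const [header, ...rows] = csv.trim().split('\n')
  const params = header.split(',')
  return rows.map(row =>
    row
      .split(',')
      .flatMap((value, i) =>
        value === '' ? [] : ['-S', `${params[i]}=${value}`]
      )
  )
}

// csvToQueueArgs('lr,weight_decay\n0.0001,0.02\n0.0005,')
// => [['-S', 'lr=0.0001', '-S', 'weight_decay=0.02'], ['-S', 'lr=0.0005']]
```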
[F] I have committed this CSV for an integration test; we can also use it to set up demos.
Nice!
I have no problem with this as a quick way to test out queuing lots of experiments (and I might find it useful for demos and such), but I agree it might not add much for users in DVC. It's pretty easy for users to do this themselves in whatever format they prefer, and I'm not sure how to integrate it into the CLI.

If I am coming up with new ideas on the fly, I'm not sure adding them to a CSV saves enough time over adding them directly to the queue (and some of them might not be parameter changes). If I have lots of combinations of parameters to try (as in hyperparameter tuning), I'm more likely to define some criteria programmatically than write them out manually in a CSV.

There's a related issue in iterative/dvc#5615, which has a lot of 👍. I think adding more options to manage the queue is likely to be more useful than adding experiments from a CSV (other than for internal testing, as it's being used here).
There's lots to discuss about how to do more complex experiment batches and hyperparameter tuning. I agree the discussion is probably better had elsewhere, but let's definitely have it before we start doing anything on this.
Force-pushed from ca31a8d to 7acec17
Code Climate has analyzed commit 7acec17 and detected 1 issue on this pull request.
The test coverage on the diff in this pull request is 92.4% (85% is the threshold). This pull request will bring the total coverage in the repository to 96.7% (0.0% change).
@dberenbaum @daavoo I agree with all of your points, and I agree that the entire approach needs a lot of work. I do think it is important to reiterate that the extension is trying to hide as much complexity from the user as possible. We also want to lower the barrier to entry for data scientists who aren't as au fait with the terminal. We should keep that in mind when we come back to this. Thank you both.
Well, now I can't leave it alone 😄 With that CLI avoidance in mind, what makes sense to me as we move towards improved UX is to focus on how to set up a single experiment to queue rather than how to submit a whole batch of experiments. For example, there could be a GUI that shows all the parameters, and users could edit their values and click "queue experiment" without needing to edit YAML files or run terminal commands. As @daavoo alluded to, when it comes to larger-scale hyperparameter tuning, users probably can't avoid doing this programmatically: there are so many experiments to queue, the methods of defining the hyperparameter space are too varied, and frequently they can't even all be queued up front because later iterations base their values on the results of previous ones.
1/5
master <- this <- #1122 <- #1124 <- #1123 <- #1125

From getting demos ready and going through the pain of manually queuing experiments, I would assume that someone working with the extension would want a quick interface to queue multiple experiments. I considered using a quick pick and other formats to get the data in, but the time-trusted CSV format seemed to be the simplest way to start a conversation.
Demo
Screen.Recording.2021-12-03.at.3.15.51.pm.mov