exp show: include running experiment #5965

dberenbaum · 2021-05-04T19:43:42Z

Show currently running experiment in the table. cc @shcheklein @pmrowla

dberenbaum · 2021-05-04T20:24:44Z

@shcheklein Please follow up to clarify anything that I missed on this. Also, can you clarify the motivation? Is it to give users with long-running experiments some confirmation that their experiment is running?

I think this introduces some complications that raise new questions:

As far as I understand today, no experiment branch is created until the experiment completes. Should an experiment branch with name be generated as soon as the experiment starts, or should it be left blank until completion? If the experiment fails, should it be removed from the table, and should its branch be deleted?
Since the commit sha for the experiment won't exist until it completes, does it matter if there is no sha or it changes after the experiment completes?
Should the same behaviors apply to queued experiments?

pmrowla · 2021-05-05T02:41:54Z

What should we even display for a running experiment? For workspace runs, we already have the workspace table entry, which reflects whatever is happening in the running experiment.

Do we just want an equivalent table entry for whatever is in the temp directory for --temp/--queue runs?

It's not exactly clear what DVC should display, because outputs don't have to exist until a stage is completed. So for a long running stage command, nothing in the repo state may actually appear modified (from DVC's perspective) until the stage has completed.

karajan1001 · 2021-05-05T02:50:15Z

> What should we even display for a running experiment? For workspace runs, we already have the workspace table entry, which reflects whatever is happening in the running experiment.

How about experiments that are running or in the queue?

shcheklein · 2021-05-05T03:02:34Z

We should always see all the running/queued/executed experiments in the table. Can it be done by marking the workspace with a special flag? Probably, yes. It's a good point. But we still need to do this in the dvc exp show output, right? Is it enough for --temp, etc.? I don't know, to be honest; I don't have enough experience. But as I run something I want to have a way to see what's going on now and to be able to stop them for example. I mean, in this case I want to at least see the fact that it's running.

> Should the same behaviors apply to queued experiments?

yes, in the sense that I'd like to see that it's running now and be able to stop it. Also, can we show that it was cancelled? E.g. to launch it with a different set of params?

> Since the commit sha for the experiment won't exist until it completes, does it matter if there is no sha or it changes after the experiment completes?

my initial take - it doesn't matter that much; the sha is probably only needed for operations that should not be available until the experiment is done, right?

> As far as I understand today, no experiment branch is created until the experiment completes

curious why do we call it branch? :)

> Should an experiment branch with name be generated as soon as the experiment starts, or should it be left blank until completion? If the experiment fails, should it be removed from the table, and should its branch be deleted?

good questions. I think it's related to Peter's point to some extent. We already have corresponding entries for pretty much all experiments, right? workspace, or queued entries ... we can utilize those, I guess? change their state somehow ...

On the other hand, how do we show a running checkpoint? Ideally it should be "branched" out of the previous, not from the workspace? Not 100% sure here.

dberenbaum · 2021-05-05T12:59:41Z

> But as I run something I want to have a way to see what's going on now and to be able to stop them for example.

Okay, this seems like the use case we are missing. There's no way to cancel an ongoing experiment now except pressing Ctrl-C in the window where the experiment is running, and it's obvious it's running in that window. I can see how in VS Code there might need to be something like a stop button in the table.

> curious why do we call it branch? :)

I called it a branch since the only difference between an experiment and a Git branch is the path to the ref (.git/refs/exps vs .git/refs/heads). I meant that this named ref for an experiment (.git/refs/exps/.../exp-12345) is not generated until the experiment completes. I suppose one option to consider is to generate that ref immediately, pointing at the commit the experiment starts from, and then update the ref when the experiment completes, but I need to give it more thought and hear what others think.
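A minimal sketch of that option, using an in-memory dict in place of Git refs. Everything here is hypothetical and is not DVC's code: the function names, the placeholder shas, and the simplified ref path (the intermediate path segments of the real ref are left out).

```python
from typing import Dict

# Hypothetical in-memory model of named experiment refs; not DVC's implementation.
EXPS_PREFIX = "refs/exps"  # experiment refs live here rather than under refs/heads


def exp_ref(name: str) -> str:
    """Build a ref path for an experiment (layout simplified for illustration)."""
    return f"{EXPS_PREFIX}/{name}"


def start_experiment(refs: Dict[str, str], name: str, base_sha: str) -> str:
    """Create the named ref right away, pointing at the commit the run starts from."""
    ref = exp_ref(name)
    if ref in refs:
        raise ValueError(f"{ref} already exists")
    refs[ref] = base_sha
    return ref


def complete_experiment(refs: Dict[str, str], name: str, result_sha: str) -> None:
    """Move the ref to the experiment's own commit once the run finishes."""
    ref = exp_ref(name)
    if ref not in refs:
        raise KeyError(f"{ref} was never started")
    refs[ref] = result_sha


refs: Dict[str, str] = {}
ref = start_experiment(refs, "exp-12345", base_sha="aaa111")  # placeholder sha
sha_while_running = refs[ref]
complete_experiment(refs, "exp-12345", result_sha="bbb222")  # placeholder sha
sha_after_completion = refs[ref]
```

With this approach the name is stable from the start, while the sha behind it changes on completion, which is the trade-off raised in the earlier question about shas.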

Edit: Also, it seems this is getting towards functionality in #5615

dberenbaum · 2021-05-18T15:41:21Z

@shcheklein If we start with adding any running experiments and checkpoints to the table, whether in the workspace or not, is that enough for now? Do we want to open a separate ticket for stopping an experiment?

My opinions on how it should look:

Add some signal (similar to the asterisk for the queue) that shows which rows are for running experiments.
Having the final exp name at the start would be nice, but it's not necessary if it's easier to give temporary names like the queue today.
It's okay to leave cancelled experiments in the table or to drop them when cancelled. If they are left in the table, they probably need some other signal that they are cancelled/incomplete, and it would be nice to have a way to clean them up similar to dvc exp rm --queue.
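The three points above could look roughly like this. It is only a sketch: the `ExpRow` type, the state names, and every marker other than the queue asterisk are made up for illustration, and the real table has many more columns.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical markers: only the asterisk for queued rows comes from the thread.
MARKERS = {"queued": "*", "running": ">", "cancelled": "!", "done": " "}


@dataclass
class ExpRow:
    name: str   # final exp name, or a temporary name as in the queue today
    state: str  # one of the MARKERS keys


def render_rows(rows: List[ExpRow]) -> List[str]:
    """Prefix each row with a one-character signal for its state."""
    return [f"{MARKERS[row.state]} {row.name}" for row in rows]


def drop_cancelled(rows: List[ExpRow]) -> List[ExpRow]:
    """Cleanup step for rows that were left in the table after being cancelled."""
    return [row for row in rows if row.state != "cancelled"]


rows = [
    ExpRow("exp-aaaaa", "done"),
    ExpRow("exp-bbbbb", "running"),
    ExpRow("tmp-ccccc", "queued"),
    ExpRow("tmp-ddddd", "cancelled"),
]
rendered = render_rows(rows)
remaining = [row.name for row in drop_cancelled(rows)]
```

Keeping the state as a field on the row, rather than encoding it in the name, would also let a JSON consumer read it directly instead of parsing the marker.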

shcheklein · 2021-05-18T18:03:11Z

> whether in the workspace or not, is that enough for now?

yes! that would be a great start

> do we want to open a separate ticket for stopping an experiment?

yes, we can discuss and address it separately

> My opinions on how it should look:

For VS Code we care about the API (--show-json), but it will probably be solved if we solve it in the table.

dberenbaum · 2021-05-18T20:27:32Z

@pmrowla Any questions or concerns?

pmrowla · 2021-05-20T01:40:22Z

> It's okay to leave cancelled experiments in the table or to drop them when cancelled. If they are left in the table, they probably need some other signal that they are cancelled/incomplete, and it would be nice to have a way to clean them up similar to dvc exp rm --queue.

I'm not sure what you mean here? Whatever is cancelled or incomplete in the workspace is just the workspace row. If it's a queued run, cancelled or failed runs currently just go back into the queue so they can be retried. Do we want to change this behavior?

@dberenbaum
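For reference, the queue behavior described above (cancelled or failed runs go back into the queue) can be modelled like this. It is a toy sketch, not DVC's code: `run_queue` and the boolean `run` callback are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


def run_queue(queue: Deque[str], run: Callable[[str], bool]) -> List[str]:
    """Run each queued experiment once and return the names that succeeded.

    Experiments whose run returns False (cancelled or failed) are put back
    into the queue so they can be retried later.
    """
    succeeded: List[str] = []
    retry: List[str] = []
    while queue:
        name = queue.popleft()
        if run(name):
            succeeded.append(name)
        else:
            retry.append(name)
    queue.extend(retry)  # unfinished runs are queued again rather than dropped
    return succeeded


queue: Deque[str] = deque(["exp-a", "exp-b", "exp-c"])
done = run_queue(queue, run=lambda name: name != "exp-b")  # pretend exp-b fails
```

In this model a cancelled run never needs its own table state: it is simply a queued row again.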

dberenbaum · 2021-05-20T12:47:04Z

> I'm not sure what you mean here? Whatever is cancelled or incomplete in the workspace is just the workspace row. If it's a queued run, cancelled or failed runs currently just go back into the queue so they can be retried. Do we want to change this behavior?

No, that's fine. I'm not sure how that will work in combination with showing the running experiment, but I can wait and see what it looks like.

dberenbaum added the product: VSCode Integration with VSCode extension label May 4, 2021

shcheklein mentioned this issue May 4, 2021

Review: Experiments workflow iterative/vscode-dvc#374

Closed

5 tasks

dberenbaum added the discussion requires active participation to reach a conclusion label May 5, 2021

efiop assigned dberenbaum May 18, 2021

dberenbaum mentioned this issue May 24, 2021

exp show: show names for the queued experiments #6050

Closed

dberenbaum removed the discussion requires active participation to reach a conclusion label May 25, 2021

pmrowla self-assigned this May 25, 2021

pmrowla mentioned this issue Jun 15, 2021

exp show: display running/queued state for experiments #6174

Merged

2 tasks

dberenbaum added the A: experiments Related to dvc exp label Jun 16, 2021

pmrowla closed this as completed in #6174 Jul 12, 2021
