dvc exp show checkpoints #5311

Closed
Tracked by #5367
dberenbaum opened this issue Jan 21, 2021 · 8 comments
Labels
A: experiments Related to dvc exp discussion requires active participation to reach a conclusion ui user interface / interaction

Comments

@dberenbaum
Collaborator

Hi @pmrowla, the experiments are looking much more stable than a month or two ago. I wanted to discuss a few issues when using dvc exp show with checkpoints. I'm testing this out on a minimal example that you can find in https://github.com/dberenbaum/dvc-checkpoint (very similar to what's in the wiki).

Here's what I get when running some experiments:

$ dvc exp run
$ dvc exp run --params start=10
$ dvc exp show --no-pager
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment    ┃ Created  ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace     │ -        │     3 │ 10    │
│ main          │ 01:44 PM │     - │ 0     │
│ │ ╓ exp-0b13f │ 04:24 PM │     3 │ 10    │
│ │ ╟ 3f80092   │ 04:24 PM │     3 │ 10    │
│ ├─╨ 9ee6195   │ 04:24 PM │     2 │ 10    │
│ │ ╓ exp-f3c4a │ 04:24 PM │     1 │ 0     │
│ ├─╨ 0768b21   │ 04:24 PM │     0 │ 0     │
└───────────────┴──────────┴───────┴───────┘

Thoughts on this:

  1. There are sometimes duplicate entries in the table.
  2. When running an experiment with different parameters, should the output be removed/reset? It looks like the new experiment continues from the previous output, and I'm not sure whether that's intended.

Here's what I get when I try to resume experiments:

$ dvc exp apply exp-f3c4a
ERROR: Experiment 'exp-f3c4a' cannot be applied to because your current workspace contains changes which would be overwritten. Either 'git stash' your current changes before applying this experiment, or re-run this command with '--force' to overwrite your current changes.: Stash apply resulted in merge conflicts: Cmd('git') failed due to: exit code(1)
  cmdline: git stash apply 7dc3a0e4b7846d5ae689d36c9a973b0f5cb1c045
  stderr: '.gitignore already exists, no checkout
dvc.lock already exists, no checkout
scores.yaml already exists, no checkout
error: could not restore untracked files from stash'
$ dvc exp apply -f exp-f3c4a
$ dvc exp res -r exp-f3c4a
$ dvc exp show --no-pager
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment    ┃ Created  ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace     │ -        │     3 │ 0     │
│ main          │ 01:44 PM │     - │ 0     │
│ │ ╓ exp-f3c4a │ 04:32 PM │     3 │ 0     │
│ │ ╟ b31556d   │ 04:32 PM │     2 │ 0     │
│ │ ╟ a30b38c   │ 04:24 PM │     1 │ 0     │
│ ├─╨ 0768b21   │ 04:24 PM │     0 │ 0     │
│ │ ╓ exp-0b13f │ 04:24 PM │     3 │ 10    │
│ │ ╟ 3f80092   │ 04:24 PM │     3 │ 10    │
│ ├─╨ 9ee6195   │ 04:24 PM │     2 │ 10    │
└───────────────┴──────────┴───────┴───────┘

A couple of thoughts on the resume functionality:

  1. Is it expected that -f is needed for apply to work in this case?
  2. The output now seems to have been removed/reset after the apply, which surprised me since that didn't happen when using 'dvc exp run --params'.

Finally, we should make sure there's consistency between how we display this table and how we display the same info in other places, like in viewer. For example, should the branch names show in place of the most recent commit for that branch, or should we have a separate line for the branch name (possibly with the other columns blank) and all commits listed separately under it? @dmpetrov Any thoughts on how to align across projects on questions like this?

@dberenbaum dberenbaum added the A: experiments Related to dvc exp label Jan 21, 2021
@pmrowla
Contributor

pmrowla commented Jan 22, 2021

  1. There are sometimes duplicate entries in the table.

These are separate git commits, meaning that there is some difference between the two commits in git-tracked files. It's just that the differences aren't reflected in the table, since we only show DVC metrics/params. We should probably consider adding some option to dvc exp diff to include a unified git diff as well, so users can do something like dvc exp diff --git exp-0b13f 3f80092 and see the git differences.

  2. When running an experiment with different parameters, should the output be removed/reset? It looks like the new experiment continues from the previous output, and I'm not sure whether that's intended.

If you are using dvc exp res[ume] --params ... the previous checkpoint output will not be removed. Internally this is the same as creating a new git branch from within the initial checkpoint experiment branch.

If you are using dvc exp run the output should be removed/reset back to the original state.

$ dvc exp apply exp-f3c4a
ERROR: Experiment 'exp-f3c4a' cannot be applied to because your current workspace contains changes which would be overwritten. Either 'git stash' your current changes before applying this experiment, or re-run this command with '--force' to overwrite your current changes.: Stash apply resulted in merge conflicts: Cmd('git') failed due to: exit code(1)
  1. Is it expected that -f is needed for apply to work in this case?

-f/--force is needed here since you would end up with git conflicts otherwise. That said, we're already considering making --force the default behavior for exp apply.

  2. The output now seems to have been removed/reset after the apply, which surprised me since that didn't happen when using 'dvc exp run --params'.

Same as before: this depends on whether you used run or resume.

Finally, we should make sure there's consistency between how we display this table and how we display the same info in other places, like in viewer.

I think we should try to be consistent when it makes sense to do so, but in general I think we should try to make the CLI table output as minimal/compact as possible. I think many users will end up using exp show --no-pager a lot, and in that case having essentially empty rows/columns takes up a lot of unnecessary space in a terminal (whereas the web viewer or VS Code extension don't have this same limitation on UI).

@pmrowla pmrowla added the ui user interface / interaction label Jan 22, 2021
@dberenbaum
Collaborator Author

dberenbaum commented Jan 22, 2021

These are separate git commits, meaning that there is some difference between the two commits in git-tracked files. It's just that the differences aren't reflected in the table, since we only show DVC metrics/params. We should probably consider adding some option to dvc exp diff to include a unified git diff as well, so users can do something like dvc exp diff --git exp-0b13f 3f80092 and see the git differences.

It would be great to be able to see a full diff, although dvc exp diff doesn't work at all for me now when using exp- branch identifiers:

$ dvc exp diff --git exp-0b13f 3f80092
ERROR: unexpected error - 'NoneType' object has no attribute 'committed_date'

More importantly, why are there two commits here? The epoch column iterates over each checkpoint, so these commits should represent the same checkpoint. Every time I resume this experiment, I get duplicate commits, and the same happens with new experiments (but not when I resume the initial experiment). The only difference between experiments is that I passed in a different start param.

┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment    ┃ Created      ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace     │ -            │     9 │ 100   │
│ main          │ Jan 21, 2021 │     - │ 0     │
│ │ ╓ exp-45c10 │ 10:17 AM     │     9 │ 100   │
│ │ ╟ 4545e50   │ 10:17 AM     │     9 │ 100   │
│ ├─╨ 7ccd623   │ 10:17 AM     │     8 │ 100   │
│ │ ╓ exp-0b13f │ 10:16 AM     │     7 │ 10    │
│ │ ╟ a901b62   │ 10:16 AM     │     7 │ 10    │
│ │ ╟ b10507b   │ 10:16 AM     │     6 │ 10    │
│ │ ╟ bf6fb17   │ Jan 21, 2021 │     5 │ 10    │
│ │ ╟ ba46a30   │ Jan 21, 2021 │     5 │ 10    │
│ │ ╟ daa8f9b   │ Jan 21, 2021 │     4 │ 10    │
│ │ ╟ 611c778   │ Jan 21, 2021 │     3 │ 10    │
│ │ ╟ 3f80092   │ Jan 21, 2021 │     3 │ 10    │
│ ├─╨ 9ee6195   │ Jan 21, 2021 │     2 │ 10    │
│ │ ╓ exp-f3c4a │ 10:15 AM     │     7 │ 0     │
│ │ ╟ a662bc7   │ 10:14 AM     │     6 │ 0     │
│ │ ╟ c129fa7   │ Jan 21, 2021 │     5 │ 0     │
│ │ ╟ 84b57cb   │ Jan 21, 2021 │     4 │ 0     │
│ │ ╟ 3125828   │ Jan 21, 2021 │     3 │ 0     │
│ │ ╟ b31556d   │ Jan 21, 2021 │     2 │ 0     │
│ │ ╟ a30b38c   │ Jan 21, 2021 │     1 │ 0     │
│ ├─╨ 0768b21   │ Jan 21, 2021 │     0 │ 0     │
└───────────────┴──────────────┴───────┴───────┘
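
To make the duplicates in the table above easier to spot, here is a minimal Python sketch (not part of DVC) that flags adjacent rows whose displayed values are identical. The row data is copied from the exp-0b13f branch above; the Row alias and the adjacent_duplicates helper are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# (row label as shown in the Experiment column, epoch, start)
Row = Tuple[str, int, int]


def adjacent_duplicates(rows: List[Row]) -> List[Tuple[str, str]]:
    """Return label pairs of neighbouring rows with identical metrics/params."""
    pairs = []
    for newer, older in zip(rows, rows[1:]):
        # Compare everything except the label itself.
        if newer[1:] == older[1:]:
            pairs.append((newer[0], older[0]))
    return pairs


# Rows of the exp-0b13f branch, newest first, as listed in the table.
EXP_0B13F: List[Row] = [
    ("exp-0b13f", 7, 10),
    ("a901b62", 7, 10),
    ("b10507b", 6, 10),
    ("bf6fb17", 5, 10),
    ("ba46a30", 5, 10),
    ("daa8f9b", 4, 10),
    ("611c778", 3, 10),
    ("3f80092", 3, 10),
    ("9ee6195", 2, 10),
]

for newer, older in adjacent_duplicates(EXP_0B13F):
    print(f"{newer} duplicates {older}")
```

This only compares what the table displays (epoch and start), so it cannot tell whether the underlying git commits differ in other tracked files.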

If you are using dvc exp res[ume] --params ... the previous checkpoint output will not be removed. Internally this is the same as creating a new git branch from within the initial checkpoint experiment branch.

If you are using dvc exp run the output should be removed/reset back to the original state.

I'm using dvc exp run (see the commands in the top block of my original comment) but the output did not get reset.

I think we should try to be consistent when it makes sense to do so, but in general I think we should try to make the CLI table output as minimal/compact as possible. I think many users will end up using exp show --no-pager a lot, and in that case having essentially empty rows/columns takes up a lot of unnecessary space in a terminal (whereas the web viewer or VS Code extension don't have this same limitation on UI).

That makes sense, thanks. What about having the branch identifier unindented so it aligns with workspace and the regular git branch names, and then having all the commits indented, including the head commit? That would widen the column but at least wouldn't impact the table length. Example:

━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment          ┃ Created  ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace           │ -        │     3 │ 10    │
│ main                │ 01:44 PM │     - │ 0     │
│ exp-0b13f ╓ 611c778 │ 04:24 PM │     3 │ 10    │
│ │         ╟ 3f80092 │ 04:24 PM │     3 │ 10    │
│ ├─────────╨ 9ee6195 │ 04:24 PM │     2 │ 10    │
│ exp-f3c4a ╓ a30b38c │ 04:24 PM │     1 │ 0     │
│ ├─────────╨ 0768b21 │ 04:24 PM │     0 │ 0     │
└─────────────────────┴──────────┴───────┴───────┘
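
The proposed layout can be sketched in Python roughly as follows. This is only an illustration of the Experiment column, not DVC's table code; render_experiment is a hypothetical helper, and it assumes each experiment has at least two commits, listed newest first.

```python
from typing import List


def render_experiment(name: str, commits: List[str]) -> List[str]:
    """Render the Experiment column cells for one experiment branch.

    `commits` is ordered newest first; the last entry is the oldest
    checkpoint, drawn with the closing connector.
    """
    if len(commits) < 2:
        # The proposal only shows branches with two or more commits, so the
        # single-commit case is deliberately left undefined in this sketch.
        raise ValueError("expected at least two commits")

    head, *middle, base = commits
    # Branch name stays unindented and shares a row with the head commit.
    lines = [f"{name} ╓ {head}"]
    # Continuation rows are padded so the commit glyphs line up vertically.
    lines += ["│" + " " * len(name) + f"╟ {sha}" for sha in middle]
    lines.append("├" + "─" * len(name) + f"╨ {base}")
    return lines


for line in render_experiment("exp-0b13f", ["611c778", "3f80092", "9ee6195"]):
    print(line)
```

The column width grows with the branch name, but the number of rows stays the same as in the current output.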

@pmrowla
Contributor

pmrowla commented Jan 23, 2021

It would be great to be able to see a full diff, although dvc exp diff doesn't work at all for me now when using exp- branch identifiers:

There is no --git option for exp diff yet, but dvc exp diff exp-0b13f 3f80092 should just show the combined output of dvc metrics diff ... and dvc params diff ...

More importantly, why are there two commits here? The epoch column iterates over each checkpoint, so these commits should represent the same checkpoint. Every time I resume this experiment, I get duplicate commits, and the same happens with new experiments (but not when I resume the initial experiment). The only difference between experiments is that I passed in a different start param.

If something else has changed (from git's perspective) in the repo workspace when the experiment run is stopped (whether it's because it was killed via ctrl-c or because stage commands returned cleanly and the pipeline was fully reproduced) then we make an additional git commit (in the same way that for regular/non-checkpoint experiments we make a single commit after reproducing the pipeline). It looks like something in your pipeline is still modifying something in your workspace (probably unrelated to checkpoints) which is being tracked via the extra commit.

If you do git show-ref exp-0b13f it should give you a git sha for that extra commit, and you can do git diff ... to see what the actual difference is between those two commits.
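
As a rough sketch of that check, the Python snippet below compares the root trees of two commits; if the tree hashes match, git diff between them would be empty. This is not DVC code: run_git, tree_hash, and same_tree are hypothetical helper names, and the snippet assumes git is on PATH and that it is run from inside the repository.

```python
import subprocess
from typing import Callable, Sequence

GitRunner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def tree_hash(rev: str, git: GitRunner = run_git) -> str:
    """Resolve a revision to the hash of its root tree object."""
    # "<rev>^{tree}" is git's revision syntax for "the tree of this commit".
    return git(["rev-parse", "--verify", f"{rev}^{{tree}}"])


def same_tree(rev_a: str, rev_b: str, git: GitRunner = run_git) -> bool:
    """True if both commits record exactly the same tracked content."""
    return tree_hash(rev_a, git) == tree_hash(rev_b, git)
```

Calling same_tree with the sha reported by git show-ref and the sha of the neighbouring checkpoint (for example 3f80092) would return True if the extra commit changed nothing. The git parameter exists only so the comparison can be exercised without a real repository.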

@dberenbaum
Collaborator Author

If you do git show-ref exp-0b13f it should give you a git sha for that extra commit, and you can do git diff ... to see what the actual difference is between those two commits.

That's a good tip. I don't see any difference between the commits when I do that. In the most recent version of the table above, there are multiple "duplicates" and there are never differences between the pair of duplicates. I do see differences between all other pairs of commits, so I don't think it's user error, but I'd be happy to be proven wrong. Is it possible that they both point to the same commit?

@pmrowla
Contributor

pmrowla commented Jan 25, 2021

Looks like it might be a bug then. If there are no additional changes, we shouldn't be creating the last commit (it should be rejected by git). We've been switching git backends over the past few weeks, so it might be related to that.

@pmrowla
Contributor

pmrowla commented Feb 1, 2021

The duplicate (empty) commit issue is a bug and will be fixed in #5361.

@pmrowla pmrowla added the discussion requires active participation to reach a conclusion label Feb 8, 2021
@pmrowla
Contributor

pmrowla commented Feb 8, 2021

As the bug has been resolved, I'm marking this as done for this sprint. The issue can stay open while there's still discussion on what the table column/tree/etc. format should look like, but I don't think this should be considered a blocker for the 2.0 pre-release.

@dberenbaum
Collaborator Author

Since we have a separate issue for the table formatting, let's close this one. Thanks for following up here.
