plots diff: collect live plots for queued experiments #9432

Merged: pmrowla merged 5 commits into iterative:main from plots-exp-live on May 23, 2023

Contributor

@pmrowla pmrowla commented May 10, 2023

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Will close #9369

  • Adds support for queued exp names in dvc plots diff (exp name can be a queued, actively running, or failed task)
  • Actively running exps will return the live values from the temp workspace
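As a rough illustration of how a client such as vscode might consume this, the sketch below shells out to dvc plots diff with an experiment name and parses the JSON it prints. The flags mirror the invocation that appears later in this thread. The helper names (build_plots_diff_cmd, run_plots_diff), the default output directory, and the assumption that the JSON payload arrives on stdout are illustrative and not taken from DVC itself.

```python
import json
import subprocess
import sys
from typing import List


def build_plots_diff_cmd(exp_name: str, out_dir: str = ".dvc/tmp/plots") -> List[str]:
    """Build the argv for `dvc plots diff <exp_name>` (hypothetical helper)."""
    return [
        sys.executable, "-m", "dvc",
        "plots", "diff", exp_name,
        "-o", out_dir,
        "--split",
        "--json",
    ]


def run_plots_diff(exp_name: str, cwd: str = ".") -> dict:
    """Run the command from the main repo and return the parsed JSON payload.

    Assumes the JSON is written to stdout; a non-zero exit raises
    subprocess.CalledProcessError.
    """
    proc = subprocess.run(
        build_plots_diff_cmd(exp_name),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Keeping the command construction separate from the subprocess call makes the argument list easy to check without a DVC repo at hand.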

@pmrowla pmrowla added the A: plots Related to the plots label May 10, 2023
@pmrowla pmrowla self-assigned this May 10, 2023
@pmrowla
Contributor Author

pmrowla commented May 10, 2023

@mattseddon can you test with this PR? With this change, vscode should be able to run dvc plots diff from the main DVC repo and still get the live values from a temp workspace if the named experiment is running.

This also makes it possible to include queued or failed exp names in dvc plots diff, but I'm not sure if that is actually useful to vscode (since in most cases they will not have their own new metrics values, it will just match whatever the original baseline values were when the experiment was queued/stashed)

@mattseddon
Member

> This also makes it possible to include queued or failed exp names in dvc plots diff, but I'm not sure if that is actually useful to vscode (since in most cases they will not have their own new metrics values, it will just match whatever the original baseline values were when the experiment was queued/stashed)

I'll continue to exclude these from selection.

[Q] Does the change make it possible to call plots diff with an experiment running in the workspace?

Will test today.

@pmrowla
Contributor Author

pmrowla commented May 11, 2023

> [Q] Does the change make it possible to call plots diff with an experiment running in the workspace?

No, this only affects queued (or --temp) experiment runs. But if you know an experiment is running in the workspace you could just do plots diff workspace ...

@mattseddon
Member

> [Q] Does the change make it possible to call plots diff with an experiment running in the workspace?

> No, this only affects queued (or --temp) experiment runs. But if you know an experiment is running in the workspace you could just do plots diff workspace ...

That's how it works at the moment but the workaround means we have to hack a few things together for new experiments running in the workspace.

@pmrowla
Contributor Author

pmrowla commented May 11, 2023

I think I should be able to get exp-name working for workspace runs. I'll look into it once the queue/--temp stuff is taken care of.

@mattseddon
Member

mattseddon commented May 11, 2023

When running plots diff with live queued experiments I'm seeing an unexpected error:

[version: 0.8.11, 2023-05-11T01:37:13.017Z, pid: 64018] > /Users/mattseddon/projects/vscode-dvc/demo/.env/bin/python -m dvc plots diff straw-furl -o .dvc/tmp/plots --split --json - FAILED with code 255 (597ms)
ERROR: unexpected error - generator didn't yield



Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
Screen.Recording.2023-05-11.at.11.45.52.am.mov

I get the same result via the CLI:

image
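For context on the message itself: "generator didn't yield" is the RuntimeError that Python's contextlib raises when a function decorated with @contextmanager finishes without ever reaching its yield. The sketch below reproduces that message in isolation. The open_workspace function is hypothetical, and whether this is the exact path that failed inside DVC here is an assumption, not something the log confirms.

```python
from contextlib import contextmanager


@contextmanager
def open_workspace(available: bool):
    """Hypothetical context manager that forgets to yield on one branch."""
    if not available:
        return  # the generator ends here without ever yielding
    yield "workspace"


def describe_failure() -> str:
    """Enter the context manager on the broken branch and report the error text."""
    try:
        with open_workspace(available=False):
            pass
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```

The usual remedy is to make every code path either yield exactly once or raise a descriptive exception before the yield.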

@pmrowla
Contributor Author

pmrowla commented May 12, 2023

> When running plots diff with live queued experiments I'm seeing an unexpected error:

This should be resolved in the latest PR commit

> I'll continue to exclude these from selection.

Since it's unneeded in vscode, support for queued and failed exp names has been removed for now (it would affect performance). If it's needed in the future, it is straightforward to add back in.

> Does the change make it possible to call plots diff with an experiment running in the workspace?

This is now supported in the latest PR commit

Note that in all cases, this lookup by running exp name will only work when the given experiment has actually started running (which may not be immediately after doing dvc exp run --run-all/dvc queue start for queued exps). The expectation here is that vscode knows an experiment is running from either exp show executor information or from a directory/file watcher
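One way a caller could respect that constraint is to poll some "is it running yet?" signal before requesting plots by experiment name. The sketch below is deliberately generic: is_running is a caller-supplied callable (for example, a check against exp show executor information or a file watcher), and the function name, timeout, and interval are assumptions for illustration.

```python
import time
from typing import Callable


def wait_until_running(
    is_running: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll `is_running` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the check succeeds, False if the deadline passes.
    The check is always attempted at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_running():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Only after this returns True would the caller run plots diff with the running experiment's name. Before that point the name cannot be resolved to a live workspace.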

@mattseddon
Member

I am seeing multiple records in the exp show data for the same experiment during runs, e.g. cushy-tint, shown below:

image

In the above screenshot I started the queue with 2 workers but somehow ended up with 3 running experiments.

It appears that right now running plots diff against an experiment running in the queue will always return the data from the experiment it was queued from until the running experiment is finished at which point its data will update.

I also managed to completely break the queue. I am unable to run dvc queue status: the error states that a ref is not a valid commit, so I remove it. I then get the same error with a different commit. I have been round this loop more than a few times:

~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc queue status                                                                                                    .env  base 14:18:00
ERROR: unexpected error - Invalid commit '3e9d2281ce14f59c7efb76944edbb2e8e52f969d'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc exp remove 3e9d2281ce14f59c7efb76944edbb2e8e52f969d                                                      ✘ 255  .env  base 14:18:03
Removed experiments: '3e9d2281ce14f59c7efb76944edbb2e8e52f969d'       
~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc queue status                                                                                                    .env  base 14:18:38
ERROR: unexpected error - Invalid commit 'ff02860c641e0b163e2fc2220bbcba7865d4c8c1'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc exp remove ff02860c641e0b163e2fc2220bbcba7865d4c8c1                                                      ✘ 255  .env  base 14:18:42
Removed experiments: 'ff02860c641e0b163e2fc2220bbcba7865d4c8c1'       
~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc queue status                                                                                                    .env  base 14:18:50
ERROR: unexpected error - Invalid commit 'dc8d7eff6d43a7f279f601776fa3439e68d73a2e'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
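For anyone scripting around this loop while debugging, the offending hash can be pulled out of the error text with a small regular expression. This is a minimal sketch that matches only the message format shown in the output above. The function name is hypothetical, and nothing here is part of DVC.

```python
import re
from typing import Optional

# Matches the message format seen in the output above: a 40-character hex SHA.
_INVALID_COMMIT = re.compile(r"Invalid commit '([0-9a-f]{40})'")


def invalid_commit_from_error(stderr: str) -> Optional[str]:
    """Return the SHA named in an "Invalid commit" error, or None if absent."""
    match = _INVALID_COMMIT.search(stderr)
    return match.group(1) if match else None
```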

Verbose error is of the form:

~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc queue status --verbose                                                                                          .env  base 14:14:12
2023-05-15 14:14:14,429 DEBUG: v2.56.1.dev33+gce783654, CPython 3.10.6 on macOS-13.3.1-arm64-arm-64bit
2023-05-15 14:14:14,429 DEBUG: command: /Users/mattseddon/projects/vscode-dvc/demo/.env/bin/dvc queue status --verbose
2023-05-15 14:14:14,740 ERROR: unexpected error - Invalid commit 'c17285df4b665fce55faf4d673a6476ac93d1c35'
Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/pygit2/repository.py", line 322, in resolve_refish
    reference = self.lookup_reference_dwim(refish)
KeyError: 'c17285df4b665fce55faf4d673a6476ac93d1c35'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 312, in resolve_commit
    commit, _ref = self._resolve_refish(rev)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 119, in _resolve_refish
    commit, ref = self.repo.resolve_refish(refish)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/pygit2/repository.py", line 325, in resolve_refish
    commit = self.revparse_single(refish)
KeyError: 'c17285df4b665fce55faf4d673a6476ac93d1c35'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/cli/__init__.py", line 210, in main
    ret = cmd.do_run()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/commands/queue/status.py", line 18, in run
    result = self.repo.experiments.celery_queue.status()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/queue/base.py", line 229, in status
    result.extend(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/queue/base.py", line 230, in <genexpr>
    _format_entry(queue_entry, exp_result=exp_result, status="Success")
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/queue/base.py", line 213, in _format_entry
    "timestamp": _get_timestamp(entry.stash_rev),
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/queue/base.py", line 197, in _get_timestamp
    commit = self.scm.resolve_commit(rev)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/scmrepo/git/__init__.py", line 286, in _backend_func
    result = func(*args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 314, in resolve_commit
    raise SCMError(f"Invalid commit '{rev}'")
scmrepo.exceptions.SCMError: Invalid commit 'c17285df4b665fce55faf4d673a6476ac93d1c35'

2023-05-15 14:14:14,803 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.6qcty2eRhkMZd62NBav3gR.tmp'
2023-05-15 14:14:14,803 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.6qcty2eRhkMZd62NBav3gR.tmp'
2023-05-15 14:14:14,803 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.6qcty2eRhkMZd62NBav3gR.tmp'
2023-05-15 14:14:14,803 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/demo/.dvc/cache/.8EgPpkLzo8ejmCGXQXtNqV.tmp'
2023-05-15 14:14:14,805 DEBUG: Version info for developers:
DVC version: 2.56.1.dev33+gce783654
-----------------------------------
Platform: Python 3.10.6 on macOS-13.3.1-arm64-arm-64bit
Subprojects:
        dvc_data = 0.47.1
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.1
        scmrepo = 1.0.1
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.3.0, boto3 = 1.24.59)
Config:
        Global: /Users/mattseddon/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/9b2d1355ef26cb2a38a704c54d1f3710

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-05-15 14:14:14,806 DEBUG: Analytics is enabled.
2023-05-15 14:14:14,855 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmp9720fq94']'
2023-05-15 14:14:14,857 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmp9720fq94']'

Trying to garbage collect all the experiments gives:

~/projects/vscode-dvc/demo test-live-plots !1 ?1 ❯ dvc exp gc -T --verbose                                                                                      ✘ 255  .env  base 14:22:02
2023-05-15 14:22:30,903 DEBUG: v2.56.1.dev33+gce783654, CPython 3.10.6 on macOS-13.3.1-arm64-arm-64bit
2023-05-15 14:22:30,903 DEBUG: command: /Users/mattseddon/projects/vscode-dvc/demo/.env/bin/dvc exp gc -T --verbose
2023-05-15 14:22:31,082 WARNING: This will remove all experiments except those derived from the workspace and all git tags of the current repo. Run queued experiments will be removed.
Are you sure you want to proceed? [y/n]: y
2023-05-15 14:22:32,650 ERROR: '51077f7924f617344a02834e8fb568de4cac81fb' is not a valid experiment name
Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/cli/__init__.py", line 210, in main
    ret = cmd.do_run()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/commands/experiments/gc.py", line 58, in run
    removed = self.repo.experiments.gc(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/__init__.py", line 485, in gc
    return gc(self.repo, *args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/__init__.py", line 65, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/gc.py", line 50, in gc
    repo.experiments.celery_queue.remove(delete_stashes)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/queue/celery.py", line 490, in remove
    return celery_remove(self, *args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.10/site-packages/dvc/repo/experiments/queue/remove.py", line 145, in celery_remove
    raise UnresolvedExpNamesError(remained)
dvc.repo.experiments.exceptions.UnresolvedExpNamesError: '51077f7924f617344a02834e8fb568de4cac81fb' is not a valid experiment name

2023-05-15 14:22:32,653 DEBUG: Analytics is enabled.
2023-05-15 14:22:32,710 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmpp22g1_pu']'
2023-05-15 14:22:32,711 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmpp22g1_pu']'

@pmrowla
Contributor Author

pmrowla commented May 16, 2023

I'm not able to reproduce whatever caused your repo/queue state, but you probably need to clean everything with

rm -r .dvc/tmp/exps
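The same cleanup can be sketched in Python for use from a script. The function name is hypothetical. It removes only the .dvc/tmp/exps directory under the given repo root, which presumably discards whatever queue and temp-run state is stored there, so it is best reserved for a repo that is already in a broken state.

```python
import shutil
from pathlib import Path


def clear_exp_tmp(repo_root: str) -> bool:
    """Delete <repo_root>/.dvc/tmp/exps if present; return True if removed."""
    exps_dir = Path(repo_root) / ".dvc" / "tmp" / "exps"
    if not exps_dir.is_dir():
        return False  # nothing to clean
    shutil.rmtree(exps_dir)
    return True
```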

@pmrowla
Contributor Author

pmrowla commented May 16, 2023

> It appears that right now running plots diff against an experiment running in the queue will always return the data from the experiment it was queued from until the running experiment is finished at which point its data will update.

This should be fixed in the latest PR commit. There was a bug where we were collecting config for the running exp, but data was still only collected from the main repo's workspace.

@codecov

codecov bot commented May 16, 2023

Codecov Report

Patch coverage: 47.22% and project coverage change: -0.04 ⚠️

Comparison is base (182728d) 91.40% compared to head (0050416) 91.36%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9432      +/-   ##
==========================================
- Coverage   91.40%   91.36%   -0.04%     
==========================================
  Files         475      476       +1     
  Lines       37240    37270      +30     
  Branches     5354     5358       +4     
==========================================
+ Hits        34039    34052      +13     
- Misses       2638     2653      +15     
- Partials      563      565       +2     
Impacted Files                           Coverage Δ
dvc/repo/experiments/__init__.py         82.54% <ø> (+1.11%) ⬆️
dvc/repo/experiments/queue/celery.py     79.46% <ø> (+0.87%) ⬆️
dvc/repo/experiments/queue/tempdir.py    68.88% <ø> (+2.57%) ⬆️
dvc/repo/experiments/queue/base.py       81.73% <5.88%> (-4.16%) ⬇️
dvc/repo/experiments/queue/workspace.py  83.70% <25.00%> (+4.68%) ⬆️
dvc/repo/experiments/brancher.py         39.28% <39.28%> (ø)
dvc/repo/experiments/executor/base.py    85.19% <50.00%> (-0.18%) ⬇️
dvc/repo/experiments/exceptions.py       74.54% <66.66%> (-0.46%) ⬇️
dvc/repo/plots/__init__.py               91.92% <100.00%> (+0.14%) ⬆️

... and 1 file with indirect coverage changes
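The project-level figures in the diff can be sanity-checked by hand, assuming coverage is computed as hits divided by total lines. That formula is an assumption about how the report derives its percentages. The sketch below recomputes both sides from the numbers in the table and also checks that hits, misses, and partials add up to the line count.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff above.
base = {"lines": 37240, "hits": 34039, "misses": 2638, "partials": 563}
head = {"lines": 37270, "hits": 34052, "misses": 2653, "partials": 565}

# Every line should be counted exactly once as a hit, miss, or partial.
for side in (base, head):
    assert side["hits"] + side["misses"] + side["partials"] == side["lines"]

base_cov = coverage_percent(base["hits"], base["lines"])
head_cov = coverage_percent(head["hits"], head["lines"])
delta = head_cov - base_cov  # negative: project coverage dropped slightly
```

Under that assumption both percentages land within 0.01 of the reported 91.40% and 91.36%, and the difference is close to the reported -0.04.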


@mattseddon
Member

It seems like the new behaviour is working but we've just released a big feature and I am seeing some very strange behaviour when combining our current code with this.

I don't think it is related to this change. I will pick it apart tomorrow. <- Rolled back and the issue is definitely on our side.

Demo (working live plots)

Screen.Recording.2023-05-17.at.7.19.38.pm.mov

@pmrowla pmrowla marked this pull request as ready for review May 23, 2023 05:30
@pmrowla pmrowla merged commit 89109fa into iterative:main May 23, 2023
@pmrowla pmrowla deleted the plots-exp-live branch May 23, 2023 06:43
@pmrowla pmrowla added the backport backport this PR to 2.x branch label May 23, 2023
@pmrowla pmrowla added the A: experiments Related to dvc exp label May 23, 2023
@pmrowla pmrowla mentioned this pull request May 23, 2023
2 tasks
@pmrowla
Contributor Author

pmrowla commented May 23, 2023

@mattseddon this is released in 2.58.1 on the DVC side

@mattseddon
Member

Thanks @pmrowla, I'll make the necessary updates 🙏🏻

Labels
A: experiments Related to dvc exp A: plots Related to the plots backport backport this PR to 2.x branch
Development

Successfully merging this pull request may close these issues.

Collect live plots for experiments running in the queue
2 participants