Extension overreacts and runs extra DVC commands #948

Closed
shcheklein opened this issue Oct 26, 2021 · 16 comments
Labels
A: experiments (Area: experiments table webview and everything related)
A: integration (Area: DVC integration layer)
bug (Something isn't working)
priority-p1 (Regular product backlog)

Comments

@shcheklein
Member

Description

It looks like even running dvc version triggers some DVC commands.
If you queue an experiment, you also have to wait to see it in the table, because the extension runs a lot of commands first.

Expected

The experiment is queued and the table updates almost immediately.

Environment

DVC Version:

DVC version: UNKNOWN 
---------------------------------
Platform: Python 3.8.9 on macOS-10.15.6-x86_64-i386-64bit
Supports:
        azure (adlfs = 2021.9.1, knack = 0.8.2, azure-identity = 1.6.0),
        gdrive (pydrive2 = 1.9.3),
        gs (gcsfs = 2021.8.1),
        hdfs (fsspec = 2021.8.1, pyarrow = 4.0.1),
        webhdfs (fsspec = 2021.8.1),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        s3 (s3fs = 2021.8.1, boto3 = 1.17.106),
        ssh (sshfs = 2021.8.1),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.1),
        webdavs (webdav4 = 0.9.1)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s1
Repo: dvc, git

(cc @efiop @skshetry - how is it possible that version is UNKNOWN?)

Extension:

v0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6

Code:

Version: 1.61.2
Commit: 6cba118ac49a1b88332f312a8f67186f7f3c1643
Date: 2021-10-19T15:49:28.381Z (1 wk ago)
Electron: 13.5.1
Chrome: 91.0.4472.164
Node.js: 14.16.0
V8: 9.1.269.39-electron.0
OS: Darwin x64 19.6.0

Logs

```
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:34.758Z, pid: 87295] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp run --queue - COMPLETED (642ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:37.579Z, pid: 87300] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - FAILED with code 1 (2891ms)
ERROR: failed to show experiments - Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page for other possible reasons and to learn how to resolve this.
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:38.087Z, pid: 87310] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:40.809Z, pid: 87299] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc status --show-json - COMPLETED (6124ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:40.810Z, pid: 87311] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc diff --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:42.190Z, pid: 87310] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - COMPLETED (4105ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:42.192Z, pid: 87338] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:45.066Z, pid: 87338] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - FAILED with code 1 (2875ms)
ERROR: failed to show experiments - Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page for other possible reasons and to learn how to resolve this.
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:45.572Z, pid: 87359] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:48.442Z, pid: 87359] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - FAILED with code 1 (2872ms)
ERROR: failed to show experiments - Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly. Check the page for other possible reasons and to learn how to resolve this.
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:49.449Z, pid: 87400] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:51.570Z, pid: 87311] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc diff --show-json - COMPLETED (10761ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:51.583Z, pid: 87429] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc status --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:53.035Z, pid: 87400] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc exp show --show-json - COMPLETED (3587ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:58.241Z, pid: 87429] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc status --show-json - COMPLETED (6659ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:18:58.244Z, pid: 87470] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc diff --show-json - INITIALIZED
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:19:07.655Z, pid: 87470] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc diff --show-json - COMPLETED (9413ms)
[version: 0.1.0-2f67bb018b6eac3f9889e8d2f5420322366cdcd6, 2021-10-26T22:19:07.667Z, pid: 87563] > /Users/ivan/Projects/get-started-experiments/.env/bin/python -m dvc list . --dvc-only -R --show-json - INITIALIZED
```
@shcheklein added the bug (Something isn't working), priority-p1 (Regular product backlog), and A: experiments (Area: experiments table webview and everything related) labels Oct 26, 2021

@mattseddon
Contributor

Queuing an experiment touches .git/refs. We have to watch those refs so that we can update our state after we make a git commit.

@efiop

efiop commented Oct 28, 2021

@shcheklein How did you install it?

@shcheklein
Member Author

@efiop it's pip install -e "../dvc[all]" from the local system (but I haven't updated it for about a month).

@efiop

efiop commented Oct 29, 2021

Can't reproduce on the latest master. If you can, please feel free to create an issue in the dvc repo.

NOTE for others: this is about the UNKNOWN version issue, not the main topic of this issue.

@skshetry
Member

@shcheklein, have you run git pull && pip install -e ... recently? If you haven't, most likely the code you have is recent, but you have not reinstalled it recently.

@mattseddon
Contributor

mattseddon commented Oct 29, 2021

From looking at the logs this actually looks like the expected behaviour with a queue of size 1.

exp show runs 4 times but only succeeds twice.
diff runs twice.
status runs twice.
list -R runs once.

The behaviour looks overreactive because the commands take so long to run.

Edit: That's not to say that we can't improve our watchers. I will also look into why dvc version is causing the watcher to fire.
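
One way the watchers could be made less reactive is to coalesce bursts of file events into a single refresh. The following is only a minimal sketch of that general debouncing technique, not the extension's actual code: `DebouncedRefresh`, `notify`, `timerSchedule`, and the injectable `Schedule` type are all hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch: none of these names come from the extension.
type Cancel = () => void
type Schedule = (fn: () => void, ms: number) => Cancel

// Default scheduler backed by the global timer functions.
const timerSchedule: Schedule = (fn, ms) => {
  const handle = setTimeout(fn, ms)
  return () => clearTimeout(handle)
}

class DebouncedRefresh {
  private cancelPending: Cancel | undefined = undefined
  private readonly refresh: () => void
  private readonly waitMs: number
  private readonly schedule: Schedule

  constructor(
    refresh: () => void,
    waitMs: number,
    schedule: Schedule = timerSchedule
  ) {
    this.refresh = refresh
    this.waitMs = waitMs
    this.schedule = schedule
  }

  // Call this for every file system event. Each call restarts the
  // quiet-period timer, so a burst of events yields a single refresh.
  notify(): void {
    if (this.cancelPending) {
      this.cancelPending()
    }
    this.cancelPending = this.schedule(() => {
      this.cancelPending = undefined
      this.refresh()
    }, this.waitMs)
  }
}
```

The scheduler is injectable only so that the behaviour can be exercised without real timers. The right quiet period would depend on how long the DVC commands take, which the logs above suggest is several seconds.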

@mattseddon
Contributor

dvc version touches a temp file in example-dvc-experiments.

examples: example-dvc-experiments/.3178aeaf-af5e-4526-9ba1-71c6f5f8d1b0, example-dvc-experiments/.c792876c-7e84-4d02-b941-8ba3910fac48

@shcheklein
Member Author

> from looking at the logs this actually looks like the expected behaviour with a queue of size 1.

Question is why do we run diff, status, etc at all when I queue an experiment.

@mattseddon
Contributor

> > from looking at the logs this actually looks like the expected behaviour with a queue of size 1.
>
> Question is why do we run diff, status, etc at all when I queue an experiment.

In our demo project when we queue an experiment the following files are "touched":

vscode-dvc/.git/refs/remotes/origin/HEAD
vscode-dvc/demo/data/MNIST/raw/t10k-images-idx3-ubyte
vscode-dvc/demo/data/MNIST/raw/train-labels-idx1-ubyte
vscode-dvc/demo/data/MNIST/raw/train-labels-idx1-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/t10k-labels-idx1-ubyte
vscode-dvc/demo/data/MNIST/raw/train-images-idx3-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/t10k-images-idx3-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/t10k-labels-idx1-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/train-images-idx3-ubyte
vscode-dvc/demo/data/MNIST/raw.dvc
vscode-dvc/demo/data/MNIST/raw/t10k-images-idx3-ubyte
vscode-dvc/demo/data/MNIST/raw/train-labels-idx1-ubyte
vscode-dvc/demo/data/MNIST/raw/train-labels-idx1-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/t10k-labels-idx1-ubyte
vscode-dvc/demo/data/MNIST/raw/train-images-idx3-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/t10k-images-idx3-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/t10k-labels-idx1-ubyte.gz
vscode-dvc/demo/data/MNIST/raw/train-images-idx3-ubyte
vscode-dvc/demo/data/MNIST/raw

I assume that is because, as part of queuing an experiment, DVC takes a copy of the current workspace and saves it to a temp location. There is no simple way to stop the repository watcher from firing under these circumstances.

@shcheklein
Member Author

Thanks, @mattseddon. Let's discuss it with the DVC team and raise a ticket there. Off the top of my head, I don't see a strong reason to do this on the DVC end.

@efiop

efiop commented Nov 15, 2021

@pmrowla Maybe you know what's going on in #948 (comment) off the top of your head?

@pmrowla

pmrowla commented Nov 15, 2021

As @mattseddon noted, queueing an experiment will touch files in the workspace (because we git stash and unstash changes in order to save the queued workspace state).

If the vscode extension has file watchers on the workspace that will trigger dvc diff/status/etc commands when the workspace is touched, it sounds like they just need to be adjusted to not fire during dvc exp run --queue (since the workspace modification is expected, and the workspace state before and after exp run --queue will be unchanged)
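
A minimal sketch of that idea, assuming the extension can wrap the command it launches itself: the watcher drops events while a suppression is active and resumes afterwards. `SuppressibleWatcher`, `handleEvent`, `suppress`, and `droppedCount` are hypothetical names, not part of the extension or of any VS Code API.

```typescript
// Hypothetical sketch: none of these names come from the extension.
class SuppressibleWatcher {
  private activeSuppressions = 0
  private dropped = 0
  private readonly onChange: (path: string) => void

  constructor(onChange: (path: string) => void) {
    this.onChange = onChange
  }

  // Entry point for raw file system events.
  handleEvent(path: string): void {
    if (this.activeSuppressions > 0) {
      // An expected workspace modification is in progress: ignore the event.
      this.dropped += 1
      return
    }
    this.onChange(path)
  }

  // Starts suppressing events. The returned function ends the suppression
  // and is safe to call more than once.
  suppress(): () => void {
    this.activeSuppressions += 1
    let released = false
    return () => {
      if (released) {
        return
      }
      released = true
      this.activeSuppressions -= 1
    }
  }

  // Number of events ignored so far, e.g. for logging.
  get droppedCount(): number {
    return this.dropped
  }
}
```

A caller would take `const release = watcher.suppress()` before starting `dvc exp run --queue` and call `release()` in a `finally` block once the process exits. File system events can be delivered after the command has already returned, so a real implementation would probably also need a short grace period before releasing.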

@shcheklein
Member Author

> it sounds like they just need to be adjusted to not fire during dvc exp run --queue

It's probably not possible to do this reliably: we might lose some events that were not related to the dvc exp run. Maybe this is the best we could do, though.

@shcheklein
Member Author

@pmrowla do you know by chance why git touches files that are not modified?

Do we have any other ideas for mitigating this behavior?

@mattseddon
Contributor

Discussed this in a recent meeting with the DVC team. We excluded a subdirectory from file watching in #1533 and it has helped.
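
As a rough illustration of excluding a subdirectory from watching, a path filter can be applied before any event reaches the refresh logic. This is a sketch under assumptions, not the change made in #1533: `isExcluded` and `normalizePath` are hypothetical helpers, and `scratch` in the usage note is a placeholder, since the thread does not say which subdirectory was excluded.

```typescript
// Hypothetical sketch: not the implementation from #1533.

// Use forward slashes and drop trailing slashes so comparisons are stable.
const normalizePath = (p: string): string =>
  p.replace(/\\/g, '/').replace(/\/+$/, '')

// Returns true when filePath is one of the excluded directories (given
// relative to root) or lies anywhere beneath one of them.
function isExcluded(
  filePath: string,
  root: string,
  excludedDirs: string[]
): boolean {
  const file = normalizePath(filePath)
  const base = normalizePath(root)
  return excludedDirs.some(dir => {
    const excluded = `${base}/${normalizePath(dir)}`
    return file === excluded || file.startsWith(`${excluded}/`)
  })
}
```

For example, `isExcluded('/repo/scratch/tmp.txt', '/repo', ['scratch'])` is true, while `/repo/scratchpad/tmp.txt` is not excluded because the match is on whole path segments.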

@mattseddon added the A: integration (Area: DVC integration layer) label Apr 20, 2022

@shcheklein
Member Author

@mattseddon closing this, since it looks like it is resolved.
