Experiments view out of sync #2562

Closed
RR-28023 opened this issue Oct 11, 2022 · 7 comments
Labels: A: experiments (Area: experiments table webview and everything related), priority-p1 (Regular product backlog), triage

Comments

@RR-28023

Hi,

I'm facing an issue when trying to use the DVC extension for experiment tracking. The Experiments view seems to be out of sync with the actual set of experiments that I've run, queued, or deleted. It typically takes a restart of the extension for the view to sync.

For example, the following illustrates the out-of-sync issue after deleting several experiments (by running dvc exp remove [..]):

This is what I see on the Experiments view:

[image]

This is what I see when I run dvc exp show:

[image]

And when running dvc exp list:

cont-based-similarity-metric:                                         
        exp-3e0f4
        exp-42c43

The DVC channel output seems normal:

[version: 0.4.13, 2022-10-11T15:05:48.999Z, pid: 36913] > git rev-parse --show-toplevel - COMPLETED (113ms)
[version: 0.4.13, 2022-10-11T15:05:49.000Z, pid: 36915] > git diff --name-only -z - COMPLETED (79ms)
[version: 0.4.13, 2022-10-11T15:05:49.000Z, pid: 36916] > git ls-files --others --exclude-standard - COMPLETED (62ms)
[version: 0.4.13, 2022-10-11T15:05:49.001Z, pid: 36917] > git ls-files --others --exclude-standard --directory --no-empty-directory - COMPLETED (50ms)
[version: 0.4.13, 2022-10-11T15:05:49.001Z, pid: 36918] > git rev-parse --show-toplevel - COMPLETED (37ms)
[version: 0.4.13, 2022-10-11T15:05:50.848Z, pid: 36914] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc data status --granular --unchanged --json - COMPLETED (1945ms)
[version: 0.4.13, 2022-10-11T15:05:51.724Z, pid: 36919] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp show -n 3 --json - COMPLETED (2748ms)
[version: 0.4.13, 2022-10-11T15:05:51.792Z, pid: 36992] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc plots diff 23ec77b 263ef67 4148089 653001d 6691968 7b24e31 c3e8a04 workspace -o .dvc/tmp/plots --split --json - INITIALIZED
[version: 0.4.13, 2022-10-11T15:05:53.842Z, pid: 36992] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc plots diff 23ec77b 263ef67 4148089 653001d 6691968 7b24e31 c3e8a04 workspace -o .dvc/tmp/plots --split --json - COMPLETED (2071ms)
[version: 0.4.13, 2022-10-11T15:22:20.703Z, pid: 45667] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp remove exp-46e72 - INITIALIZED
[version: 0.4.13, 2022-10-11T15:22:21.327Z, pid: 45667] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp remove exp-46e72 - COMPLETED (652ms)
[version: 0.4.13, 2022-10-11T15:22:58.121Z, pid: 45976] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp remove exp-259ed - INITIALIZED
[version: 0.4.13, 2022-10-11T15:22:58.777Z, pid: 45976] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp remove exp-259ed - COMPLETED (674ms)

I face the same issue when I run or queue new experiments: they don't show up in the Experiments view unless I restart the extension.

Is there a way to "refresh" the view? I have tried the obvious, which is closing it and opening it again (via the DVC: Show Experiments command).

Thanks!

@shcheklein added the A: experiments (Area: experiments table webview and everything related), triage, and priority-p1 (Regular product backlog) labels on Oct 11, 2022
@mattseddon self-assigned this on Oct 12, 2022
@mattseddon
Member

Hi @RR-28023, thanks for the issue.

A few questions from me.

Can you share the version that you are using for each of the following:

  • extension
  • DVC
  • VS Code

Can you share the project that you're working with? Does this happen for all projects or just this particular one?

What OS are you on?

What does the CLI show when you run dvc exp show -n 3? Can you provide a dump of the exp show output in JSON format? Perhaps our parsing logic is failing.

As a (very poor) workaround you should be able to update the experiments table by saving any of your params or dvc.yaml files.

Thanks!

@mattseddon removed the triage label on Oct 12, 2022
@RR-28023
Author

Hi! Sure:

  • Extension: v0.4.13
  • DVC: 2.29.0
  • VS Code: 1.72.1

The project is a private repo we are working on for a client, so I'm afraid I cannot share it. If it helps, I could hop on a call and share my screen.

I have not tried running experiments in any other project. What I can say is that I face this issue both when developing locally on my laptop (Ubuntu 20.04.5 LTS) and when running remotely in a Kubernetes cluster (Ubuntu 18.04.6 LTS).

The output of dvc exp show -n 3 seems fine. I think the issue is more related to the extension not noticing that there have been changes, and hence not running the commands needed to refresh the Experiments view. Looking at the DVC output channel, it seems that a table refresh is preceded by something like:

[version: 0.4.13, 2022-10-12T11:32:50.080Z, pid: 27128] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp show -n 3 --json - COMPLETED (5488ms)

And I think the issue is that this command is not run again whenever there is a change in the experiments.
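
To check this from the output channel rather than by eye, a small script can compare the time of the last experiment-changing command with the time of the last dvc exp show. The following is only a sketch: it assumes the log line format shown in this thread, and the names parse_line and view_is_stale are made up for illustration; they are not part of the extension or of DVC.

```python
import re
from datetime import datetime

# Assumed format, copied from the DVC output channel lines in this thread:
# [version: X, <ISO timestamp>, pid: N] > <command> - STATUS (<n>ms)
LINE_RE = re.compile(
    r"^\[version: (?P<version>[^,]+), (?P<timestamp>[^,]+), pid: (?P<pid>\d+)\] > "
    r"(?P<command>.+) - (?P<status>INITIALIZED|COMPLETED)"
    r"(?: \((?P<duration_ms>\d+)ms\))?$"
)


def parse_line(line):
    """Parse one output channel line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%S.%f%z")
    entry["pid"] = int(entry["pid"])
    if entry["duration_ms"] is not None:
        entry["duration_ms"] = int(entry["duration_ms"])
    return entry


def view_is_stale(lines, change_marker="dvc exp remove", refresh_marker="dvc exp show"):
    """Return True if a change command completed after the last completed refresh."""
    last_change = None
    last_refresh = None
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["status"] != "COMPLETED":
            continue
        if change_marker in entry["command"]:
            last_change = entry["timestamp"]
        elif refresh_marker in entry["command"]:
            last_refresh = entry["timestamp"]
    if last_change is None:
        return False
    return last_refresh is None or last_refresh < last_change


# Two lines taken from the output channel dump earlier in this issue.
SAMPLE = [
    "[version: 0.4.13, 2022-10-11T15:05:51.724Z, pid: 36919] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp show -n 3 --json - COMPLETED (2748ms)",
    "[version: 0.4.13, 2022-10-11T15:22:58.777Z, pid: 45976] > /home/coder/repos/cont01-kwd-extract/.venv/bin/python -m dvc exp remove exp-259ed - COMPLETED (674ms)",
]
print(view_is_stale(SAMPLE))  # True: no exp show ran after the last exp remove
```

Passing a different change_marker (for example "dvc exp run") would apply the same check to newly run or queued experiments, assuming those commands also appear in the output channel.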

Saving the params.yaml or dvc.yaml files does not seem to trigger such an update.

Something worth mentioning is that my dvc.yaml and params.yaml are not in the project's root directory (for good reasons). I actually have to specify the full relative path of the params in my dvc.yaml:

stages:
  embeddings:
    cmd: python3 processes/next_best_asset_recommendation/run_experiments.py
    params:
      - processes/next_best_asset_recommendation/params.yaml:
        - embeddings
        - quality_metrics
    metrics:
    - processes/next_best_asset_recommendation/emb_quality_metrics.json:
        cache: false

Could this be causing trouble?

Thanks!

@mattseddon
Member

I think the issue is more related to the extension not noticing that there have been changes, and hence not running the commands needed to refresh the Experiments view.

It does look like an issue with file system watchers.

I have checked that nested dvc.yaml files trigger the watchers, and they appear to work as expected on my machine (Mac). From reading this it seems that the implementation of watchers differs between the three main OS types. I will need to test on Linux.

Is the project versioned with Git? Where is the Git root with respect to the DVC project? Does your project use symlinks anywhere?

@mattseddon
Member

The project is a private repo we are working on for a client, so I'm afraid I cannot share it.

Can you share the project structure from exp show with me? I.e., delete all of the data from underneath each key. Something like this:

{
  "workspace": {
    "baseline": {
      "data": {
        "timestamp": null,
        "params": {
          "params.yaml": {
            "data": {
            }
          }
        },
        "deps": {
          "data/MNIST": {
          },
          "train.py": {
          }
        },
        "outs": {
          "model.pt": {
          },
          "data/MNIST/raw": {
          }
        },
        "status": "Success",
        "executor": null,
        "metrics": {
          "training_metrics.json": {
            "data": {
            }
          }
        }
      }
    }
  },
...

Having that would mean I should be able to get to the bottom of this a lot quicker.
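
Stripping the values by hand is error-prone for a large table, so it could be scripted. The sketch below assumes the JSON layout shown in the example above (revision, then experiment name, then a "data" object with params, metrics, deps and outs); the function names are hypothetical. File paths, experiment names, timestamps and status fields are kept, so review the result before sharing it.

```python
import json


def redact_experiment(exp_data):
    """Return a copy of one experiment's "data" object with the values removed."""
    redacted = dict(exp_data)  # shallow copy, so the input is left untouched
    for section in ("params", "metrics"):
        if isinstance(exp_data.get(section), dict):
            # Keep each file path, but empty the values stored under "data".
            redacted[section] = {path: {"data": {}} for path in exp_data[section]}
    for section in ("deps", "outs"):
        if isinstance(exp_data.get(section), dict):
            # Keep each path, but drop everything recorded for it.
            redacted[section] = {path: {} for path in exp_data[section]}
    return redacted


def redact_show_output(show_output):
    """Apply redact_experiment to every experiment in parsed exp show JSON."""
    result = {}
    for revision, experiments in show_output.items():
        result[revision] = {}
        for name, entry in experiments.items():
            entry = dict(entry)
            if isinstance(entry.get("data"), dict):
                entry["data"] = redact_experiment(entry["data"])
            result[revision][name] = entry
    return result


# Example use (not executed here), after saving the output with
#   dvc exp show -n 3 --json > exp-show.json
# with open("exp-show.json") as f:
#     print(json.dumps(redact_show_output(json.load(f)), indent=2))
```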

@RR-28023
Author

It does look like an issue with file system watchers

Now that you say that, could it be related to my file watcher settings?

"files.watcherExclude": {
       "**/.git/objects/**": true,
       "**/.git/subtree-cache/**": true,
       "**/node_modules/*/**": true,
       "**/.hg/store/**": true,
       ".venv/**": true,
       "*/.github/*": true
}
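
One way to reason about whether these exclude patterns could hide experiment changes is to test them against the paths involved. The sketch below is a simplified matcher, not VS Code's own glob implementation: it handles only `**`, `*` and `?`, and matches against paths relative to the workspace root. The candidate paths are assumptions. DVC stores experiments as Git refs under refs/exps, but whether the extension watches exactly that location, and whether files.watcherExclude applies to its watchers, is not confirmed in this thread.

```python
import re

# The settings quoted above.
WATCHER_EXCLUDE = {
    "**/.git/objects/**": True,
    "**/.git/subtree-cache/**": True,
    "**/node_modules/*/**": True,
    "**/.hg/store/**": True,
    ".venv/**": True,
    "*/.github/*": True,
}


def glob_to_regex(pattern):
    """Translate a glob using only **, * and ? into a compiled regex (simplified)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including path separators
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # one character within a path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matching_excludes(path, excludes=WATCHER_EXCLUDE):
    """Return the enabled exclude patterns that match a workspace-relative path."""
    return [
        pattern
        for pattern, enabled in excludes.items()
        if enabled and glob_to_regex(pattern).fullmatch(path)
    ]


# Illustrative paths only; the ref path below is a made-up example.
for candidate in (
    ".git/refs/exps/ab/cdef/exp-3e0f4",
    "processes/next_best_asset_recommendation/params.yaml",
    ".git/objects/ab/cdef",
):
    print(candidate, matching_excludes(candidate))
```

Under these assumptions, an empty list for a path means none of the listed patterns would exclude it, which would point away from these settings as the cause.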

@RR-28023
Author

Some news, @mattseddon. While reloading the extension, it got upgraded to v0.5.1 and the issue seems to have disappeared! Strange, I know, since based on the release notes v0.5.1 does not sound like it addresses this.

@mattseddon
Member

Thanks for letting me know @RR-28023. Please re-open if the issue comes back. We would also love to get your feedback once you've used the extension for a while. @shcheklein will probably reach out on Discord 🙏🏻
