Closed
Description
From #318, the basic use cases are as follows:
Proposal
SCM view
- View all directories and files that have changed when compared to HEAD or cache (Visibility / Situational awareness)
- Checkout / commit any files or directories that have changed (Actions) - exact UX TBD.
- Checkout / commit / push or pull the entire repository (Actions)
Statuses that we currently provide in the extension
Status | SCM View | Decorations Provided ** | Sourced from | Notes |
---|---|---|---|---|
added | Y | Y | diff + list | |
deleted | Y | Y | diff + list | |
modified | Y | Y | diff + list + status | |
notInCache | Y | Y | diff + list | |
renamed | Y | Y | diff + list | |
stageModified | Y | Y | diff + list + status | For a detailed explanation of modified vs stageModified see #318 (comment) |
untracked | Y | Y | git | this is untracked with respect to both git and dvc. We show these files because the user may want to `dvc add` them. |
tracked | Y | Y | list | we decorate tracked because they are generally "git ignored" which will give them a "greyed out" decoration |
** Where possible we match the git extension's decorations because we are trying to make the extension feel as native as possible. Our SCM integration is designed to show the user the state of the workspace with respect to the most recent commit.
Current approach (parallel CLI Commands)
name | command | reason |
---|---|---|
list | `dvc list . --dvc-only -R --show-json` | provides a list of all tracked files that we use for both decoration and SCM purposes. In the SCM view, all files that we show must be tracked by DVC. We do this because otherwise untracked but modified items from diff end up in the tree as duplicates |
diff | `dvc diff --show-json` | we map the output of diff directly to the list output to set all added, deleted, renamed and notInCache statuses. We use it in combination with the status output to determine the difference between modified and "stage modified" |
status | `dvc status --show-json` | only used to determine the difference between modified and "stage modified" |
We currently try to run all three of the above commands in parallel. If any of the commands fail then we will retry all three until they have all completed without error. We do this to best mitigate stale information ending up in the extension.
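The grouped retry described above could be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `Runner` type, `getAllData`, and the `maxAttempts` / `delayMs` parameters are hypothetical names, and the retry limit in particular is an assumption since the description only says the commands are retried until all succeed.

```typescript
// Hypothetical runner: executes `dvc <args>` and resolves with the parsed
// JSON output, or rejects if the command fails (for example on a lock).
type Runner = (args: string[]) => Promise<unknown>

interface RepoData {
  list: unknown
  diff: unknown
  status: unknown
}

// Start list, diff and status together. If any one of them fails, discard
// the whole attempt and start all three again, so the three outputs that
// are returned always come from the same attempt.
async function getAllData(
  run: Runner,
  maxAttempts = 5, // assumption: the issue does not state a retry limit
  delayMs = 500 // assumption: pause between attempts
): Promise<RepoData> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const [list, diff, status] = await Promise.all([
        run(['list', '.', '--dvc-only', '-R', '--show-json']),
        run(['diff', '--show-json']),
        run(['status', '--show-json'])
      ])
      return { list, diff, status }
    } catch (error) {
      lastError = error
      await new Promise<void>(resolve => setTimeout(resolve, delayMs))
    }
  }
  throw lastError
}
```

Taking the runner as a parameter keeps the sketch independent of how the CLI process is actually spawned.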
Issues with the current approach
- It's still slow (General performance of trees #608)
- We are unsure of what the actual UI / UX should be (Decide on and implement UX for checkout / commit workflow in SCM view #609)
- A lot of data is sent between the CLI and extension that is unused (example: in get-started-experiments, after first running an experiment, the output of diff contains ~80k "added" files; none of these files are tracked by dvc so we filter all of the records out)
- We have issues running multiple commands in parallel (After reloading the window experiments part is stuck in the "loading" state #767 (comment)) <- this is particularly important because it means we cannot currently run the extension against get-started-experiments
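The filtering of untracked diff records mentioned above might look something like the sketch below. The `DiffOutput` shape is a simplified assumption about `dvc diff --show-json` (entries grouped by status, each with a `path`), and `filterToTracked` and the example paths are hypothetical.

```typescript
// Assumed, simplified shape of the diff output: status name -> entries.
// The real output carries more fields than just `path`.
type DiffOutput = Record<string, { path: string }[]>

// Keep only the diff entries whose path is in the set of DVC-tracked paths
// (the set would be built from the list output).
function filterToTracked(diff: DiffOutput, tracked: Set<string>): DiffOutput {
  const filtered: DiffOutput = {}
  for (const [status, entries] of Object.entries(diff)) {
    filtered[status] = entries.filter(({ path }) => tracked.has(path))
  }
  return filtered
}

// Example with made-up paths: the "added" entry is not tracked, so it is dropped.
const exampleTracked = new Set(['data/data.xml', 'model.pkl'])
const exampleDiff: DiffOutput = {
  added: [{ path: 'training/temp/0001.png' }],
  modified: [{ path: 'model.pkl' }]
}
const filteredExample = filterToTracked(exampleDiff, exampleTracked)
// filteredExample.added is [] and filteredExample.modified still holds model.pkl
```

All of the discarded records still have to be produced by the CLI, serialized, and parsed before this step runs, which is the overhead the bullet above describes.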
Options for mitigation
# | option | pros | cons |
---|---|---|---|
1 | Run commands sequentially | locks should no longer be an issue | even slower |
2 | Only rerun failed commands | also mitigates lock issue | involves more complicated logic, possibility of stale data |
3 | Make all 3 commands lockless | allows us to continue to run all commands in parallel | involves work from the CLI team and is only an interim solution, complicates the internals of DVC |
4 | Combine commands into single command that the integration can run | limits the amount of data needing to be transferred between the cli and extension, should be faster, cuts out grouped retry logic | more effort required, unsure as to benefit to general users |
5 | Replace CLI calls with event driven architecture | eliminates the need to call the CLI, could serve multiple clients | requires even more work and is not a short or even medium term solution |
6 | Make commands "lightweight" (add `--dvc-only`) | would limit the amount of data being passed and could speed things up | unsure as to the benefit to general users, still requires effort, could still run into lock issues |
My preference would be to start work on 4 as it would actually help us move towards 5.
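For comparison, option 2 from the table (only rerun failed commands) could be sketched as below. This is an illustrative assumption about how such logic might be structured, with hypothetical names (`Runner`, `COMMANDS`, `getDataRetryingFailures`) and an assumed retry limit. It also shows where the "stale data" con comes from: results kept from an earlier attempt can be older than results from a rerun.

```typescript
// Hypothetical runner: executes `dvc <args>` and resolves with parsed JSON.
type Runner = (args: string[]) => Promise<unknown>

const COMMANDS: Record<string, string[]> = {
  list: ['list', '.', '--dvc-only', '-R', '--show-json'],
  diff: ['diff', '--show-json'],
  status: ['status', '--show-json']
}

// Run every pending command in parallel, keep the results that succeeded,
// and rerun only the commands that failed.
async function getDataRetryingFailures(
  run: Runner,
  maxAttempts = 5 // assumption: the issue does not state a retry limit
): Promise<Record<string, unknown>> {
  const results: Record<string, unknown> = {}
  let pending = Object.keys(COMMANDS)
  for (let attempt = 1; attempt <= maxAttempts && pending.length > 0; attempt++) {
    const outcomes = await Promise.all(
      pending.map(name =>
        run(COMMANDS[name]).then(
          value => ({ name, ok: true, value: value as unknown }),
          () => ({ name, ok: false, value: undefined as unknown })
        )
      )
    )
    for (const outcome of outcomes) {
      if (outcome.ok) {
        results[outcome.name] = outcome.value
      }
    }
    pending = outcomes.filter(outcome => !outcome.ok).map(outcome => outcome.name)
  }
  if (pending.length > 0) {
    throw new Error(`commands still failing: ${pending.join(', ')}`)
  }
  return results
}
```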