Story: Add basic support for files tracked by DVC #137

Closed
3 of 4 tasks
mattseddon opened this issue Mar 7, 2021 · 14 comments
Labels
priority-p1 Regular product backlog research story Product feature aka epic. Discussion, progress, checkboxes for implementation, etc

Comments

@mattseddon
Member

mattseddon commented Mar 7, 2021

We want to improve the user experience with DVC tracked files.

The ideal end goal is a seamless, native VS Code experience for all files that are tracked by DVC. Currently, DVC-tracked files are not visible in the Git source control tree. They are also generally listed in the .gitignore file, which means they are greyed out in the Explorer tab. Users also need to be familiar with the command line / terminal to perform basic operations, see status, etc. Here is a screenshot of the Explorer panel for our demo project:

image

From this doc it appears that there is an API which we can use to show a tree view in the LHS of the IDE along with other Explorer panels. We would be able to add something like this (see npm-scripts):

image

We could also add to the source control panel. I think this API would let us do that. Here is what it would look like:

image

We could also assign actions to the tracked files via the source control API. That would look something like this:

image

On the CLI side we have dvc list (docs here) to help us out. Potentially there will be some performance issues with very large directories so please keep that in mind when formulating an approach.

Steps/features:

  • Decide on the appropriate tree (or combination of trees) to view DVC-tracked files
  • Implement the agreed approach
  • Assign actions per DVC metafile / DVC-tracked file: pull, push, etc.
  • Handle directories that have many, many files (> 1M)

@shcheklein please let me know if I've missed anything.

@mattseddon mattseddon added story Product feature aka epic. Discussion, progress, checkboxes for implementation, etc priority-p1 Regular product backlog research labels Mar 7, 2021
@mattseddon
Member Author

@itsmesean there will be some research involved up front. Please let me know if anything is unclear or if you need help with anything. Happy to answer any questions that you have.

@itsmesean

Currently leaning towards implementing a tree view within the file explorer before going the SCM route. What I have could also be placed within the current dvc-container if we decide to go that route.

The file tree is pretty standard aside from the way it gets the initial directory and subdirectories to parse.
Using cp.exec to call dvc list ., then capturing the output string and splitting it on \r\n (Windows for now; it will be OS-agnostic later), we get an array of all the top-level DVC-tracked files.
You could add -R and get everything, but on my 9900K this takes ~5 seconds, and I imagine real-world use cases will have far more data than the demo, making that approach very impractical.
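
For illustration, here is a minimal sketch of the two ideas above: splitting the dvc list output in an OS-agnostic way, and listing one directory at a time instead of using -R. The names parseDvcListOutput, Lister and LazyDvcTree are hypothetical and not taken from the extension's code. The lister is injected, so the actual call to dvc list (for example through cp.exec) stays outside the sketch.

```typescript
// Hypothetical helper: split the stdout of `dvc list` into entries.
// /\r?\n/ matches both Windows (\r\n) and Unix (\n) line endings.
function parseDvcListOutput(stdout: string): string[] {
  return stdout.split(/\r?\n/).filter((line) => line.length > 0);
}

// Hypothetical type: anything that can list the children of one directory,
// e.g. a wrapper around running `dvc list . <dir>` and parsing its output.
type Lister = (dir: string) => Promise<string[]>;

// Lists a directory only when it is first requested (e.g. when a tree node
// is expanded) and caches the result, so -R is never needed up front.
class LazyDvcTree {
  private readonly cache = new Map<string, Promise<string[]>>();

  constructor(private readonly list: Lister) {}

  getChildren(dir: string = "."): Promise<string[]> {
    let children = this.cache.get(dir);
    if (children === undefined) {
      children = this.list(dir);
      this.cache.set(dir, children);
    }
    return children;
  }

  // Drop the cached listing for one directory, e.g. after the data changes.
  invalidate(dir: string): void {
    this.cache.delete(dir);
  }
}
```

With this shape, expanding a node costs one dvc list call for that directory only, and repeated expansions reuse the cached result until invalidate is called.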

1

Adding commands in the form of buttons/context menu items is very doable from here.
Implementing a file system watcher may be a bit more involved.

If anyone notices a fatal flaw in this approach, feel free to enlighten me; there is so much about DVC I am unaware of.

@itsmesean

@mattseddon

Should we default to only listing dvc tracked items from dvc list . --dvc-only, or dvc list . which returns files tracked by dvc and git?

@mattseddon
Member Author

@itsmesean

> Adding commands in the form of buttons/context menu items is very doable from here.

What will the workflow look like for a user? How would they dvc add a file/folder? Will that then show the file/folder in the tree view? What options will they then have from there?

> Should we default to only listing dvc tracked items from dvc list . --dvc-only, or dvc list . which returns files tracked by dvc and git?

I would stick to only DVC tracked files for now but should be able to answer better with your first answer.
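
As a rough sketch of how the two listing modes could sit behind one switch, the helper below builds the argument list for dvc list with --dvc-only on by default. The function name buildListArgs and its options object are hypothetical; only the flags --dvc-only and -R and the dvc list . form come from this thread and the dvc list docs.

```typescript
// Hypothetical options for the listing; dvcOnly defaults to true so that only
// DVC-tracked entries are shown unless the caller opts in to Git files too.
interface ListOptions {
  dvcOnly?: boolean;
  recursive?: boolean;
}

// Builds the arguments for `dvc list` against the local repository (".").
// `path` is the directory inside the repository to list.
function buildListArgs(path: string = ".", options: ListOptions = {}): string[] {
  const { dvcOnly = true, recursive = false } = options;
  const args: string[] = ["list"];
  if (dvcOnly) {
    args.push("--dvc-only");
  }
  if (recursive) {
    args.push("-R");
  }
  args.push("."); // the repository to list: the current workspace
  if (path !== ".") {
    args.push(path); // optional path within the repository
  }
  return args;
}
```

Keeping the flag behind an option means the default can be changed later without touching the code that runs the command.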

@itsmesean

> What will the workflow look like for a user? How would they dvc add a file/folder? Will that then show the file/folder in the tree view? What options will they then have from there?

The view could be split into tracked and untracked.

Items in the untracked view would have an inline button on hover that executes a DVC command, like add, which then moves that file into the tracked view.
s
That DVC icon is a stand-in for a more appropriate icon.

There can be multiple buttons per item, with different render conditions, triggering different commands that pass data such as the item's URI to the function that executes the command.
The views themselves can have buttons/icons that trigger actions as well.
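
To make the idea concrete, here is a small sketch of per-item buttons with render conditions, kept independent of the VS Code API. Everything named here (DvcItem, InlineAction, actionsFor, splitViews) is hypothetical, and the item's URI is simplified to a plain path string.

```typescript
type ItemState = "tracked" | "untracked";

// Hypothetical model of one entry in the view.
interface DvcItem {
  uri: string; // simplified: a workspace-relative path
  state: ItemState;
}

// A button shown on an item. `when` is the render condition and `toArgs`
// turns the item's data into the arguments passed to the dvc CLI.
interface InlineAction {
  title: string;
  when: (item: DvcItem) => boolean;
  toArgs: (item: DvcItem) => string[];
}

const inlineActions: InlineAction[] = [
  { title: "Add", when: (i) => i.state === "untracked", toArgs: (i) => ["add", i.uri] },
  { title: "Push", when: (i) => i.state === "tracked", toArgs: (i) => ["push", i.uri] },
  { title: "Pull", when: (i) => i.state === "tracked", toArgs: (i) => ["pull", i.uri] },
];

// Buttons to render for a given item.
function actionsFor(item: DvcItem): InlineAction[] {
  return inlineActions.filter((action) => action.when(item));
}

// Split the items into the two proposed views.
function splitViews(items: DvcItem[]): { tracked: DvcItem[]; untracked: DvcItem[] } {
  return {
    tracked: items.filter((i) => i.state === "tracked"),
    untracked: items.filter((i) => i.state === "untracked"),
  };
}
```

After a successful add, the item's state would flip to "tracked", which moves it to the other view and swaps the buttons it renders.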

@itsmesean

@rogermparent curious if you have a preference towards any of the potential approaches, or have insight into what a user would find most useful.

@itsmesean

@mattseddon @rogermparent @shcheklein

> Decide on the appropriate tree (or combination of trees) to view DVC-tracked files

The value of a fully featured scm provider for DVC in vscode is pretty clear.

With that said, if there is value to be had in another treeview, either in the explorer view or the DVC container view, what could it be?
What features could they contribute that wouldn't overlap with what the scm tree already provides?

@rogermparent
Contributor

rogermparent commented Mar 11, 2021

Sorry for the late response, I started one and got sidetracked part-way through.

Since DVC CLI is, from what I understand, designed to be similar to Git, it stands to reason the SCM API is a perfect fit for most of DVC's per-file functionality.

For repo-level actions like push and pull, our first implementation was using a Tree View, but that was just because its stub was conveniently editable in the project's bootstrap. The Git plugin uses the context menu (the ... in the top right of the SCM title) for this purpose, and we probably should too.

The SCM API and the menu seem to be able to encompass all the functionality of DVC we'd want in this extension for now, so it may actually be best to drop our current treeview and switch to an SCM view. I still need to review that API, I'll report back after I do.

@mattseddon
Member Author

mattseddon commented Mar 11, 2021

I think having an explorer tree view of all of the folders / files tracked by DVC would be valuable but a secondary concern and not a priority.

It would help users see (at a glance) all of the files that are currently tracked by DVC. Potentially users could run commands on unchanged files that aren't shown in the SCM view (dvc pull being the only one that makes sense to me right now) and we would be able to expand that in the future.

@itsmesean

Would dvc pull be available to run on individual directories and files as well as every currently tracked dvc item all together?

@mattseddon
Member Author

> Would dvc pull be available to run on individual directories and files as well as every currently tracked dvc item all together?

I believe so (from here):

usage: dvc pull [-h] [-q | -v] [-j <number>] [-r <name>] [-a] [-T]
                [-d] [-f] [-R] [--glob] [--all-commits] [--run-cache]
                [targets [targets ...]]

positional arguments:
  targets       Limit command scope to these tracked files/directories,
                .dvc files, or stage names.
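
Based on that usage text, a sketch of how the extension might build the arguments for both cases (everything at once, or selected files/directories) could look like this. The helper name buildPullArgs and its options object are hypothetical; the -r and -j flags and the positional targets are taken from the usage above.

```typescript
// Hypothetical subset of the options shown in the usage text.
interface PullOptions {
  remote?: string; // -r <name>
  jobs?: number; // -j <number>
}

// With no targets, `dvc pull` is not limited to specific files or directories;
// with targets, the command scope is limited to them.
function buildPullArgs(targets: string[] = [], options: PullOptions = {}): string[] {
  const args: string[] = ["pull"];
  if (options.remote !== undefined) {
    args.push("-r", options.remote);
  }
  if (options.jobs !== undefined) {
    args.push("-j", String(options.jobs));
  }
  return [...args, ...targets];
}
```

A context-menu action on a single item would pass that item's path as the only target, while a repo-level action would pass no targets at all.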

@itsmesean

@yalozhkin do you have an SVG icon I can use to represent 'push'? I'm looking through dvc.org now, but thought I'd ask in case you already know of one.

image

@itsmesean itsmesean linked a pull request Mar 13, 2021 that will close this issue
@itsmesean

Here's a quick demo of the SCM context menu. We could have a submenu item for every possible flag, but this will get cluttered fast. If you can think of other important commands to have here besides those listed in #40, I can add them.

settings

@mattseddon
Member Author

Covered by #176 and #169.
