Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: from external FS or local external outputs #1545

Closed
dmpetrov opened this issue Jan 29, 2019 · 14 comments · Fixed by #3929
Closed

add: from external FS or local external outputs #1545

dmpetrov opened this issue Jan 29, 2019 · 14 comments · Fixed by #3929
Assignees
Labels
enhancement Enhances DVC p0-critical Critical issue. Needs to be fixed ASAP. ui user interface / interaction

Comments

@dmpetrov
Copy link
Member

$ dvc add /Volumes/Seagate/storage/data/reddit-May2015_z.tsv.zip

It copies the file into the cache, removes the original file in the external drive and creates symlink instead (to the dvc cache).

Based on some offline discussions it confuses people. One of the biggest concerns - after removing DVC project (the entire dir with the code as well as DVC-cache) user sees a symlink to a file which does not exist.

  1. Why should DVC work with external files? Is there any meaningful scenario? If not - let's throw an error when adding external files.
  2. Even if we have a scenario for (1) let's not touch\remove external files?
  3. As a result of the command, I have reddit-May2015_z.tsv.zip.dvc file but not the actual file in the workspace (like reddit-May2015_z.tsv.zip). It is not clear how to instantiate the file in the workspace?
@dmpetrov dmpetrov added the bug Did we break something? label Jan 29, 2019
@efiop
Copy link
Contributor

efiop commented Jan 29, 2019

This is just a usual "external output" case, where you want to extend your project's workspace with the workspace of your entire machine. I don't think it is a bug, but rather a miscommunication. I guess people expect dvc add to copy a file to their project?

@dmpetrov
Copy link
Member Author

dmpetrov commented Jan 29, 2019

The guy saw dvc add command and interpreted that as adding a file under DVC control. The dataset was deleted from the external drive after the project was deleted.

I expect this dvc add external-file scenario is quite common and we should do something about this. The external output scenario might be a good one but we should think more carefully about this scenario and how it intersects with the basic scenario. I'll think about the intersection and I'll try to propose some solution.

@efiop
Copy link
Contributor

efiop commented Jan 29, 2019

@dmpetrov I see, this is the guy from ODS you are talking about. Agreed. I think we could simply add a confirmation prompt for external local files along with something like --external flag to suppress it. Will take a closer look at this later.

@dmpetrov
Copy link
Member Author

Yeah, that's what I was thinking about. But let's think a bit more about this experience.

@efiop efiop added enhancement Enhances DVC and removed bug Did we break something? labels Feb 6, 2019
@efiop
Copy link
Contributor

efiop commented May 18, 2019

Or, as discussed with @shcheklein , we could require a special external cache setup for external outputs, same as we do with s3/gs/etc. That way it would error-out automatically, without us having to introduce --external flag, that is breaking the symmetry for s3/gs/etc external outputs.

@shcheklein
Copy link
Member

More context on this - https://discuss.dvc.org/t/share-nas-data-in-server/180 . It looks it's a really common thing and the way our "external outputs" behave in this case is very confusing.

@efiop efiop added the p2-medium Medium priority, should be done, but less important label May 18, 2019
@efiop efiop added the ui user interface / interaction label Jul 23, 2019
@dmpetrov dmpetrov changed the title add: from external FS add: from external FS or local external outputs Jan 19, 2020
@dmpetrov
Copy link
Member Author

dmpetrov commented Jan 19, 2020

The scenario with external outputs makes seen but it is too advanced for dvc add and it confuses people and especially beginners. This is a usual error in the first-day experience no matter what is the user's background - hardcore users can easily make this mistake.

Let's prevent using external outputs in dvc add. Advanced users can use dvc run -o ~/Download/myfile (no command) instead of dvc add ~/Download/myfile.

@dmpetrov
Copy link
Member Author

dmpetrov commented Jan 19, 2020

It is also related to #2955. DVC needs to resolve paths, exits with an error if the path is outside the current repo (no external outputs). The paths inside the repo need to be converted to relevant paths.

@efiop efiop added p0-critical Critical issue. Needs to be fixed ASAP. p2-medium Medium priority, should be done, but less important and removed p2-medium Medium priority, should be done, but less important labels May 21, 2020
@efiop efiop self-assigned this May 21, 2020
@skshetry skshetry removed the p2-medium Medium priority, should be done, but less important label May 25, 2020
@efiop
Copy link
Contributor

efiop commented May 29, 2020

Had a great discussion with @dmpetrov and @shcheklein that it might make sense to introduce a config option instead of --external for now, as it will be much less annoying to the advanced users.

But a proper solution here, is probably to introduce a notion of workspaces, that will enforce proper use. E.g. maybe something like

dvc workspace add mys3 s3://bucket/ruslan  # this might even automatically create .dvc/cache, similar to local workspace
dvc add s3://bucket/ruslan/path # will understand that the file is within the legal workspace

but it should carefully evaluated. We will do that later. For now we need to prevent new users from shooting themselves in the leg.

efiop added a commit to efiop/dvc that referenced this issue Jun 3, 2020
efiop added a commit that referenced this issue Jun 3, 2020
@jorgeorpinel
Copy link
Contributor

I have some questions about the new --external option (noticed in iterative/dvc.org#1387 (review)):

  • Does dvc add <external_path> (without --external) error out now? (With a message indicating you need to use that option explicitly, I would expect)
  • Since it was also introduced to dvc run, why don't external dependencies need the same option? (Seems confusing UI-wise)

@efiop
Copy link
Contributor

efiop commented Jun 22, 2020

Does dvc add <external_path> (without --external) error out now? (With a message indicating you need to use that option explicitly, I would expect)

@jorgeorpinel Yep, https://github.com/iterative/dvc/pull/3929/files#diff-43298ed5b9e6a1345c43c8aedf6971c3R92

Since it was also introduced to dvc run, why don't external dependencies need the same option? (Seems confusing UI-wise)

Because using them is safe and we haven't seen people misusing it.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jun 23, 2020

OK. But --external seems like it could apply or affect any external stuff (both deps and outs). Should the option be --enable-external-outs or something shorter like --x-outs ?

@efiop
Copy link
Contributor

efiop commented Jun 23, 2020

@jorgeorpinel Agreed, the name could be made more explicit, but --external was added to simply prevent the misuse, it is only meant for advanced users. The external outs scenario is going to be revisited in the future, so I wouldn't bother renaming it again until then (at which point we will likely get rid of it).

@jorgeorpinel
Copy link
Contributor

Sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC p0-critical Critical issue. Needs to be fixed ASAP. ui user interface / interaction
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants