remote: should DVC prevent external cache overlap default remote? #3703

Closed
jorgeorpinel opened this issue Apr 29, 2020 · 2 comments
Labels
bug Did we break something? discussion requires active participation to reach a conclusion

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented Apr 29, 2020

A possible misunderstanding from https://dvc.org/doc/user-guide/managing-external-data#examples is that actual remote storage can double as the external cache for your project. If you don't notice that the examples use paths/directories, you may end up setting up a remote (even the default one) at the root of an S3 bucket or other cloud storage, and then set up that same location as the external cache as well.

This would be a contradictory setup, and I'm not even sure whether it could cause problems for DVC commands (for example, dvc push would never push anything because the data is already stored there). Basically, you would not have a backup of the project cache because of a bad setup.

The question or suggestion is: should DVC detect this situation and prevent users from having such a config? If so, it's a bug as it's possible to do this now.
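
To make the suggestion concrete, here is a minimal sketch of the kind of check being asked for. It is not DVC code: `locations_overlap` and `_split` are hypothetical helpers, the bucket names are made up, and treating a nested path as an overlap is my assumption about what "overlap" should mean here.

```python
from urllib.parse import urlparse


def _split(url):
    """Return (scheme, bucket/host, path segments) for a storage URL."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    return parsed.scheme, parsed.netloc, parts


def locations_overlap(cache_url, remote_url):
    """True if the two locations are identical or one is nested in the other.

    Hypothetical helper for illustration only; not part of DVC.
    """
    c_scheme, c_host, c_parts = _split(cache_url)
    r_scheme, r_host, r_parts = _split(remote_url)
    if (c_scheme, c_host) != (r_scheme, r_host):
        return False
    # Compare whole path segments so "cache" and "cache-old" stay distinct.
    shorter = min(len(c_parts), len(r_parts))
    return c_parts[:shorter] == r_parts[:shorter]


if __name__ == "__main__":
    # The setup described above: remote and external cache both at the bucket root.
    print(locations_overlap("s3://mybucket", "s3://mybucket"))
    # Separate directories in the same bucket, as in the guide's examples.
    print(locations_overlap("s3://mybucket/cache", "s3://mybucket/remote"))
```

A check like this could run when the cache or default remote is configured and either warn or refuse, depending on what we decide the right behavior is.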

Context: https://discordapp.com/channels/485586884165107732/485596304961962003/704739550483710032


Also consider updating the command references and/or the external data guide to mention how certain commands that deal with remote data work with external outputs. For example, dvc add uses ETag instead of an md5 hash, I believe; get/import don't work with external data, etc.

@jorgeorpinel jorgeorpinel added bug Did we break something? question I have a question? labels Apr 29, 2020
@jorgeorpinel
Contributor Author

More context (from https://discordapp.com/channels/485586884165107732/485596304961962003/705308604257009766):

The reason I am using same remote location for external cache and remote is because I am trying to avoid creating multiple copies of data.

Our responses to the user:

That won't help, because we use etags on s3, which might not match with the md5-ish hash that dvc uses locally.
You don't have to use a remote at all :slight_smile: That's an optional feature!
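
For anyone wondering why the ETag and the md5 might not match, here is a rough sketch. It assumes the commonly observed S3 behavior for multipart uploads, where the ETag is the md5 of the concatenated per-part md5 digests plus a part count suffix. That behavior is my assumption and is not stated in this thread; the function names and part size are made up for illustration.

```python
import hashlib


def plain_md5(data: bytes) -> str:
    """md5 hex digest of the whole content."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """ETag-like value for content uploaded in parts.

    Assumption: md5 over the concatenated binary md5 digests of each part,
    followed by "-<number of parts>". Illustrative only.
    """
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


if __name__ == "__main__":
    data = b"a" * 10
    print(plain_md5(data))
    print(multipart_style_etag(data, part_size=5))
```

The two values differ for the same bytes, so an object keyed by one cannot simply be looked up by the other.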

@efiop
Contributor

efiop commented May 3, 2021

Closing in favor of #3920. It is not worth investing more time into this scenario until we reconsider it.

@efiop efiop closed this as completed May 3, 2021