A possible misunderstanding from https://dvc.org/doc/user-guide/managing-external-data#examples is that you can use an actual remote storage as the external cache for your project. If you don't notice that the examples include paths/directories, you may end up setting up a remote (even the default one) at the root of an S3 bucket or other cloud storage, and then set up that same remote as the external cache.
This would be a contradictory setup, and I'm not even sure whether it could cause problems for DVC commands (for example, `dvc push` would never push anything, as everything is already stored there). Basically, you would not have a backup of the project cache because of a bad setup.
The question or suggestion is: should DVC detect this situation and prevent users from having such a config? If so, it's a bug as it's possible to do this now.
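To make the suggestion concrete, here is a minimal sketch of what such a check could look like. It is not DVC's implementation: the function names and the plain-dict inputs (`remotes`, `default_remote`, `external_caches`) are hypothetical stand-ins for whatever DVC reads from its config, and it only flags the exact case described above, where an external cache resolves to the same location as the default remote.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a storage URL to (scheme, netloc, path), ignoring trailing slashes."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc, parts.path.rstrip("/"))


def find_conflicts(remotes, default_remote, external_caches):
    """Return the cache remotes that resolve to the default remote's location.

    remotes:         hypothetical mapping of remote name -> URL
    default_remote:  name of the default remote
    external_caches: hypothetical mapping of scheme -> remote name used as cache
    """
    if default_remote not in remotes:
        return []
    target = normalize(remotes[default_remote])
    conflicts = []
    for cache_remote in external_caches.values():
        url = remotes.get(cache_remote)
        # Flag caches whose URL is the same location as the default remote.
        if url is not None and normalize(url) == target:
            conflicts.append(cache_remote)
    return conflicts
```

A check like this could warn at config time rather than fail. Whether nested paths (a cache inside the remote's prefix) should also be flagged is a separate design question that this sketch does not address.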
Also consider updating the command references and/or the external data guide to mention how commands that deal with remote data work with external outputs. For example, `dvc add` uses the ETag instead of an md5 hash, I believe; `dvc get`/`dvc import` don't work with external data; etc.
The reason I am using the same location for the external cache and the remote is that I am trying to avoid creating multiple copies of the data.
Our responses to the user:
That won't help, because we use ETags on S3, which might not match the md5-ish hash that DVC uses locally.
You don't have to use a remote at all :slight_smile: That's an optional feature!
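On the ETag point in the responses above, the following sketch illustrates why an S3 ETag need not equal a plain MD5 of the file. It assumes the commonly observed convention for multipart uploads (MD5 of the concatenated per-part MD5 digests, followed by `-` and the part count), which is not something stated in this issue. The function names and `part_size` parameter are hypothetical, and nothing here touches S3 or DVC.

```python
import hashlib


def plain_md5(data: bytes) -> str:
    """MD5 hex digest of the whole content."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """ETag-like value under the assumed multipart convention."""
    # Split the content into fixed-size parts, as a multipart upload would.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # Hash each part, then hash the concatenation of the raw digests.
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

Under that assumption, the same bytes yield two different identifiers depending on how they were uploaded, so objects already in the bucket would not necessarily be recognized by their locally computed hash.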