Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote with relpath breaks get #2756

Closed
pared opened this issue Nov 8, 2019 · 8 comments
Closed

remote with relpath breaks get #2756

pared opened this issue Nov 8, 2019 · 8 comments
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important ui user interface / interaction

Comments

@pared
Copy link
Contributor

pared commented Nov 8, 2019

#!/bin/bash

rm -rf repo remote_repo storage git_repo
mkdir repo remote_repo storage git_repo

pushd remote_repo
git init >> /dev/null && dvc init -q

set -x
set -e
echo data >> data
# dvc remote add -d str ../storage
dvc remote add -d str /full/path/to/storage
dvc add data
git add .dvc/config data.dvc .gitignore
git commit -m "add data"
dvc push
rm data
rm -rf .dvc/cache

popd
pushd repo

git init >> /dev/null && dvc init -q
dvc get ../remote_repo data

Will work just fine, but when we will use relpath to point to storage (as in commented line)
we will get warnings that cache files do not exist and data will not be created.

It is happening, because, in external_repo we clone url to /tmp and therefore, relative path to cache cannot be resolved properly.

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Nov 8, 2019
@efiop efiop added the ui user interface / interaction label Nov 8, 2019
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Nov 8, 2019
@efiop efiop added enhancement Enhances DVC p2-medium Medium priority, should be done, but less important labels Nov 8, 2019
@pared
Copy link
Contributor Author

pared commented Feb 19, 2020

The solution could be to make ExternalRepo acknowledge local remotes with a relative path and make it resolve the paths appropriately. We do something similar already in ExternalRepo._set_upstream when automatically setting up the remote if none is present.

@tizoc
Copy link
Contributor

tizoc commented Feb 20, 2020

Hello, I will take a look into this /cc @efiop

@efiop
Copy link
Contributor

efiop commented Feb 20, 2020

@tizoc Sounds great! Please let us know if you'll have any questions :)

@tizoc
Copy link
Contributor

tizoc commented Feb 20, 2020

Will do!

So far what I have found out is that (unlike git) dvc is making non-absolute paths in the config relative to the .dvc directory, not the root of the repository:

$ dvc remote add -d str ../storage
Setting 'str' as a default remote.
$ cat .dvc/config
[core]
    remote = str
['remote "str"']
    url = ../../storage

git:

$ git remote add str ../storage
$ cat .git/config 
[remote "str"]
        url = ../storage
        fetch = +refs/heads/*:refs/remotes/str/*

If I manually change the path to be relative to the root of the repository the dvc get ../remote_repo data command works.

So, before going forward, my question is: Is this (paths relative to the .dvc directory instead of the root) the expected behavior?

@efiop
Copy link
Contributor

efiop commented Feb 20, 2020

@tizoc We actually transform it relative to the config file location, which by default is .dvc/config. Git does indeed treat it literally, i don't think it will work after that though, unless you are in repo root.

So yes, it is expected behaviour. In get, when we clone external repo, we need to resolve that remote path relative to the original location of that repo. Please see _set_upstream in dvc/external_repo.py as noted above, it does a similar thing but for cache.

@tizoc
Copy link
Contributor

tizoc commented Feb 21, 2020

Got it! after stepping the code with the debugger I see the issue now, this is what the url for the remote entry looks like for the temporary cloned repo:

'url': '/private/var/folders/sy/4w247kq176z5n2k37nn3lblw0000gn/T/tmp5_g92ir8dvc-erepo/.dvc/../../storage'

Thanks for the pointer.

tizoc added a commit to tizoc/dvc that referenced this issue Feb 21, 2020
@tizoc
Copy link
Contributor

tizoc commented Feb 21, 2020

Fixed in #3378

Still have to write a func test, and probably split that function into two.

tizoc added a commit to tizoc/dvc that referenced this issue Feb 21, 2020
tizoc added a commit to tizoc/dvc that referenced this issue Feb 24, 2020
- use `tmp_dir` fixture
- commit config file changes
tizoc added a commit to tizoc/dvc that referenced this issue Feb 24, 2020
tizoc added a commit to tizoc/dvc that referenced this issue Feb 26, 2020
efiop pushed a commit that referenced this issue Feb 26, 2020
* erepo: fix relative urls for remotes (#2756)

* erepo: add test case for relative path remotes (#2756)

* Remove unused argument

* erepo: move handling of relative remotes to a separate method (#2756)

* erepo: don't manually build paths on test (#2756)

* erepo: refactor handling of missing or relative upstream (#2756)

* erepo: move handling of source repo resource to `_fix_upstream` (#2756)

* erepo: improve relative remote tests (#2756)

- use `tmp_dir` fixture
- commit config file changes

* erepo: Remove comment (#2756)

* erepo: use scm_add to commit too, no need for separate scm.commit call (#2756)

* erepo: minor refactoring in `_fix_upstream` (#2756)

* erepo: add comment explaining that old relative paths are restored (#2756)
@tizoc
Copy link
Contributor

tizoc commented Feb 26, 2020

This should be solved now that #3378 was merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important ui user interface / interaction
Projects
None yet
Development

No branches or pull requests

3 participants