update: handle imported files from non-DVC git repositories #3172
tests/func/test_update.py
Outdated
    def test_update_git_tracked(tmp_dir, dvc, erepo_dir):
        with erepo_dir.chdir():
            erepo_dir.scm.repo.index.remove([".dvc"], r=True)
what's the need of removing .dvc from the index itself? 🤔
The purpose of this issue is to check that we can dvc import and then dvc update from a git repository. So I'm purging DVC and committing that. My reasoning is that when the repo is cloned so that it can be imported or updated, its .dvc directory being in the index makes a difference.
Using new git_dir dir helper now.
tests/func/test_update.py
Outdated
    with erepo_dir.chdir():
        erepo_dir.scm.repo.index.remove([".dvc"], r=True)
        shutil.rmtree(".dvc")
        erepo_dir.scm_gen("file", "first version")
you could do:
    - erepo_dir.scm_gen("file", "first version")
    + erepo_dir.scm_gen("file", "first version", commit="first version")
I can do that but would still need a commit to remove .dvc in a way that can be cloned as a regular git repo
Can you also simplify this?
    - erepo_dir.scm_gen("file", "first version")
    + erepo_dir.scm_gen("file", "first version", commit="first version")
Now done through git_dir fixture
tests/func/test_update.py
Outdated
    stage = dvc.imp(fspath(erepo_dir), "file", "file")

    # Just to make sure it doesn't crash
    dvc.update(stage.path)
why would this crash?
probably defensive testing, to check that there's no error even if the file is unchanged?
That's right. The purpose here is to increase coverage a bit.
@fabiosantoscode, I think the test could be simpler, without the need of removing files:
    def test_update_git_tracked(tmp_dir, dvc, erepo_dir):
        with erepo_dir.chdir():
            erepo_dir.scm_gen("file", "first version", commit="first")

        dvc.imp(fspath(erepo_dir), "file", "file")
        assert (tmp_dir / "file").read_text() == "first version"

        with erepo_dir.chdir():
            erepo_dir.scm_gen("file", "second version", commit="second")

        clean_repos()
        dvc.update("file.dvc")
        assert (tmp_dir / "file").read_text() == "second version"
I might be wrong, tho)
would appreciate hearing your comments about it.
skshetry
left a comment
Looks good. I agree with @MrOutis regarding the test being a bit convoluted (see his example). I tried it locally and it works amazingly well. 👍
tests/func/test_update.py
Outdated
    with erepo_dir.chdir():
        erepo_dir.scm.repo.index.remove(["file"])
        os.remove("file")
You don't need to remove the file; .scm_gen() overwrites it.
Thanks for explaining this, @skshetry)
Oh, I didn't know. I'll try that out
Done!
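The overwrite behaviour mentioned above can be illustrated with a minimal sketch. It assumes the helper writes files with plain overwrite semantics, as `pathlib.Path.write_text` does; `gen_file` is a hypothetical stand-in and not the real `scm_gen`.

```python
import tempfile
from pathlib import Path


def gen_file(root: Path, name: str, text: str) -> Path:
    """Hypothetical stand-in for a test helper that writes a file.

    Path.write_text opens the file in "w" mode, so any existing content
    is truncated and replaced; no prior os.remove() is needed.
    """
    path = root / name
    path.write_text(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    gen_file(root, "file", "first version")
    # Write again without deleting the file first.
    gen_file(root, "file", "second version")
    content = (root / "file").read_text()

print(content)
```

Under that assumption, the second call simply replaces the first version, which is why the explicit index removal and `os.remove` in the test are redundant.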
The following is failing for me for some reason:
    #! /usr/bin/env sh
    set -ex
    repo=$(mktemp -d)
    upstream_repo=$(mktemp -d)
    cd $repo
    git init && dvc init
    cd $upstream_repo
    git init && dvc init
    echo "foo" >> foo
    git add foo
    git commit -m "first version"
    cd $repo
    dvc import $upstream_repo foo
    cd $upstream_repo
    echo "foo" >> foo
    git commit -am "second version"
    cd $repo
    dvc update foo.dvc
    dvc update foo.dvc
with the following error (at the end, i.e. last If I do not
@skshetry That is a nice catch! Might be some issues with our caching mechanism, where it tries to auto-create a remote twice (https://github.com/iterative/dvc/blob/master/dvc/external_repo.py#L91). Haven't looked deeply into it yet though.
@efiop, how about this patch? Should
diff --git a/dvc/external_repo.py b/dvc/external_repo.py
index 9ff2f2a4..dd4c30ce 100644
--- a/dvc/external_repo.py
+++ b/dvc/external_repo.py
@@ -92,6 +92,7 @@ def _external_repo(url=None, rev=None, cache_dir=None):
"auto-generated-upstream",
original_repo.cache.local.cache_dir,
default=True,
+ force=True,
level=Config.LEVEL_LOCAL,
)
finally:
@skshetry Yep, that would work 🙂 But that might be us covering up a bug; we will need to look into why that happened in the first place.
@efiop want me to have a look in scope of this PR?
@skshetry nice catch! I had a quick look and if I change Maybe we can try to add the remote and catch the resulting error instead of doing an
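The "add the remote and catch the resulting error" idea can be sketched as follows. This is an illustration only: `RemoteConfig`, `RemoteExistsError` and `ensure_remote` are hypothetical names invented for the sketch and are not DVC's actual config API.

```python
class RemoteExistsError(Exception):
    """Raised when a remote with the same name is already configured."""


class RemoteConfig:
    """Hypothetical in-memory stand-in for a repo's remote configuration."""

    def __init__(self):
        self.remotes = {}

    def add(self, name, url, force=False):
        if name in self.remotes and not force:
            raise RemoteExistsError(f"remote '{name}' already exists")
        self.remotes[name] = url


def ensure_remote(config, name, url):
    """Add the remote; tolerate a duplicate only if it has the same URL."""
    try:
        config.add(name, url)
    except RemoteExistsError:
        if config.remotes[name] != url:
            # A different URL under the same name is a real conflict.
            raise


config = RemoteConfig()
ensure_remote(config, "auto-generated-upstream", "/path/to/cache")
# A second call mirrors running `dvc update` twice: it must not fail.
ensure_remote(config, "auto-generated-upstream", "/path/to/cache")
```

Compared with always passing `force=True`, this variant still surfaces a genuine conflict (same name, different URL) instead of silently overwriting it, which speaks to the concern about covering up a bug.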
@MrOutis I simplified the test with most of your example, but I still need to remove the .dvc directory since this test is for pure-git repositories. DVC repositories are already tested in the same file. Maybe we could have a new fixture
@fabiosantoscode, I guess the fix is out of the scope of this PR (and not introduced by the changes).
@skshetry makes sense. I'll stop investigating for now. For the record, if you add this line to the top of DependencyRepo.status() it also fixes the problem:
I think it makes sense to create a git-only repo fixture, as we are supporting more and more non-DVC git repositories. But for now, as it's only a single test, I guess you could do the following:
    from tests.dir_helpers import TmpDir
    path = TmpDir()
    with path.chdir():
        path.scm_gen({"file": "file"}, "first version", commit="first version")
This is done here, you could take it as an example:
@skshetry this doesn't work. TmpDir doesn't get an scm by default, the only one that does is edit: I get
@fabiosantoscode, sorry for misleading. Didn't really try.
@fabiosantoscode, looked through it, it's fine. Deepsource complains about old stuff from the file that we have changed. We do the same
tests/dir_helpers.py
Outdated
    @pytest.fixture
    def git_dir(tmp_path_factory, monkeypatch):
I would suggest renaming it so that we know it's an "external git dir".
Also, if we are already initializing a git repository here, how about refactoring erepo_dir so that it uses this fixture? If you consider my point valid, path generation should have logic based on request.fixturenames (e.g. https://github.com/iterative/dvc/blob/482473aa69fd97ebde8bdd2ae23b0c703158f765/tests/dir_helpers.py#L223)
I don't think it's very helpful to make this one react to other things. For example, the test test_update_git_tracked which I've added, needs DVC on the tmp_dir, but not on the external git dir. This would make this fixture useless for that test, since it would have DVC anyway.
We could refactor erepo_dir to use this fixture, but that would mean two commits, and the tests are pretty slow as it is right now.
    erepo_dir.scm.add([".dvc", "file.dvc"])
    erepo_dir.scm.commit("version with dvc")
    new_rev = erepo_dir.scm.get_rev()
    external_git_dir.dvc_gen("file", "second version")
Adding the commit kwarg is equivalent to add + commit.
    - external_git_dir.dvc_gen("file", "second version")
    + external_git_dir.dvc_gen("file", "second version", commit="version with dvc")
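The claimed equivalence between the `commit` kwarg and a manual add + commit can be sketched with a stub that records SCM calls instead of running git. Everything here is hypothetical: `RecordingScm` and `gen` are invented for the illustration and only mimic the shape of the test helpers discussed in this thread.

```python
import tempfile
from pathlib import Path


class RecordingScm:
    """Hypothetical SCM stub that records calls instead of running git."""

    def __init__(self):
        self.calls = []

    def add(self, paths):
        self.calls.append(("add", list(paths)))

    def commit(self, message):
        self.calls.append(("commit", message))


def gen(root, scm, name, text, commit=None):
    """Write a file and, if `commit` is given, stage and commit it."""
    path = Path(root) / name
    path.write_text(text)
    if commit is not None:
        scm.add([name])
        scm.commit(commit)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Explicit form: generate the file, then add and commit by hand.
    explicit = RecordingScm()
    gen(tmp, explicit, "file", "second version")
    explicit.add(["file"])
    explicit.commit("version with dvc")

    # Shorthand: let the helper do the add + commit.
    shorthand = RecordingScm()
    gen(tmp, shorthand, "file", "second version", commit="version with dvc")
```

In this sketch both forms leave the stub with the same recorded calls, so the shorthand removes two lines from the test without changing what gets committed.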
ghost
left a comment
I don't like the idea of introducing a whole new dir_helper for a single test, but this goes far beyond the scope of this PR. For now, I can't offer any immediate alternative, so let's roll with what we already have)
Left a small comment that you don't need to address, but take it into account for future contributions.
I suggest stopping on this until #3124 is resolved, at least Trying to fix it using
Suor
left a comment
As I said above, let's close it. A note on a new fixture stays relevant though.
    @pytest.fixture
    def external_git_dir(tmp_path_factory, monkeypatch):
Making a new fixture for a single test is overkill, and doing some copy-paste inside is even worse. You may simply use erepo_dir and remove the .dvc dir there.
Closing in favor of #3190. Sorry for the wasted effort, @fabiosantoscode!
Some things are fixed along the way:
- no cache shared between incompatible funcs
- some inconsistent exceptions are now consistent
- import updates for git files now work both for dvc and git repos
- "auto-generated-upstream already exists" fixed
- dvc.api.open()/read() works with git repos now
Fixes treeverse#3124, treeverse#2976. Closes treeverse#3172.
This makes dvc update update files from git repositories. Git-tracked imported files were actually already handled.
Fixes #2976
❗ Have you followed the guidelines in the Contributing to DVC list?
📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏