-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display path relative to temporary repo when get/import-ing files #2798
Conversation
b23a015
to
a836705
Compare
Thanks for the PR @skshetry ! Just a heads up: some tests failed, please take a look at travis logs. π |
tests/unit/output/test_local.py
Outdated
o = self._get_output(wdir=work_dir) | ||
self.assertEqual("path", str(o)) | ||
|
||
def test_str_if_workdir_outside_repo(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop, this test is failing for SSH
/HDFS
/s3
/GS
as they are remote. Others are passing.
Should I just separate this test (and, keep it only for local)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have separated the test for now. Tell me, I feel, I have done something wrong here π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the three old tests are not tested for other remotes than local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering all the test complications I suggest not touching existing ones, but adding a separate pytest-style test to check output stringification for an output in external repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the three old tests are not tested for other remotes than local.
They do run, as the above TestCase
is not changed at all. Or, did you mean this specific example?
Considering all the test complications I suggest not touching existing ones, but adding a separate pytest-style test to check output stringification for an output in external repo.
You mean, for each remote separately? Please, clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do run, as the above TestCase is not changed at all. Or, did you mean this specific example?
Oh, I see, I was inattentive. And OutputLOCALBase
not really used confused me.
You mean, for each remote separately? Please, clarify.
You don't need to do anything for each remote. You are only changing local one and thus only need to add tests for it. You can do that without touching existing tests at all and instead create two new test functions to check the specific changes you introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry @Suor is saying that you've modified TestOutputLOCAL which is used in tests for other remotes, hence why the tests are failing and why not all tests are now running for those remotes. You need to not touch the old tests, but create a new one pytest-style test to test that stringification works properly for local output when you are in and outside of the repo. π
dvc/output/local.py
Outdated
root_dir = os.path.join(self.repo.root_dir, "") | ||
curr_dir = os.getcwd() | ||
if curr_dir.startswith(root_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is used over and over in our code. This is probably a good time to extract this into some path_isin()
utility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
tests/unit/output/test_local.py
Outdated
o = self._get_output(wdir=work_dir) | ||
self.assertEqual("path", str(o)) | ||
|
||
def test_str_if_workdir_outside_repo(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the three old tests are not tested for other remotes than local.
tests/unit/output/test_local.py
Outdated
stage = Stage(self.dvc) | ||
return self._get_cls()(stage, "path", cache=False) | ||
def _get_output(self, wdir=None, path="path"): | ||
stage = Stage(self.dvc, wdir=wdir if wdir else os.curdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original issue this PR aims to solve was about outputs in external repo and thus outside it. Testing with stage of the same repo might not cover that or might break in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We could have two tests. One with Stage(dvc_repo)
to test that it works correctly when in the repo, and one more test with Stage(erepo.dvc)
to test that it works when outside of the repo. Should be enough.
tests/unit/output/test_local.py
Outdated
o = self._get_output(wdir=work_dir) | ||
self.assertEqual("path", str(o)) | ||
|
||
def test_str_if_workdir_outside_repo(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering all the test complications I suggest not touching existing ones, but adding a separate pytest-style test to check output stringification for an output in external repo.
tests/unit/output/test_local.py
Outdated
@@ -21,6 +23,35 @@ def test_save_missing(self): | |||
o.save() | |||
|
|||
|
|||
@pytest.mark.usefixtures("dvc_repo") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If yes, then simply using as argument will do the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's not necessary. Removing as erepo
fixture already creates one.
tests/unit/utils/test_utils.py
Outdated
(os.path.join("path", "to")), | ||
(os.path.join("path", "")), | ||
(os.path.join("path")), | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all those test could be done with one, where parametrize child, path and expected result. But I guess it is a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parametrizing for one line test is just a boiler plate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that one but I found parameterizing expected_result
as well too confusing for myself (and, too much repetition). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry I am ok with this as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still prefer several asserts in a single test. This is mere indirection, no upsides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also prefer having multiple asserts, but I think, this is easier to follow and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, one note, besides that LGTM.
tests/unit/output/test_local.py
Outdated
o = self._get_output(wdir=work_dir) | ||
self.assertEqual("path", str(o)) | ||
|
||
def test_str_if_workdir_outside_repo(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do run, as the above TestCase is not changed at all. Or, did you mean this specific example?
Oh, I see, I was inattentive. And OutputLOCALBase
not really used confused me.
You mean, for each remote separately? Please, clarify.
You don't need to do anything for each remote. You are only changing local one and thus only need to add tests for it. You can do that without touching existing tests at all and instead create two new test functions to check the specific changes you introduced.
dvc/utils/__init__.py
Outdated
@@ -398,6 +398,14 @@ def relpath(path, start=os.curdir): | |||
return os.path.relpath(path, start) | |||
|
|||
|
|||
def is_child_path(child, parent): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it path_isin()
to be more in line with PathInfo.isin()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed, thanks.
tests/unit/output/test_local.py
Outdated
@@ -21,6 +23,35 @@ def test_save_missing(self): | |||
o.save() | |||
|
|||
|
|||
@pytest.mark.usefixtures("dvc_repo") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If yes, then simply using as argument will do the job.
tests/unit/utils/test_utils.py
Outdated
(os.path.join("path", "to")), | ||
(os.path.join("path", "")), | ||
(os.path.join("path")), | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parametrizing for one line test is just a boiler plate.
tests/unit/utils/test_utils.py
Outdated
def test_is_child_path_positive(parent): | ||
child = os.path.join("path", "to", "folder") | ||
assert is_child_path(child, parent) | ||
|
||
|
||
def test_is_child_on_same_path(): | ||
path = os.path.join("path", "to", "folder") | ||
assert not is_child_path(child=path, parent=path) | ||
|
||
|
||
def test_is_child_on_different_path(): | ||
path1 = os.path.join("path", "to", "folder1") | ||
path2 = os.path.join("path", "to", "folder2") | ||
|
||
assert not is_child_path(path1, path2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about is_child_path('a/b/', 'a/b')
? Comparing only things built with os.path.join()
doesn't quite handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor I think that os.path.join(path, "")
which is inside is_child_path
handles this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I forgot to test about the case (which fails atm). I'll add that (using os.path.join()
, as I am not sure about testing via raw strings.)
From what I understood, you are asking me to also test strings directly? But, that'd not be platform agnostic. That specific string test might need skipping on Windows
(to add to that, the implementation is already quite dependent on os.path.join()
although it claims it's purely lexical :) ).
Also, should there be a test for demonstrating support for PathInfo
? (saw #2137, up-level
references might fail on PathInfo
, maybe normpath
could help.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor, I refactored for PathInfo
, as well as added few tests for absolute path and the case you just mentioned.
I have 12 commits already. I'd prefer rebasing/squashing (or, is it not preferred for some reasons?) |
@pared @MrOutis is the team policy flexible now? @skshetry I think it's totally up to you to manage commits in your branch in any way! And it's of course better for us in terms of merging if they look nice and clean :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we reached cleanup. So there are many comments below, but they should go easy.
dvc/output/local.py
Outdated
from dvc.utils import relpath | ||
from dvc.utils import path_isin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't split imports like that anymore.
dvc/utils/__init__.py
Outdated
def path_isin(child, parent): | ||
"""Check if given `child` path is inside `parent`.""" | ||
parent = os.path.join(os.path.normpath(parent), "") | ||
child = os.path.join(os.path.normpath(child), "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last os.path.join()
is not needed.
dvc/utils/__init__.py
Outdated
if parent != child and child.startswith(parent): | ||
return True | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return child != parent and ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/unit/utils/test_utils.py
Outdated
(os.path.join("path", "to")), | ||
(os.path.join("path", "")), | ||
(os.path.join("path")), | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still prefer several asserts in a single test. This is mere indirection, no upsides.
tests/unit/utils/test_utils.py
Outdated
assert not path_isin(child=path, parent=path) | ||
assert not path_isin(child=path, parent=path2) | ||
assert not path_isin(child=path2, parent=path) | ||
assert not path_isin(child=path2, parent=path2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are using pass by name here? I suggest treating those args as positional only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought, it'd be less confusing here if I added the keywords. Removing ...
tests/unit/utils/test_utils.py
Outdated
assert not path_isin(child=path2, parent=path2) | ||
|
||
|
||
def test_path_isin_on_different_path(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test is about substring not some completely different path, test name is inpresize.
tests/unit/utils/test_utils.py
Outdated
assert not path_isin(path1, path2) | ||
|
||
|
||
def test_path_isin_accepts_pathinfo(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it won't work in Python 3.5 and earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, didn't test on Python3.5
. Fixing ...
dvc/utils/__init__.py
Outdated
@@ -398,6 +398,15 @@ def relpath(path, start=os.curdir): | |||
return os.path.relpath(path, start) | |||
|
|||
|
|||
def path_isin(child, parent): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to dvc.utils.fs
please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought, dvc.utils.fs
were mostly related to low-level like inodes and stuffs. Moving ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor, I have moved it to dvc.utils.fs
and the tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No additional comments π
704c420
to
f07a70a
Compare
@Suor, I have rebased, please review again. |
dvc/utils/fs.py
Outdated
"""Check if given `child` path is inside `parent`.""" | ||
|
||
def normalize_path(path): | ||
return os.path.normpath(fspath(path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fspath_py35()
instead of fspath()
here, since in Python 3.6+ you may pass Path-like object to os
utils directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, thanks.
tests/unit/utils/test_fs.py
Outdated
|
||
def test_path_isin_on_same_path(): | ||
path = os.path.join("path", "to", "folder") | ||
path2 = os.path.join(path, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path2 = os.path.join(path, "") | |
path_with_sep = os.path.join(path, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@skshetry Please check tests, they are failing. |
Ah, wait, those failures are probably unrelated. Looking into it... |
Fixed the tests on master, everything should be smooth now. Sorry for the inconvenience π |
When this patch is merged, we should create ticket for replacing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Ah, I see @MrOutis already approved. |
Thanks @skshetry ! π |
@skshetry Please don't forget to create a ticket for #2798 (comment) . π |
β Have you followed the guidelines in the Contributing to DVC list?
π Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
β Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addresses. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible. π
Description
This PR fixes #2605. This will print the relative path of files being downloaded in the console.