Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display path relative to temporary repo when get/import-ing files #2798

Merged
merged 4 commits into from
Nov 20, 2019

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Nov 16, 2019

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • πŸ“– Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addresses. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Description

This PR fixes #2605. This will print the relative path of files being downloaded in the console.

@skshetry skshetry marked this pull request as ready for review November 16, 2019 07:52
@efiop
Copy link
Contributor

efiop commented Nov 16, 2019

Thanks for the PR @skshetry ! Just a heads up: some tests failed, please take a look at travis logs. πŸ™‚

o = self._get_output(wdir=work_dir)
self.assertEqual("path", str(o))

def test_str_if_workdir_outside_repo(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop, this test is failing for SSH/HDFS/s3/GS as they are remote. Others are passing.
Should I just separate this test (and, keep it only for local)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have separated the test for now. Tell me, I feel, I have done something wrong here πŸ˜ƒ

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the three old tests are not tested for other remotes than local.

Copy link
Contributor

@Suor Suor Nov 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering all the test complications I suggest not touching existing ones, but adding a separate pytest-style test to check output stringification for an output in external repo.

Copy link
Member Author

@skshetry skshetry Nov 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the three old tests are not tested for other remotes than local.

They do run, as the above TestCase is not changed at all. Or, did you mean this specific example?

Considering all the test complications I suggest not touching existing ones, but adding a separate pytest-style test to check output stringification for an output in external repo.

You mean, for each remote separately? Please, clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do run, as the above TestCase is not changed at all. Or, did you mean this specific example?

Oh, I see, I was inattentive. And OutputLOCALBase not really used confused me.

You mean, for each remote separately? Please, clarify.

You don't need to do anything for each remote. You are only changing local one and thus only need to add tests for it. You can do that without touching existing tests at all and instead create two new test functions to check the specific changes you introduced.

Copy link
Contributor

@efiop efiop Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry @Suor is saying that you've modified TestOutputLOCAL which is used in tests for other remotes, hence why the tests are failing and why not all tests are now running for those remotes. You need to not touch the old tests, but create a new one pytest-style test to test that stringification works properly for local output when you are in and outside of the repo. πŸ™‚

Comment on lines 45 to 47
root_dir = os.path.join(self.repo.root_dir, "")
curr_dir = os.getcwd()
if curr_dir.startswith(root_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is used over and over in our code. This is probably a good time to extract this into some path_isin() utility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

o = self._get_output(wdir=work_dir)
self.assertEqual("path", str(o))

def test_str_if_workdir_outside_repo(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the three old tests are not tested for other remotes than local.

tests/unit/output/test_local.py Outdated Show resolved Hide resolved
stage = Stage(self.dvc)
return self._get_cls()(stage, "path", cache=False)
def _get_output(self, wdir=None, path="path"):
stage = Stage(self.dvc, wdir=wdir if wdir else os.curdir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original issue this PR aims to solve was about outputs in external repo and thus outside it. Testing with stage of the same repo might not cover that or might break in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We could have two tests. One with Stage(dvc_repo) to test that it works correctly when in the repo, and one more test with Stage(erepo.dvc) to test that it works when outside of the repo. Should be enough.

o = self._get_output(wdir=work_dir)
self.assertEqual("path", str(o))

def test_str_if_workdir_outside_repo(self):
Copy link
Contributor

@Suor Suor Nov 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering all the test complications I suggest not touching existing ones, but adding a separate pytest-style test to check output stringification for an output in external repo.

@efiop efiop requested review from pared, Suor and a user November 18, 2019 09:59
@@ -21,6 +23,35 @@ def test_save_missing(self):
o.save()


@pytest.mark.usefixtures("dvc_repo")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, then simply using as argument will do the job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's not necessary. Removing as erepo fixture already creates one.

(os.path.join("path", "to")),
(os.path.join("path", "")),
(os.path.join("path")),
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all those test could be done with one, where parametrize child, path and expected result. But I guess it is a matter of taste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parametrizing for one line test is just a boiler plate.

Copy link
Member Author

@skshetry skshetry Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that one but I found parameterizing expected_result as well too confusing for myself (and, too much repetition). What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry I am ok with this as it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer several asserts in a single test. This is mere indirection, no upsides.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer having multiple asserts, but I think, this is easier to follow and understand.

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, one note, besides that LGTM.

o = self._get_output(wdir=work_dir)
self.assertEqual("path", str(o))

def test_str_if_workdir_outside_repo(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do run, as the above TestCase is not changed at all. Or, did you mean this specific example?

Oh, I see, I was inattentive. And OutputLOCALBase not really used confused me.

You mean, for each remote separately? Please, clarify.

You don't need to do anything for each remote. You are only changing local one and thus only need to add tests for it. You can do that without touching existing tests at all and instead create two new test functions to check the specific changes you introduced.

@@ -398,6 +398,14 @@ def relpath(path, start=os.curdir):
return os.path.relpath(path, start)


def is_child_path(child, parent):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it path_isin() to be more in line with PathInfo.isin().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thanks.

@@ -21,6 +23,35 @@ def test_save_missing(self):
o.save()


@pytest.mark.usefixtures("dvc_repo")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, then simply using as argument will do the job.

(os.path.join("path", "to")),
(os.path.join("path", "")),
(os.path.join("path")),
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parametrizing for one line test is just a boiler plate.

Comment on lines 139 to 153
def test_is_child_path_positive(parent):
child = os.path.join("path", "to", "folder")
assert is_child_path(child, parent)


def test_is_child_on_same_path():
path = os.path.join("path", "to", "folder")
assert not is_child_path(child=path, parent=path)


def test_is_child_on_different_path():
path1 = os.path.join("path", "to", "folder1")
path2 = os.path.join("path", "to", "folder2")

assert not is_child_path(path1, path2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about is_child_path('a/b/', 'a/b')? Comparing only things built with os.path.join() doesn't quite handle it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor I think that os.path.join(path, "") which is inside is_child_path handles this one.

Copy link
Member Author

@skshetry skshetry Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot to test about the case (which fails atm). I'll add that (using os.path.join(), as I am not sure about testing via raw strings.)

From what I understood, you are asking me to also test strings directly? But, that'd not be platform agnostic. That specific string test might need skipping on Windows (to add to that, the implementation is already quite dependent on os.path.join() although it claims it's purely lexical :) ).

Also, should there be a test for demonstrating support for PathInfo? (saw #2137, up-level references might fail on PathInfo, maybe normpath could help.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor, I refactored for PathInfo, as well as added few tests for absolute path and the case you just mentioned.

@skshetry
Copy link
Member Author

I have 12 commits already. I'd prefer rebasing/squashing (or, is it not preferred for some reasons?)

@shcheklein
Copy link
Member

I have 12 commits already. I'd prefer rebasing/squashing (or, is it not preferred for some reasons?)

@pared @MrOutis is the team policy flexible now? @skshetry I think it's totally up to you to manage commits in your branch in any way! And it's of course better for us in terms of merging if they look nice and clean :)

Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we reached cleanup. So there are many comments below, but they should go easy.

Comment on lines 10 to 11
from dvc.utils import relpath
from dvc.utils import path_isin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't split imports like that anymore.

def path_isin(child, parent):
"""Check if given `child` path is inside `parent`."""
parent = os.path.join(os.path.normpath(parent), "")
child = os.path.join(os.path.normpath(child), "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last os.path.join() is not needed.

Comment on lines 405 to 407
if parent != child and child.startswith(parent):
return True
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return child != parent and ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

(os.path.join("path", "to")),
(os.path.join("path", "")),
(os.path.join("path")),
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer several asserts in a single test. This is mere indirection, no upsides.

Comment on lines 148 to 151
assert not path_isin(child=path, parent=path)
assert not path_isin(child=path, parent=path2)
assert not path_isin(child=path2, parent=path)
assert not path_isin(child=path2, parent=path2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are using pass by name here? I suggest treating those args as positional only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought, it'd be less confusing here if I added the keywords. Removing ...

assert not path_isin(child=path2, parent=path2)


def test_path_isin_on_different_path():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is about substring not some completely different path, test name is inpresize.

dvc/output/local.py Outdated Show resolved Hide resolved
assert not path_isin(path1, path2)


def test_path_isin_accepts_pathinfo():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it won't work in Python 3.5 and earlier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't test on Python3.5. Fixing ...

@@ -398,6 +398,15 @@ def relpath(path, start=os.curdir):
return os.path.relpath(path, start)


def path_isin(child, parent):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to dvc.utils.fs please.

Copy link
Member Author

@skshetry skshetry Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought, dvc.utils.fs were mostly related to low-level like inodes and stuffs. Moving ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor, I have moved it to dvc.utils.fs and the tests as well.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No additional comments πŸ‘

@skshetry skshetry requested a review from Suor November 18, 2019 23:53
@skshetry
Copy link
Member Author

@Suor, I have rebased, please review again.

@efiop efiop requested a review from pared November 19, 2019 06:57
dvc/utils/fs.py Outdated
"""Check if given `child` path is inside `parent`."""

def normalize_path(path):
return os.path.normpath(fspath(path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fspath_py35() instead of fspath() here, since in Python 3.6+ you may pass Path-like object to os utils directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, thanks.

tests/unit/utils/test_fs.py Outdated Show resolved Hide resolved

def test_path_isin_on_same_path():
path = os.path.join("path", "to", "folder")
path2 = os.path.join(path, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path2 = os.path.join(path, "")
path_with_sep = os.path.join(path, "")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@skshetry skshetry requested a review from Suor November 19, 2019 16:13
@efiop
Copy link
Contributor

efiop commented Nov 19, 2019

@skshetry Please check tests, they are failing.

@efiop
Copy link
Contributor

efiop commented Nov 19, 2019

Ah, wait, those failures are probably unrelated. Looking into it...

@efiop
Copy link
Contributor

efiop commented Nov 19, 2019

Fixed the tests on master, everything should be smooth now. Sorry for the inconvenience πŸ™‚

@pared
Copy link
Contributor

pared commented Nov 20, 2019

When this patch is merged, we should create ticket for replacing startswith with path_isin throughout the code.

@efiop efiop requested a review from a user November 20, 2019 09:03
@efiop
Copy link
Contributor

efiop commented Nov 20, 2019

@Suor @MrOutis Guys, please take a look.

Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@efiop
Copy link
Contributor

efiop commented Nov 20, 2019

Ah, I see @MrOutis already approved.

@efiop
Copy link
Contributor

efiop commented Nov 20, 2019

Thanks @skshetry ! πŸŽ‰

@efiop efiop merged commit 91ac75d into iterative:master Nov 20, 2019
@skshetry skshetry deleted the fix-str-rel-repo branch November 20, 2019 09:47
@efiop
Copy link
Contributor

efiop commented Nov 20, 2019

@skshetry Please don't forget to create a ticket for #2798 (comment) . πŸ™‚

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dvc get/import shows temporary paths
5 participants