Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify if get_mtime_and_size can take both Path objects and strings #2698

Merged
merged 4 commits into from
Nov 2, 2019
Merged

Verify if get_mtime_and_size can take both Path objects and strings #2698

merged 4 commits into from
Nov 2, 2019

Conversation

algomaster99
Copy link
Contributor

  • Have you followed the guidelines in our
    Contributing document?

  • Does your PR affect documented changes or does it add new functionality
    that should be documented? If yes, have you created a PR for
    dvc.org documenting it or at
    least opened an issue for it? If so, please add a link to it.


With reference to #2137 and #2673 (comment)

@algomaster99 algomaster99 requested a review from efiop October 30, 2019 19:04
dvc/state.py Outdated Show resolved Hide resolved
@algomaster99 algomaster99 requested a review from efiop October 31, 2019 10:58
@efiop
Copy link
Contributor

efiop commented Nov 1, 2019

@algomaster99 Please keep an eye on tests, they will tell you if anything is very wrong before I am able to come in and take a look 🙂

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are failing because get_mtime_and_size is not converting things properly right now. For example https://github.com/iterative/dvc/blob/0.66.2/dvc/utils/fs.py#L24 https://github.com/iterative/dvc/blob/0.66.2/dvc/utils/fs.py#L42 etc.

Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before verifying one needs to actually change get_mtime_and_size() to handle PathInfo arguments. The agreed way to do this is wrapping path args with fspath() unconditionally, see how this is done in dvc.system.System methods.

Comment on lines 42 to 49
def test_path_object_and_str_are_valid_types(self):
dvcignore = DvcIgnoreFilter(self.root_dir)
file_time, file_size = get_mtime_and_size(self.DATA, dvcignore)
file_object_time, file_object_size = get_mtime_and_size(
PathInfo(self.DATA), dvcignore
)
self.assertEqual(file_time, file_object_time)
self.assertEqual(file_size, file_object_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this test is valuable:

  • this is tested via functional tests anyway
  • this doesn't test for non-dir argument

I would just drop it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor @efiop How should I make the required changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are verifying that it works the same with str and path_infos and for that the test is suitable and valuable. Let's keep it.

Copy link
Contributor

@efiop efiop Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@algomaster99 But this doesn't test for non-dir argument is a valid one(though it is probably meant to say "dir" instead of "non-dir"). Let's also run this test with self.DATA_DIR, so we verify that this works for dirs as well. Make sure that you are not copypasting the code for the new test, but rather use parametrization. E.g. you could make this as a pytest-style test with something like:

@pytest.mark.parametrize("path", ["data_dir", "data"])

You can look around for parametrize examples in our existing tests. 🙂

@algomaster99 algomaster99 requested review from Suor and efiop November 1, 2019 20:10
@efiop
Copy link
Contributor

efiop commented Nov 1, 2019

The agreed way to do this is wrapping path args with fspath() unconditionally, see how this is done in dvc.system.System methods.

@Suor This is not the agreed way. Current patch handles it much better by using fspath_py35 where needed, so next year we can go and remove those easily when py35 reaches end-of-life.

dvc/utils/fs.py Outdated Show resolved Hide resolved
dvc/state.py Show resolved Hide resolved
@Suor
Copy link
Contributor

Suor commented Nov 2, 2019

@efiop having one place at the start of the function to coerce paths is more durable solution. If you only pass that to file utils and expect this to work transparently in Python 3.6+ you might use fspath_py35() instead of fspath() there. Even in this case fspath() might be a better choice both for performance and durability reasons:

  • will never stringify paths several times
  • the whole func body is easier to reason about since it only has to deal with strings
  • no future bug possible when you'll pass that path to something other than fs util

We discussed that previously and I had an impression we agreed on that. Maybe that was only about System.* stuff? Anyway all the arguments above apply. One can think about this as a form of defensive programming, which is also good/ok for performance.

P.S. This is general discussion, which should not delay this particular PR. So we may resolve it in any of those ways.

@algomaster99
Copy link
Contributor Author

@efiop
I had to move this test out of the class otherwise it wasn't able to recognize path argument. It kept throwing NameError: missing 1 required positional argument: 'path' This could be the reason - https://stackoverflow.com/a/35562401/11751642.

@algomaster99
Copy link
Contributor Author

@efiop DeepSource test is failing. Do you know how can I fix it? The assert statements were already there. 😕

@efiop
Copy link
Contributor

efiop commented Nov 2, 2019

@Suor Good points! In this particular case, I'm not that worried about performance, as double fspath_py35 will only occur for standalone files(as giant directories are handled under first if block), which is not very significant. In terms of future bugs, this function is now fully covered with the unit test, so we are also covered there. We could still do something like path = fspath_py35(path), since it will be easy to get rid of, sure, but the current approach also doesn't look like a deal-breaker. Not quite sold on fspath(path) as it does an unnecessary conversion, even for performance sake(which is insignificant because of the reasons described above), esp since it would probably have to be solved by caching PathInfo.fspath in a general case, so we don't have to micro-manage.

@efiop
Copy link
Contributor

efiop commented Nov 2, 2019

@algomaster99 Take DeepSource's report as a recommendation, not as a requirement. We are still fine-tuning the settings for it.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@efiop efiop merged commit e21fc34 into iterative:master Nov 2, 2019
@Suor
Copy link
Contributor

Suor commented Nov 3, 2019

@efiop I don't worry too much about performance here too. The main point was simplifying code comprehension. Having tests covering things is good, but writing durable code is a basic measure that also works and is cheap. After we all agree on particularities of cause :)

And, BTW, it looks like they use "convert at the function start" approach in stdlib too, see os.walk() for example.

@algomaster99 algomaster99 deleted the test-argument-types-get_mtime_and_size branch November 4, 2019 08:37
@efiop efiop added refactoring Factoring and re-factoring testing Related to the tests and the testing infrastructure labels Nov 4, 2019
@Suor Suor mentioned this pull request Jan 8, 2020
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Factoring and re-factoring testing Related to the tests and the testing infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants