Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkout: show summary by default and introduce --show-changes flag #3401

Merged
merged 6 commits into from
Mar 16, 2020

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Feb 25, 2020

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • πŸ“– Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
    cmd-ref: document checkout displaying changesΒ dvc.org#1054

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Resolves #2979.

TODO

  • Add few tests.
  • CheckoutError: go with the consensus.
  • Make changes to flags if required. (make default?)
  • Handle --relink and modified case.
  • Show --relink successfully message. (Separate PR)
  • Show how many files were downloaded/fetched. (Separate PR)
  • Remove warnings on checkout failure
  • Fix pull/fetch outputs
  • Autocompletion scripts and help messages

@skshetry skshetry self-assigned this Feb 25, 2020
@skshetry skshetry force-pushed the checkout-summary-changes branch from 63de201 to 23256a5 Compare February 25, 2020 11:48
@skshetry skshetry marked this pull request as ready for review February 25, 2020 12:58
@skshetry skshetry changed the title [WIP] checkout: introduce summary and changes flag checkout: introduce summary and changes flag Feb 25, 2020
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes our checkout tech debt even bigger. But I am fine for now as long as this works.

dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/command/checkout.py Show resolved Hide resolved
dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/repo/checkout.py Outdated Show resolved Hide resolved
dvc/stage.py Outdated Show resolved Hide resolved
dvc/stage.py Outdated Show resolved Hide resolved
dvc/state.py Outdated Show resolved Hide resolved
tests/func/test_state.py Show resolved Hide resolved
dvc/command/checkout.py Outdated Show resolved Hide resolved
@skshetry
Copy link
Member Author

@efiop @Suor, the plan was to just show stats/summary. More verbose checkout changes did not seem too hard with existing stats/summary implementation so I quickly prototyped on it with --show-changes. I've asked @dmpetrov for feedback.

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment.
I find handling of added, modified, and no_change as False, True, None respectively a bit confusing at least in 2 places in code, one is base._checkout_dir and another one is pointed by @Suor
here
Boolean has 2 values and I think it is easy to omit the information passed from within checkout.

dvc/remote/base.py Outdated Show resolved Hide resolved
logger = logging.getLogger(__name__)


def _humanize_wordseries(words=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it to the utils? there should be some libraries that do different "humanization" stuff ... not sure if we want to utilize them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep it near to where it's used for now. If we later need it elsewhere, we can do so at that time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ‘ up to you, absolutely minor thing.

Let me share my intuition behind this, though :)

  • Readability - any boilerplate code that makes file longer and absolutely not relevant/generic makes it harder to see the "bigger picture" (tools like CodeClimate complain about file being too long, about too many methods, etc for a good reason);
  • Reusability - the knowledge about this generic "text" util is only in your head. Low chances that someone will be able to find it when it's needed again. Means some probability of a duplicate later.
  • Testing - you can unit test this simple things, would be strange to write a unit test for a private method though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not like humanizing, it's joining and not word series but some items in a list, so the name might be better, e.g. human_join() or smth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On keeping it here - I am with @skshetry here, move it later if that is used elsewhere. Spreading code is the big source of errors. Also impairs readability.

dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/stage.py Outdated Show resolved Hide resolved
@skshetry skshetry changed the title checkout: introduce summary and changes flag [WIp] checkout: introduce summary and changes flag Feb 27, 2020
@skshetry skshetry changed the title [WIp] checkout: introduce summary and changes flag [WIP] checkout: introduce summary and changes flag Feb 27, 2020
dvc/stage.py Outdated Show resolved Hide resolved
dvc/stage.py Outdated
def _fspath_dir(path):
"""Return string with "/" at the end if it's a dir."""
p = os.path.join(path, "") if os.path.isdir(path) else fspath(path)
return relpath(p, self.repo.root_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about it? (if I got the function purpose right)

rel = relpath(p, self.repo.root_dir)
return "{}{}".format(rel, os.path.sep) if os.path.isdir(path) else rel

@skshetry skshetry force-pushed the checkout-summary-changes branch from 30c9e5d to 1f4fbf0 Compare February 28, 2020 12:37
@skshetry skshetry changed the title [WIP] checkout: introduce summary and changes flag checkout: introduce summary and changes flag Feb 28, 2020
def __init__(self, target_infos):
def __init__(self, target_infos, stats=None):
self.target_infos = target_infos
self.stats = stats
targets = [str(t) for t in target_infos]
m = (
"Checkout failed for following targets:\n {}\nDid you "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, really minor request (feel free to nor change this), came out as I was reviewing:

ERROR: unexpected error - Checkout failed for following targets:
 data/prepared
data/features
data/data.xml

there is an extra space that is not needed before data/prepared. I think the change is \n {}\n -> \n{}\n

@shcheklein
Copy link
Member

@skshetry going back to the requirements in the #2979 (you mention in the description) , I think it does not resolve them for the failed case:

WARNING: Cache '6836f797f3924fb46fcfd6b9f6aa6416.dir' not found. File 'data/prepared' won't be created.
WARNING: Cache 'a304afb96060aad90176268345e10355' not found. File 'data/data.xml' won't be created.
WARNING: Cache '42c7025fc0edeb174069280d17add2d4.dir' not found. File 'data/features' won't be created.
1 added
ERROR: unexpected error - Checkout failed for following targets:
 data/prepared
data/data.xml
data/features
Did you forget to fetch?

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

we still have excessive WARNINGS . We can create a separate PR for this or keep the ticket open. I think a summary is enough:

1 added, 3 failed

ERROR: unexpected error - Checkout failed for following targets:
 data/prepared
data/data.xml
data/features
Did you forget to fetch?

Again, happy to just keep that ticket open.

@shcheklein
Copy link
Member

shcheklein commented Mar 10, 2020

It still affects dvc pull in some unexpected (?) ways:

(.env) [ivan@ivan ~/Projects/example-get-started]$ dvc pull
2 added
Everything is up to date.

and sometimes:

(.env) [ivan@ivan ~/Projects/example-get-started]$ dvc pull
2 modified

@shcheklein
Copy link
Member

(Also, going through the example get started I have more and more reasons for the --show-changes option by default. One more reason - we mix directories and files when we count added, modified, etc. E.g. when it prints 1 added but actually it's a directory with two preprocessed files it make it confusing if dvc checkout model.dvc prints the same. Just a thought to consider.)

@skshetry
Copy link
Member Author

it's a directory with two preprocessed files it make it confusing if dvc checkout model.dvc prints the same. Just a thought to consider.)

I'm also more inclined to make this the default, even more so with making output formats of git and dvc checkouts similar (helpful with post-checkout). So, to not make anymore changes, I'll swap the defaults, and introduce --summary flag.

Later, we could decide if we really need --summary, and then either remove or even rename those as --shortstat/ --short/--stat/--numstat, etc.

It still affects dvc pull in some unexpected (?) ways:

Looks like on second example, pull did fetch something and did checkout (checkout looks to be working fine), but pull did not report anything (same in-line with previous behavior that it didn't print anything if it fetched something). I'll fix fetch and pull behavior on separate PRs.

we still have excessive WARNINGS . We can create a separate PR for this or keep the ticket open. I think a summary is enough:

It'd be better to just reduce the level of loggings here to DEBUG. Again, I'll handle it in a separate PRs (added to TODO).

Thanks Ivan for checking out!

@Suor
Copy link
Contributor

Suor commented Mar 11, 2020

@skshetry I don't think you can simply make those warnings debug, this will affect more things than just checkout, e.g. Stage.run() does not check for those checkouts to succeed.

dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/command/checkout.py Outdated Show resolved Hide resolved
dvc/external_repo.py Outdated Show resolved Hide resolved
dvc/remote/base.py Outdated Show resolved Hide resolved
dvc/repo/checkout.py Outdated Show resolved Hide resolved
unused = _get_unused_links(self)

stats["deleted"] = [_fspath_dir(u, self.root_dir) for u in unused]
self.state.remove_links(unused)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not under if? Looks fragile this way, we shouldn't call remove_links() at all if a particular target is specified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's empty if there's no argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I need to check if something is a dir before deleting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkout in dvc:

  • checks out particular thing(s) if that things (target) is specified
  • if no target is specified it checks out everything and removes anything previously checked out and having no references now.

I.e. if there is target we don't try to change the rest of the workspace, so .remove_links() should not be called. The logic of your code is different.

Copy link
Member Author

@skshetry skshetry Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If targets are specified, unused will be empty and hence, nothing will get removed. If your concern is regarding calling .remove_links() at all, I don't think there's any reason to be defensive.

@skshetry
Copy link
Member Author

@shcheklein, I have changed the defaults. There's a --summary flag for showing summary instead of detaliled changelog. Other stuffs will be fixed in a separate PR.

@skshetry skshetry requested review from shcheklein, pared and Suor March 14, 2020 03:24
Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, thanks @skshetry !

(please, create a ticket of address docs and bash/zsh scripts? I assume the plan is to do a separate PR, right?)

@shcheklein
Copy link
Member

sorry, wanted to approve initially, missed the button )

Co-Authored-By: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
@skshetry
Copy link
Member Author

@shcheklein, @efiop, I have created an issue #3483 for handling autocompletion scripts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
completion bash and zsh completion scripts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

checkout/add: change warning messages to regular/info if behavior is expected
8 participants