-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ui: push: use blank pbar for the header #10197
Conversation
Same as in `dvc fetch`.
with ui.progress( | ||
desc="Pushing", | ||
unit="file", | ||
bar_format="{desc}", | ||
leave=True, | ||
) as pb: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different to the following?
with ui.progress( | |
desc="Pushing", | |
unit="file", | |
bar_format="{desc}", | |
leave=True, | |
) as pb: | |
ui.write("Pushing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry Not different practically (as far as I know, you tell me). Though wondering if it should somehow be later used to create nested progress below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not seem like we are even using this callback, so seems misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I took a look into index.push
. I have a few questions.
- Why does
push()
need to support multiple indexes? It does not seem it needs multiple indexes, and the code (and the callback) would be simplified if it only takes a single index. - The Object Storage - to - ObjectStorage transfer and the other case are bundled into the same place. But they use very different implementation. The other one does not even feel like a
push
to me, but is acheckout
. So fundamentally, the operation that we are doing is different. This makes it hard to implement a good callback because they are different operations. The code would be simpler too if this happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Those are condensed indexes, per every cloud that we need to upload to. We do that in a for loop, but i don't want to rule out uploading to multiple at the same time. You are right that it would simplify it, but so will just rmoving those tqdm bars in ipush and having one bytes/object based pbar (we have sum(len(idx) for idx in data)) in dvc/repo/push.py if we want to.
-
So at the end of the day in both cases what we are doing is diffing what we want and what we have and appliying changes (similar to to rsync). I know that it looks like checkout, because it really is the same operation, just we are used to checkouts being local. And object transfer will likely go away (mostly kept for now because of legacy indexes, but we can convert them to new ones) and also bottle down to diff + apply. We are much more used to object transfer, but we are forgetting that it does (a pretty convoluted) status inside already, which is pretty much same as diff but with extra steps. So a bit more work to do there.
Overall it seems like diff + apply workflow is the most universal so far, so maybe that's what we'll do in in push/fetch (maybe even instead of ipush/ifetch). And that also probably means that it is better to start with figuring out dvc checkout
ui first, since it is the most settled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have sum(len(idx) for idx in data)) in dvc/repo/push
We get the actual number of push
from the diff. So that would not work either.
I know we try to do the same in https://github.com/iterative/dvc-data/blob/a535b4bec313cfbf9046b4e96027c970aaae91c5/src/dvc_data/index/push.py#L76, but a) total
is not a valid keyword, it should be size
and b) transfer
calls set_size
overwriting the previous value you may have set, which feels more correct to me for actual number of transfers involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get the actual number of push from the diff. So that would not work either.
Not sure I follow, could you rephrase/elaborate, please?
I know we try to do the same in
All good points, feel free to send a PR.
About what to base a pbar on - this is a very good question. Maybe on total entries in the index, maybe on total changes that we need to apply. I'm keen on the latter, was doing that locally, but haven't posted publically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow, could you rephrase/elaborate, please?
I mean that the total
should be based on amount of actual file transfers (aka diff
), than the total number of files in the index.
All good points, feel free to send a PR.
I did. See https://github.com/iterative/dvc-data/pull/486/files?w=1. I also moved one operation out of apply to simplify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that the total should be based on amount of actual file transfers (aka diff), than the total number of files in the index.
I'm not so sure about that at this moment. Deletions/chmods/directory creations are also meaningful and take time, so might make sense to show them as well (e.g. maybe as a pbar of total changes or actions that we need to do).
Same as in
dvc fetch
.