Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ui: push: use blank pbar for the header #10197

Merged
merged 1 commit into from
Dec 23, 2023
Merged

ui: push: use blank pbar for the header #10197

merged 1 commit into from
Dec 23, 2023

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Dec 23, 2023

Same as in dvc fetch.

@efiop efiop merged commit 402c107 into main Dec 23, 2023
20 checks passed
@efiop efiop deleted the efiop-patch-2 branch December 23, 2023 02:40
Comment on lines 139 to 143
with ui.progress(
desc="Pushing",
unit="file",
bar_format="{desc}",
leave=True,
) as pb:
Copy link
Member

@skshetry skshetry Dec 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different to the following?

Suggested change
with ui.progress(
desc="Pushing",
unit="file",
bar_format="{desc}",
leave=True,
) as pb:
ui.write("Pushing")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Not different practically (as far as I know, you tell me). Though wondering if it should somehow be later used to create nested progress below.

Copy link
Member

@skshetry skshetry Dec 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I took a look into index.push. I have a few questions.

  1. Why does push() need to support multiple indexes? It does not seem it needs multiple indexes, and the code (and the callback) would be simplified if it only takes a single index.
  2. The Object Storage - to - ObjectStorage transfer and the other case are bundled into the same place. But they use very different implementation. The other one does not even feel like a push to me, but is a checkout. So fundamentally, the operation that we are doing is different. This makes it hard to implement a good callback because they are different operations. The code would be simpler too if this happened.

Copy link
Contributor Author

@efiop efiop Dec 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Those are condensed indexes, per every cloud that we need to upload to. We do that in a for loop, but i don't want to rule out uploading to multiple at the same time. You are right that it would simplify it, but so will just rmoving those tqdm bars in ipush and having one bytes/object based pbar (we have sum(len(idx) for idx in data)) in dvc/repo/push.py if we want to.

  2. So at the end of the day in both cases what we are doing is diffing what we want and what we have and appliying changes (similar to to rsync). I know that it looks like checkout, because it really is the same operation, just we are used to checkouts being local. And object transfer will likely go away (mostly kept for now because of legacy indexes, but we can convert them to new ones) and also bottle down to diff + apply. We are much more used to object transfer, but we are forgetting that it does (a pretty convoluted) status inside already, which is pretty much same as diff but with extra steps. So a bit more work to do there.

Overall it seems like diff + apply workflow is the most universal so far, so maybe that's what we'll do in in push/fetch (maybe even instead of ipush/ifetch). And that also probably means that it is better to start with figuring out dvc checkout ui first, since it is the most settled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have sum(len(idx) for idx in data)) in dvc/repo/push

We get the actual number of push from the diff. So that would not work either.

I know we try to do the same in https://github.com/iterative/dvc-data/blob/a535b4bec313cfbf9046b4e96027c970aaae91c5/src/dvc_data/index/push.py#L76, but a) total is not a valid keyword, it should be size and b) transfer calls set_size overwriting the previous value you may have set, which feels more correct to me for actual number of transfers involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get the actual number of push from the diff. So that would not work either.

Not sure I follow, could you rephrase/elaborate, please?

I know we try to do the same in

All good points, feel free to send a PR.

About what to base a pbar on - this is a very good question. Maybe on total entries in the index, maybe on total changes that we need to apply. I'm keen on the latter, was doing that locally, but haven't posted publically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, could you rephrase/elaborate, please?

I mean that the total should be based on amount of actual file transfers (aka diff), than the total number of files in the index.

All good points, feel free to send a PR.

I did. See https://github.com/iterative/dvc-data/pull/486/files?w=1. I also moved one operation out of apply to simplify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that the total should be based on amount of actual file transfers (aka diff), than the total number of files in the index.

I'm not so sure about that at this moment. Deletions/chmods/directory creations are also meaningful and take time, so might make sense to show them as well (e.g. maybe as a pbar of total changes or actions that we need to do).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants