add support for optional progress updates channel #661

deitch · 2023-12-21T12:19:29Z

If CopyOptions.UpdateChannel is non-nil, then with each block copied during a Copy*(), it sends an update down the channel. The update includes the amount copied in this update, and the descriptor of what is being copied. This descriptor lets the consumer determine what blob it applies to, and what the expected total size is (which can be used, e.g. to calculate percentages or create a progress bar).

It does not include relative sizes of the whole root desc or tag in the Copy(), because we have no way of knowing that; you would have to walk the entire tree first. Granted, you can get it from manifests first, and only then the more expensive other elements, but that would require a complete restructure. This is more than good enough for know.

Discussed in Slack with @Wwwsylvia

deitch · 2023-12-21T12:19:37Z

Sylvia did raise the following questions. Adding here for follow-on discussion:

It is not clear to me that how does the caller read the channel. Is the channel generated internally or is it passed in by the caller? Is it per-blob?
How do we report progress for the skipped (existing) blobs? What if the caller (e.g. ORAS CLI) wants to show 100% for the skipped blobs (as shown in the screenshot)?
What if the caller wants to track the progress of PushBytes? Is there a way to reuse this mechanism?

Basically, the idea is that we want to make sure the APIs of oras-go are generic and extensible.

codecov-commenter · 2023-12-21T12:20:47Z

Codecov Report

Attention: Patch coverage is 6.25000% with 15 lines in your changes missing coverage. Please review.

Project coverage is 75.27%. Comparing base (0b78aa6) to head (cd02b49).
Report is 34 commits behind head on main.

Files with missing lines	Patch %	Lines
progress_reader.go	0.00%	8 Missing ⚠️
copy.go	12.50%	5 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #661      +/-   ##
==========================================
- Coverage   75.46%   75.27%   -0.19%     
==========================================
  Files          59       60       +1     
  Lines        5640     5654      +14     
==========================================
  Hits         4256     4256              
- Misses       1019     1032      +13     
- Partials      365      366       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

deitch · 2023-12-21T12:21:22Z

Some responses:

It is not clear to me that how does the caller read the channel. Is the channel generated internally or is it passed in by the caller?

Passed in by the caller, CopyOptions.UpdateChannel. If it is nil, we do nothing different; if it is non-nil, we send updates to it with each copy.

Is it per-blob?

No, one channel, but each update includes the descriptor. See the extended comment on UpdateChannel.

How do we report progress for the skipped (existing) blobs?

I hadn’t thought about that. I think I would send a single update of 100% on existing blobs. But that can be subject for discussion. I think I would do this first, then do that in a follow-on. One improvement at a time.

What if the caller wants to track the progress of PushBytes? Is there a way to reuse this mechanism?

I don’t see why not. All this is, is a mechanism for sending updates in a channel. The issue here is that there are no “opts” on which to add it

deitch · 2023-12-21T12:21:46Z

Rebased

shizhMSFT

A complete and sound progress updating framework is non-trivial for both oras-go v1 and v2. Thanks for the initial attempt and let's see if we can iterate it to a better design.

shizhMSFT · 2023-12-21T14:31:43Z

copy.go

+	// Updates are sent each time a block is copied. The number of bytes copied
+	// depends upon io.Copy, which, by default, is 32KB. As of now, this cannot
+	// be changed. We may provided that capability in a future update.
+	UpdateChannel chan<- CopyUpdate


Channels are blocking. If someone sets opts.UpdateChannel = make(chan<- CopyUpdate) and never consumes the updates, the Copy() will be blocked forever.

Yeah, that is a good point. I was thinking about that, wasn't sure quite how to handle it. Here are some possibilities:

send to the channel only from within a goroutine. That could lead to a lot of goroutines if it blocks. See below.

instead of passing a channel, pass a function call. Each such call would be in a goroutine, but again, same issue. See below.

For either of the above, maybe we have one goroutine and one channel we own. Main routine publishes to our channel (we control it, so we can ensure no blocking issues), which then either publishes to the passed channel (option 1 above) or calls the passed function (option 2). At least we don't block our main routine, although if their function blocks, our goroutine still is blocked.

Is a func call better than a goroutine? In theory, they could use a channel in there. Or have a slow func call.

shizhMSFT · 2023-12-21T14:38:24Z

copy.go

+	// UpdateChannel is an optional channel to receive progress updates.
+	// Each update will include the number of bytes copied for a particular blob
+	// or manifest, the expected total size, and the descriptor of the blob or
+	// manifest. It is up to the consumer of the channel to differentiate
+	// between updates among different blobs and manifests; no mechanism is
+	// provided for distinguishing between them, other than the descriptor
+	// passed with each update. The total size of downloads of all blobs and
+	// manifests is not provided, as it is not known. You can calculate the
+	// percentage downloaded for a particular blob in an individual update
+	// based on the total size of that blob, which is provided in the
+	// descriptor, and the number of bytes copied, which is provided in the
+	// update.


Presenting a progress bar of an operation isn't an easy job. Eventually, we will end up with an MVC pattern if we continue iterating.

Therefore, we need to define data models first for oras operations like Copy, PushBytes, etc.

In fact, oras CLI has a data model so as the buildkit.

/cc @qweeah

Presenting a progress bar of an operation isn't an easy job

True, but I am not concerned with that here. oras CLI might, but the Copy() and such library calls should not care. They only should worry about some library-centric way of publishing updates. A CLI (like oras) or other consumer can do with them what they want.

we need to define data models

The data model is simple: stream of defined struct (or interface, if you prefer), each of which contains the amount of bytes transferred and descriptor to which it applies. Anything higher level would be outside the scope of this type of option, I think.

shizhMSFT · 2023-12-21T14:45:25Z

copy.go

+	// Updates are sent each time a block is copied. The number of bytes copied
+	// depends upon io.Copy, which, by default, is 32KB. As of now, this cannot
+	// be changed. We may provided that capability in a future update.


Not really. The copy operation does not necessarily depends on io.Copy. It is purely based on the implementation of the destination Target. For instance, copying from any target to memory / oci targets does not call io.Copy.

Fair enough. We pass the target an io.ReadCloser, so something is calling Read() multiple times. We don't really care what it is, as long as it calls Read() (which it does). I can update the comment here.

I updated this comment; take a look.

shizhMSFT · 2023-12-21T14:48:52Z

copy.go

+	if ch != nil {
+		rc = &progressReader{
+			c: ch,
+			r: rc,
+		}
+	}


Wrapping a ReadCloser may have perf penalties as some types may implement io.WriteTo which can trigger an optimization by the built-in io.Copy() method.

Are you saying that it is performance impact by wrapping the Read() call? Or that by not having the wrapper also implement WriterTo, if the original ReadCloser does, then we lose the ability to WriteTo? I think the second of these.

What do you suggest? That we check for WriterTo, like io.Copy() does, and if it does, implement WriteTo().

How would we then capture the updates?

shizhMSFT · 2023-12-21T14:51:59Z

progress_reader.go

+}
+
+func (p *progressReader) Close() error {
+	close(p.c)


Closing the channel is dangerous as the channel is shared across all nodes. It panics if other nodes get updates.

In other words, the caller has the responsibility to close the channel.

Yeah, very good point. I will change that and add the comment.

deitch · 2023-12-27T11:55:05Z

Going to rebase to get it back in line with main; any thoughts on the feedback @shizhMSFT ?

deitch · 2023-12-27T11:55:27Z

Rebased

shizhMSFT · 2023-12-28T09:33:04Z

Going to rebase to get it back in line with main; any thoughts on the feedback @shizhMSFT ?

@deitch I think this PR is good brainstorming idea and worths an issue or discussion so that we can design it holistically with production-level quality.

deitch · 2023-12-28T09:34:51Z

PR is good brainstorming idea and worths an issue or discussion so that we can design it holistically with production-level quality

I have no idea what that means. If the overall direction is good, let's fix this and get it in. If not, let's redo it.

shizhMSFT · 2023-12-28T10:08:36Z

It means that we need an issue or discussion for a systematical design as this PR is a feature request proposal. The reason behind it is that oras-go/v2 is now at a stable state and we should introduce new a feature (i.e. merge to main) with the best design we could achieve to avoid future breaking changes as much as we can. (note: breaking changes requires us to introduce oras-go/v3).

Since issue and discussion are good place to propose a feature request, would you like to open one with details like the motivation of the feature request (i.e. what would you like to be added? why is this needed for oras?)? We should have an issue template for oras-go but it is still in progress (tracked by #641; the oras repo has one). With an open issue / discussion, more maintainers like @sajayantony, @TerryHowe, and @sabre1041 can join the discussion and give inputs so that this conversation is not just between you and me or @Wwwsylvia. Of course, you may reference this PR in the new issue / discussion as an implementation candidate (so do others). #650 and #651 by @ktarplee are a good example.

Meanwhile, I would like to continue the discussions on the aspects of API design, concurrency (go-routine safety), performance, and security so that we can iterate your idea and make oras-go a better library.

deitch · 2023-12-28T10:15:30Z

Sure. This all started with an extended discussion we had on Slack with @Wwwsylvia . Part of the issue is that this capability existed (in a different way, but still) in v1. The move to v2, with all of its many benefits, lost this capability. This is trying to get it back, so that downstream dependencies that want to switch from v1 to v2 can do so without loss of ability.

BTW, if this PR changes an API, and there is a way to do it without changing it (i.e. backwards-compatible), then by all means. If I recall correctly, that was the main driver behind the variadic options in the main calls in v1. We knew we could add features going forward without changing the API.

sajayantony · 2024-01-03T20:48:40Z

I think this is general goodness and so I'm in support of landing something that is feasible and generally stable.
Can someone given an code example of how this API might be consumed with maybe a warning of how it can be abused as well?

@deitch maybe a short hackmd doc or something that we can use for API review from the consumer aspect?

deitch · 2024-01-04T09:20:51Z

Can someone given an code example of how this API might be consumed
maybe a short hackmd doc or something

I will put something in a comment here. It looks like there is question about whether or not this API is the right one, or even about putting in the changes. Once we have general agreement on adding it and the approach, I am happy to put a formal example and doc anywhere we want.

Signed-off-by: Avi Deitcher <avi@deitcher.net>

deitch · 2024-01-04T09:21:29Z

Also rebased while I was at it

deitch · 2024-01-04T09:40:03Z

Simple example:

ch := make(chan oras.CopyUpdate, 1000)
opts := oras.CopyOptions{
    UpdateChannel: ch        
}
desc, err := oras.Copy(ctx, src, tagName, dst, tagName, opts)
go func(ch <- chan CopyUpdate) {
    for msg := range ch {
        fmt.Printf("copied %d bytes out of %d total for %s\n", msg.Copied, msg.Descriptor.Size, msg.Descriptor.Digest)
    }
}(ch)

I realize that we probably would need to extend it the channel type so there is a way to send "complete" or "error". But I am waiting until the approach is agreed upon.

As for abusing, it is pretty straightforward. Have a channel, do not buffer it, or do not read from it until the buffer is full.

We could make the channel send (in this PR) non-blocking. That means that it would risk lost messages, but I find that an acceptable risk, much more so than, "user abuses it and the whole thing blocks."

github-actions · 2024-10-15T10:43:18Z

This PR is stale because it has been open for 45 days with no activity. Remove the stale label or comment to prevent it from being closed in 30 days.

deitch · 2024-10-15T10:52:57Z

Remove stale

deitch · 2024-10-15T10:53:20Z

@sajayantony how can we bring this back to life?

shizhMSFT · 2024-10-21T13:42:05Z

According to the discussions on the slack, more topics could be

What does the view model look like?
What's the model of an update? How to determine the frequency of the update?
How to emit the update?
How do we ensure API flexibility? Not everyone likes channels.
How do we ensure performance is not degraded?
...

@deitch will open a new feature request issue to track properly.

deitch · 2024-10-21T14:07:24Z

Opened #839 ; I am sure I missed something, but that should be good as a start

deitch requested review from sajayantony, shizhMSFT, SteveLasker and Wwwsylvia as code owners December 21, 2023 12:19

deitch force-pushed the progress-updates branch from 178a2e8 to 595783e Compare December 21, 2023 12:21

shizhMSFT reviewed Dec 21, 2023

View reviewed changes

deitch force-pushed the progress-updates branch 2 times, most recently from ebc7070 to 7fa4d37 Compare December 21, 2023 15:34

shizhMSFT added the experimental Issues or pull requests depending on WIP specs label Dec 26, 2023

deitch force-pushed the progress-updates branch from 7fa4d37 to 8f9dca7 Compare December 27, 2023 11:55

add support for optional progress updates channel

cd02b49

Signed-off-by: Avi Deitcher <avi@deitcher.net>

deitch force-pushed the progress-updates branch from 8f9dca7 to cd02b49 Compare January 4, 2024 09:21

github-actions bot added the stale Inactive issues or pull requests label Oct 15, 2024

github-actions bot removed the stale Inactive issues or pull requests label Oct 16, 2024

deitch mentioned this pull request Oct 21, 2024

Copy progress - feature request #839

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for optional progress updates channel #661

add support for optional progress updates channel #661

deitch commented Dec 21, 2023

deitch commented Dec 21, 2023

codecov-commenter commented Dec 21, 2023 •

edited by codecov bot

Loading

deitch commented Dec 21, 2023

deitch commented Dec 21, 2023

shizhMSFT left a comment

shizhMSFT Dec 21, 2023

deitch Dec 21, 2023

shizhMSFT Dec 21, 2023

shizhMSFT Dec 21, 2023

shizhMSFT Dec 21, 2023

deitch Dec 21, 2023

shizhMSFT Dec 21, 2023

deitch Dec 21, 2023

deitch Dec 21, 2023

shizhMSFT Dec 21, 2023

deitch Dec 21, 2023

shizhMSFT Dec 21, 2023

deitch Dec 21, 2023

deitch Dec 21, 2023

deitch commented Dec 27, 2023

deitch commented Dec 27, 2023

shizhMSFT commented Dec 28, 2023

deitch commented Dec 28, 2023

shizhMSFT commented Dec 28, 2023

deitch commented Dec 28, 2023

sajayantony commented Jan 3, 2024 •

edited

Loading

deitch commented Jan 4, 2024

deitch commented Jan 4, 2024

deitch commented Jan 4, 2024

github-actions bot commented Oct 15, 2024

deitch commented Oct 15, 2024

deitch commented Oct 15, 2024

shizhMSFT commented Oct 21, 2024

deitch commented Oct 21, 2024

add support for optional progress updates channel #661

Are you sure you want to change the base?

add support for optional progress updates channel #661

Conversation

deitch commented Dec 21, 2023

deitch commented Dec 21, 2023

codecov-commenter commented Dec 21, 2023 • edited by codecov bot Loading

Codecov Report

deitch commented Dec 21, 2023

deitch commented Dec 21, 2023

shizhMSFT left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deitch commented Dec 27, 2023

deitch commented Dec 27, 2023

shizhMSFT commented Dec 28, 2023

deitch commented Dec 28, 2023

shizhMSFT commented Dec 28, 2023

deitch commented Dec 28, 2023

sajayantony commented Jan 3, 2024 • edited Loading

deitch commented Jan 4, 2024

deitch commented Jan 4, 2024

deitch commented Jan 4, 2024

github-actions bot commented Oct 15, 2024

deitch commented Oct 15, 2024

deitch commented Oct 15, 2024

shizhMSFT commented Oct 21, 2024

deitch commented Oct 21, 2024

codecov-commenter commented Dec 21, 2023 •

edited by codecov bot

Loading

sajayantony commented Jan 3, 2024 •

edited

Loading