CommandBuffer usage clarification: internal, external, both? #264

Closed
bbernhar opened this issue May 2, 2022 · 19 comments

bbernhar commented May 2, 2022

If the intent of WebNN is to produce an immutable MLCommandBuffer that can only be read by WebGPU, then I would suggest we consider 1) renaming it to MLExternalCommandBuffer, and 2) avoiding overloading WebGPU Interop with a requirement WebGPU does not follow (a command buffer with GPU commands being equal to a command buffer with non-GPU commands) by moving MLExternalCommandBuffer into a WebNN Interop section.

Alternatively, we could keep MLCommandBuffer (internal usage) but give WebNN a means to submit it (e.g. computeAsync). This would avoid breaking WebGPU and give WebNN a consistent GPU async experience on both native (standalone) and web.
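Purely as a hypothetical sketch to make the two options concrete (none of the names or call shapes below are specified anywhere; mlEncoder, mlContext, graph, inputs, and outputs are assumed to exist):

```ts
// Hypothetical sketch only: illustrates the two usages discussed above.

// Option 1: external-only. WebNN emits an opaque, immutable command buffer
// that only WebGPU may consume; WebNN never executes it itself.
const externalCommandBuffer = mlEncoder.finish();     // hypothetically an MLExternalCommandBuffer
gpuDevice.queue.submit([externalCommandBuffer]);      // the caller hands it to WebGPU for execution

// Option 2: internal usage. WebNN keeps MLCommandBuffer to itself and offers
// its own submission path (a computeAsync-style call), owning synchronization
// end to end on both native (standalone) and web.
await mlContext.computeAsync(graph, inputs, outputs); // hypothetical WebNN-side submission
```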

Thoughts?

@huningxin @wchao1115 @RafaelCintron

anssiko commented May 12, 2022

Thanks for this feedback, @bbernhar. We discussed this issue on our call:

  1. There was a preference to keep the current name.
  2. I believe this overloading topic is appropriate to bring to the WebGPU group as part of our wide review outreach, in the planned update to "Investigation: how WebNN / WebGPU interop could be happening" gpuweb/gpuweb#2500.

@RafaelCintron and @huningxin for further comments.

bbernhar commented May 17, 2022

Thanks @anssiko for the update.

But I was seeking clarification on the usage of MLCommandBuffer: will it be internal-only (read/write by WebNN), external-only (read-only by WebNN, read/write by WebGPU), or both (read/write by both WebNN and WebGPU)?

I believe this is a matter for the WebNN WG to decide, as MLCommandBuffer is not a requirement of WebGPU.

@anssiko anssiko changed the title Should MLCommandBuffer be MLExternalCommandBuffer? CommandBuffer usage clarification: internal, external, both? May 18, 2022
anssiko commented May 18, 2022

@bbernhar thank you for this additional context. I amended the 19 May 2022 agenda with this issue.

wchao1115 commented:

It isn't MLCommandBuffer but rather GPUCommandBuffer, a WebGPU type and an immutable one.

anssiko commented May 19, 2022

@bbernhar we discussed this issue today with help of your additional context.

@wchao1115 provided his perspective, and I'm sure he is available to answer any questions you may have in this issue.

bbernhar commented:

@wchao1115 Command buffers are always immutable once created; that's not my question.

Immutability vs usage

External-only means WebNN is unable to synchronize access, and any GPU operation (performed by WebNN) is forbidden once the command buffer is created, whether through the CPU or by another GPU operation. Internal-only is the reverse: access to the GPUCommandBuffer is guaranteed to be synchronized by whoever created it. The latter usage is what WebGPU does, but will WebNN be different?

wchao1115 commented:

I would expect the GPUCommandBuffer produced by WebNN to be fully compatible with the one produced by WebGPU. By the current API contract, WebNN does not accept a WebGPU command buffer as an input and only produces it.

bbernhar commented May 19, 2022

@wchao1115

Let's assume they are "compatible" and we have some buffer or texture referenced within the GPUCommandBuffer, containing ML commands. What happens when computeAsync executes using them?

Are you telling me "the contract" requires WebNN to go into "read only" mode (i.e., no more GPU for you) until WebGPU finishes execution? I believe a lot more clarification is needed here.

wchao1115 commented:

computeAsync doesn't take GPUCommandBuffer as an input. It's not meant to. The command buffer produced by the MLCommandEncoder is only meant to be submitted to the GPUQueue by the caller.

Here's the relevant excerpt in the spec:

The MLCommandEncoder interface created by the MLContext.createCommandEncoder() method supports a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their application. It does this by placing the workload required to initialize and compute the results of the operations in the graph onto a GPUCommandBuffer. The callers are responsible for the eventual submission of this workload on the GPUQueue through the WebGPU queue submission mechanism. Once the submitted workload is completely executed, the result is available in the bound output buffers.
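A minimal sketch of that flow (assuming an existing GPU-backed MLContext mlContext, a compiled MLGraph graph, and GPUBuffers inputGpuBuffer/outputGpuBuffer; the initializeGraph()/dispatch() method shapes follow the draft spec of that time and are assumptions here):

```ts
// Sketch of the interop path described in the excerpt above.
const encoder = mlContext.createCommandEncoder();   // per MLContext.createCommandEncoder()

encoder.initializeGraph(graph);                     // record constant/weight initialization
encoder.dispatch(graph,
  { input: inputGpuBuffer },                        // bind GPU resources as named graph inputs
  { output: outputGpuBuffer });                     // and named graph outputs
const commandBuffer = encoder.finish();             // an immutable GPUCommandBuffer

// The caller, not WebNN, submits the workload through the WebGPU queue:
gpuDevice.queue.submit([commandBuffer]);
// Once the submitted workload is completely executed, the results are
// available in outputGpuBuffer.
```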

bbernhar commented May 19, 2022

computeAsync doesn't take GPUCommandBuffer as an input. It's not meant to. The command buffer produced by the MLCommandEncoder is only meant to be submitted to the GPUQueue by the caller.

I understand. But my point was that the same ML resource could be referenced through a WebNN-produced GPUCommandBuffer executed by WebGPU AND used by computeAsync, which could execute with the same resource as input/output through WebNN.

Since WebNN has no way of knowing what resource state WebGPU execution needs, and vice versa, WebNN's computeAsync could mistakenly change the resource state behind WebGPU's back before it executes.

A contract by interface isn't sufficient here; WebNN needs well-defined resource usage rules too.

huningxin commented:

I understand. But my point was that the same ML resource could be referenced through a WebNN-produced GPUCommandBuffer executed by WebGPU AND used by computeAsync, which could execute with the same resource as input/output through WebNN.

In the latest version of the spec, the computeAsync method only takes ArrayBufferView resources, while a WebNN-produced GPUCommandBuffer only references GPU resources. Would that design avoid this issue?
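For reference, a sketch of the CPU-only path being described (the exact computeAsync signature and named-resource shape varied across drafts and are assumptions here; graph and mlContext are assumed to exist):

```ts
// CPU-only execution: computeAsync accepts only ArrayBufferView resources,
// so no GPU resource states are involved at all.
const inputs  = { input:  new Float32Array(3 * 224 * 224) };  // named CPU input tensors
const outputs = { output: new Float32Array(1000) };           // preallocated CPU output tensors

await mlContext.computeAsync(graph, inputs, outputs);         // fills `outputs` when it resolves
// A WebNN-produced GPUCommandBuffer, by contrast, references only GPU resources.
```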

bbernhar commented May 20, 2022

@huningxin

Unfortunately, no. WebGPU could put the GPU resource into a different state for another GPUCommandBuffer not created by WebNN, then execute WebNN's GPUCommandBuffer using that GPU resource, now in the wrong state.

The problem is that by sharing a GPUCommandBuffer, WebNN's resource usages need to be defined in WebGPU and, vice versa, WebGPU needs to know what WebNN's resource rules are. Otherwise WebGPU is made exploitable through WebNN.

The only way I can think of to solve this is by defining resource usage rules in WebNN and introducing GPUExternalBuffer (note: GPUExternalTexture already exists, solving the same problem but for video).
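Purely to illustrate the proposal (GPUExternalBuffer does not exist; the import shape below simply mirrors how GPUExternalTexture is obtained and is entirely hypothetical):

```ts
// Hypothetical: an import step analogous to importExternalTexture(), which
// would make the buffer's usage explicit and mutually exclusive between
// WebNN and WebGPU instead of leaving each API to guess the other's states.
const externalBuffer = gpuDevice.importExternalBuffer({ source: mlResource }); // not a real API
```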

wchao1115 commented:

I think what Ningxin means by:

In the latest version of the spec, the computeAsync method only takes ArrayBufferView resources, while a WebNN-produced GPUCommandBuffer only references GPU resources. Would that design avoid this issue?

In response to the issue you raised:

What happens when computeAsync executes using them?

is that computeAsync, as it is now, does not support WebGPU interop and only accepts CPU inputs as ArrayBufferView.

He does not comment on the interop design; he simply responds to what you raised, as it is irrelevant to the interop discussion. That, by the way, was also my response earlier.

bbernhar commented May 20, 2022

@wchao1115

WebNN cannot produce a command buffer without specifying the rules of resource usage, nor can it remove itself from the responsibility to define AND enforce them somewhere. So if computeAsync does nothing of the sort, then WebGPU will happily stomp over it upon GPUQueue.Submit().

Edit: it might help to have an example (a code sketch of the same sequence follows the list).

  1. Create a GPU "uniform" buffer with WebGPU, read/write.
  2. Use that GPU buffer, as "input", in an MLCommandEncoder.
  3. WebNN needs the GPU "input" buffer to be in a read state.
  4. WebGPU does a GPU copy into the GPU "input" buffer, which requires it to be write-only.
  5. Call GPUQueue.Submit() using MLCommandEncoder.Finish().
  6. Device lost.
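A sketch of that sequence in code (the MLCommandEncoder shapes are assumed; mlContext, graph, outputBuffer, and stagingBuffer are assumed to exist; the point is the conflicting usage, not the exact API):

```ts
// 1. A "uniform" buffer created by WebGPU, writable as a copy destination.
const inputBuffer = gpuDevice.createBuffer({
  size: 1024,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// 2./3. WebNN records a dispatch that expects to *read* inputBuffer.
const mlEncoder = mlContext.createCommandEncoder();
mlEncoder.dispatch(graph, { input: inputBuffer }, { output: outputBuffer });
const mlCommandBuffer = mlEncoder.finish();

// 4. WebGPU records a copy that *writes* into the same buffer.
const copyEncoder = gpuDevice.createCommandEncoder();
copyEncoder.copyBufferToBuffer(stagingBuffer, 0, inputBuffer, 0, 1024);
const copyCommandBuffer = copyEncoder.finish();

// 5. Both are submitted; neither API knows which state the other expects the
//    buffer to be in when its commands execute.
gpuDevice.queue.submit([copyCommandBuffer, mlCommandBuffer]);
// 6. Without shared resource usage rules this can become an invalid state
//    transition at the native level -> device lost.
```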

huningxin commented:

Thanks @wchao1115 for your comments. That clarifies my response regarding computeAsync.

@bbernhar

The problem is that by sharing a GPUCommandBuffer, WebNN's resource usages need to be defined in WebGPU and vice versa

I think we need to solve this problem.

The only way I can think of to solve this is by defining resource usage rules in WebNN and introducing GPUExternalBuffer (note: GPUExternalTexture already exists, solving the same problem but for video).

I am not sure whether we need to define GPUExternalBuffer because I don't know what WebNN object it can represent.

In my understanding, the computational workload of a compiled MLGraph that is recorded into a GPUCommandBuffer is more like a specialized GPUComputePipeline. So could we follow its resource usage rules? The related WebGPU spec reads:

The outputs correspond to buffer bindings with a type of "storage" and storageTexture bindings with a type of "write-only".

Besides the resource rules, we probably also need to clarify the tensor data layout in GPU resources, especially for GPUTexture. Or we may simply drop support for GPUTexture and leave that to the user's compute shader code.

For example, in the WebNN/WebGPU video background blur sample, I implemented a WebGPU compute shader that preprocesses the input GPUTexture for data re-ordering and normalization (tensorization) and puts the result into a GPUBuffer, which is then used as the input of the following MLGraph execution for image segmentation.
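Conceptually, that preprocessing pass looks something like the following (a simplified sketch, not the sample's actual code; inputTexture, tensorBuffer, width, and height are assumed to exist, and the WGSL layout and normalization constants are illustrative):

```ts
// Tensorization: a compute shader reads the input texture, normalizes and
// re-orders the pixels into NCHW order, and writes a packed float32 tensor
// into a GPUBuffer that the following MLGraph execution consumes as input.
const tensorizeWGSL = `
  @group(0) @binding(0) var src : texture_2d<f32>;
  @group(0) @binding(1) var<storage, read_write> dst : array<f32>;

  @compute @workgroup_size(8, 8)
  fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
    let dims = textureDimensions(src);
    if (gid.x >= dims.x || gid.y >= dims.y) { return; }
    let rgba = textureLoad(src, vec2<i32>(gid.xy), 0);
    let plane = dims.x * dims.y;                     // one plane per channel (NCHW)
    let idx = gid.y * dims.x + gid.x;
    dst[idx]              = (rgba.r - 0.5) / 0.5;    // normalize to [-1, 1]
    dst[idx + plane]      = (rgba.g - 0.5) / 0.5;
    dst[idx + 2u * plane] = (rgba.b - 0.5) / 0.5;
  }`;

const pipeline = gpuDevice.createComputePipeline({
  layout: "auto",
  compute: {
    module: gpuDevice.createShaderModule({ code: tensorizeWGSL }),
    entryPoint: "main",
  },
});
const bindGroup = gpuDevice.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: inputTexture.createView() },
    { binding: 1, resource: { buffer: tensorBuffer } },  // later bound as the MLGraph input
  ],
});

const encoder = gpuDevice.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(width / 8), Math.ceil(height / 8));
pass.end();
gpuDevice.queue.submit([encoder.finish()]);
```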

bbernhar commented May 23, 2022

@huningxin

We could "map" resource usages between WebGPU <=> WebNN (ex. input binding == R/O uniform) and also define new texture layouts (+conversions via a "fat shader") for WebNN.

But the underlying issue is still there, resource states need to be correctly synchronized between WebNN and WebGPU or they will stomp over each other.

Your example works because it does not execute GPU resources outside of WebGPU, it has internal-only usage. The shared video resource being imported into WebGPU, was made mutually exclusive upon ImportExternal. WebGPU "rents" the video texture from video but under the strict condition it is never a normal GPU texture.
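For comparison, the GPUExternalTexture "rental" model referred to above (importExternalTexture() is real WebGPU API; pipeline and videoElement are assumed to exist, and the expiry semantics are paraphrased):

```ts
// A video frame is imported as a distinct, sample-only GPUExternalTexture
// rather than as a normal GPUTexture, and it expires after a short window,
// which is what makes the cross-API sharing mutually exclusive.
const externalTexture = gpuDevice.importExternalTexture({ source: videoElement });

const bindGroup = gpuDevice.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: externalTexture }],  // bound as texture_external in WGSL
});
```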

a-sully commented Jan 25, 2024

It's my understanding that this proposal has been superseded by MLBuffer (#482). Can we close this issue?

anssiko commented Jan 25, 2024

@a-sully thanks for flagging this as a possibly stale issue.

@bbernhar feel free to close this if you concur and bring any active ideas to the actively discussed issue.

bbernhar commented:

@a-sully SGTM.
