CommandBuffer usage clarification: internal, external, both? #264

Closed
bbernhar opened this issue May 2, 2022 · 19 comments

bbernhar commented May 2, 2022

If the intent of WebNN is to produce an immutable MLCommandBuffer that can only be read by WebGPU, then I would suggest we consider 1) renaming it to MLExternalCommandBuffer, and 2) avoiding overloading WebGPU Interop with a requirement WebGPU does not follow (a command buffer with GPU commands being equal to a command buffer with non-GPU commands) by moving MLExternalCommandBuffer into a WebNN Interop section.

Alternatively, we could keep MLCommandBuffer (internal usage) but give WebNN a means to submit it (e.g. computeAsync). This would avoid breaking WebGPU and give WebNN a consistent GPU async experience on both native (standalone) and web.
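Purely as a hypothetical sketch to make the two options concrete (none of the names or call shapes below are specified anywhere; mlEncoder, mlContext, graph, inputs, and outputs are assumed to exist):

```ts
// Hypothetical sketch only: illustrates the two usages discussed above.

// Option 1: external-only. WebNN emits an opaque, immutable command buffer
// that only WebGPU may consume; WebNN never executes it itself.
const externalCommandBuffer = mlEncoder.finish();     // hypothetically an MLExternalCommandBuffer
gpuDevice.queue.submit([externalCommandBuffer]);      // the caller hands it to WebGPU for execution

// Option 2: internal usage. WebNN keeps MLCommandBuffer to itself and offers
// its own submission path (a computeAsync-style call), owning synchronization
// end to end on both native (standalone) and web.
await mlContext.computeAsync(graph, inputs, outputs); // hypothetical WebNN-side submission
```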

Thoughts?

@huningxin @wchao1115 @RafaelCintron

anssiko commented May 12, 2022

Thanks for this feedback, @bbernhar. We discussed this issue on our call:

  1. There was a preference to keep the current name.
  2. I believe this overloading topic is appropriate to bring to the WebGPU group as part of our wide review outreach, in the planned update to "Investigation: how WebNN / WebGPU interop could be happening" gpuweb/gpuweb#2500.

@RafaelCintron and @huningxin for further comments.

bbernhar commented May 17, 2022

Thanks @anssiko for the update.

But I was seeking clarification on the usage of MLCommandBuffer: will it be internal-only (read/write by WebNN), external-only (read-only by WebNN, read/write by WebGPU), or both (read/write by both WebNN and WebGPU)?

I believe this is a matter for the WebNN WG to decide, as MLCommandBuffer is not a requirement of WebGPU.

@anssiko anssiko changed the title Should MLCommandBuffer be MLExternalCommandBuffer? CommandBuffer usage clarification: internal, external, both? May 18, 2022
anssiko commented May 18, 2022

@bbernhar thank you for this additional context. I amended the 19 May 2022 agenda with this issue.

wchao1115 commented:

It isn't MLCommandBuffer but rather GPUCommandBuffer, a WebGPU type and an immutable one.

anssiko commented May 19, 2022

@bbernhar we discussed this issue today with help of your additional context.

@wchao1115 provided his perspective, and I'm sure he is available to answer any questions you may have in this issue.

bbernhar commented:

@wchao1115 Command buffers are always immutable once created; that's not my question.

Immutability vs usage

External-only means WebNN is unable to synchronize access, and any GPU operation (performed by WebNN) is forbidden once the command buffer is created, whether through the CPU or by another GPU operation. Internal-only is the reverse: access to the GPUCommandBuffer is guaranteed to be synchronized by whoever created it. The latter usage is what WebGPU does, but will WebNN be different?

wchao1115 commented:

I would expect the GPUCommandBuffer produced by WebNN to be fully compatible with the one produced by WebGPU. By the current API contract, WebNN does not accept a WebGPU command buffer as an input and only produces it.

bbernhar commented May 19, 2022

@wchao1115

Let's assume they are "compatible" and we have some buffer or texture referenced within the GPUCommandBuffer, containing ML commands. What happens when computeAsync executes using them?

Are you telling me "the contract" requires WebNN to go into "read only" mode (i.e., no more GPU for you) until WebGPU finishes execution? I believe a lot more clarification is needed here.

wchao1115 commented:

computeAsync doesn't take GPUCommandBuffer as an input. It's not meant to. The command buffer produced by the MLCommandEncoder is only meant to be submitted to the GPUQueue by the caller.

Here's the relevant excerpt in the spec:

The MLCommandEncoder interface created by the MLContext.createCommandEncoder() method supports a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their application. It does this by placing the workload required to initialize and compute the results of the operations in the graph onto a GPUCommandBuffer. The callers are responsible for the eventual submission of this workload on the GPUQueue through the WebGPU queue submission mechanism. Once the submitted workload is completely executed, the result is available in the bound output buffers.
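A minimal sketch of that flow (assuming an existing GPU-backed MLContext mlContext, a compiled MLGraph graph, and GPUBuffers inputGpuBuffer/outputGpuBuffer; the initializeGraph()/dispatch() method shapes follow the draft spec of that time and are assumptions here):

```ts
// Sketch of the interop path described in the excerpt above.
const encoder = mlContext.createCommandEncoder();   // per MLContext.createCommandEncoder()

encoder.initializeGraph(graph);                     // record constant/weight initialization
encoder.dispatch(graph,
  { input: inputGpuBuffer },                        // bind GPU resources as named graph inputs
  { output: outputGpuBuffer });                     // and named graph outputs
const commandBuffer = encoder.finish();             // an immutable GPUCommandBuffer

// The caller, not WebNN, submits the workload through the WebGPU queue:
gpuDevice.queue.submit([commandBuffer]);
// Once the submitted workload is completely executed, the results are
// available in outputGpuBuffer.
```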

bbernhar commented May 19, 2022

computeAsync doesn't take GPUCommandBuffer as an input. It's not meant to. The command buffer produced by the MLCommandEncoder is only meant to be submitted to the GPUQueue by the caller.

I understand. But my point was that the same ML resource could be referenced through a WebNN-produced GPUCommandBuffer executed by WebGPU AND used by computeAsync, which could execute with the same resource as input/output through WebNN.

Since WebNN has no way of knowing what resource state WebGPU execution needs, and vice versa, WebNN's computeAsync could mistakenly change the resource state behind WebGPU's back before it executes.

A contract by interface isn't sufficient here; WebNN needs well-defined resource usage rules too.

huningxin commented:

I understand. But my point was that the same ML resource could be referenced through a WebNN-produced GPUCommandBuffer executed by WebGPU AND used by computeAsync, which could execute with the same resource as input/output through WebNN.

In the latest version of the spec, the computeAsync method only takes ArrayBufferView resources, while a WebNN-produced GPUCommandBuffer only references GPU resources. Would that design avoid this issue?
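For reference, a sketch of the CPU-only path being described (the exact computeAsync signature and named-resource shape varied across drafts and are assumptions here; graph and mlContext are assumed to exist):

```ts
// CPU-only execution: computeAsync accepts only ArrayBufferView resources,
// so no GPU resource states are involved at all.
const inputs  = { input:  new Float32Array(3 * 224 * 224) };  // named CPU input tensors
const outputs = { output: new Float32Array(1000) };           // preallocated CPU output tensors

await mlContext.computeAsync(graph, inputs, outputs);         // fills `outputs` when it resolves
// A WebNN-produced GPUCommandBuffer, by contrast, references only GPU resources.
```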

bbernhar commented May 20, 2022

@huningxin

Unfortunately, no. WebGPU could put the GPU resource into a different state for another GPUCommandBuffer not created by WebNN, then execute WebNN's GPUCommandBuffer using that GPU resource, now in the wrong state.

The problem is that by sharing a GPUCommandBuffer, WebNN's resource usages need to be defined in WebGPU and, vice versa, WebGPU needs to know what WebNN's resource rules are. Otherwise WebGPU is made exploitable through WebNN.

The only way I can think of to solve this is by defining resource usage rules in WebNN and introducing GPUExternalBuffer (note: GPUExternalTexture already exists, solving the same problem but for video).
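Purely to illustrate the proposal (GPUExternalBuffer does not exist; the import shape below simply mirrors how GPUExternalTexture is obtained and is entirely hypothetical):

```ts
// Hypothetical: an import step analogous to importExternalTexture(), which
// would make the buffer's usage explicit and mutually exclusive between
// WebNN and WebGPU instead of leaving each API to guess the other's states.
const externalBuffer = gpuDevice.importExternalBuffer({ source: mlResource }); // not a real API
```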

wchao1115 commented:

I think what Ningxin means by:

In the latest version of the spec, the computeAsync method only takes ArrayBufferView resources, while a WebNN-produced GPUCommandBuffer only references GPU resources. Would that design avoid this issue?

In response to the issue you raised:

What happens when computeAsync executes using them?

is that computeAsync, as it is now, does not support WebGPU interop and only accepts CPU inputs as ArrayBufferView.

He does not comment on the interop design; he simply responds to what you raised, as it is irrelevant to the interop discussion. That, by the way, was also my response earlier.

bbernhar commented May 20, 2022

@wchao1115

WebNN cannot produce a command buffer without specifying the rules of resource usage, nor can it remove itself from the responsibility to define AND enforce them somewhere. So if computeAsync does nothing of the sort, then WebGPU will happily stomp over it upon GPUQueue.Submit().

Edit: it might help to have an example (a code sketch of the same sequence follows the list).

  1. Create a GPU "uniform" buffer with WebGPU, read/write.
  2. Use that GPU buffer, as "input", in an MLCommandEncoder.
  3. WebNN needs the GPU "input" buffer to be in a read state.
  4. WebGPU does a GPU copy into the GPU "input" buffer, which requires it to be write-only.
  5. Call GPUQueue.Submit() using MLCommandEncoder.Finish().
  6. Device lost.
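A sketch of that sequence in code (the MLCommandEncoder shapes are assumed; mlContext, graph, outputBuffer, and stagingBuffer are assumed to exist; the point is the conflicting usage, not the exact API):

```ts
// 1. A "uniform" buffer created by WebGPU, writable as a copy destination.
const inputBuffer = gpuDevice.createBuffer({
  size: 1024,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// 2./3. WebNN records a dispatch that expects to *read* inputBuffer.
const mlEncoder = mlContext.createCommandEncoder();
mlEncoder.dispatch(graph, { input: inputBuffer }, { output: outputBuffer });
const mlCommandBuffer = mlEncoder.finish();

// 4. WebGPU records a copy that *writes* into the same buffer.
const copyEncoder = gpuDevice.createCommandEncoder();
copyEncoder.copyBufferToBuffer(stagingBuffer, 0, inputBuffer, 0, 1024);
const copyCommandBuffer = copyEncoder.finish();

// 5. Both are submitted; neither API knows which state the other expects the
//    buffer to be in when its commands execute.
gpuDevice.queue.submit([copyCommandBuffer, mlCommandBuffer]);
// 6. Without shared resource usage rules this can become an invalid state
//    transition at the native level -> device lost.
```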

huningxin commented:

Thanks @wchao1115 for your comments. That clarifies my response regarding computeAsync.

@bbernhar

The problem is that by sharing a GPUCommandBuffer, WebNN's resource usages need to be defined in WebGPU and vice versa

I think we need to solve this problem.

The only way I can think of to solve this is by defining resource usage rules in WebNN and introducing GPUExternalBuffer (note: GPUExternalTexture already exists, solving the same problem but for video).

I am not sure whether we need to define GPUExternalBuffer because I don't know what WebNN object it can represent.

In my understanding, the computational workload of a compiled MLGraph that is recorded into a GPUCommandBuffer is more like a specialized GPUComputePipeline. So could we follow its resource usage rules? The related WebGPU spec reads:

The outputs correspond to buffer bindings with a type of "storage" and storageTexture bindings with a type of "write-only".

Besides the resource rules, we probably also need to clarify the tensor data layout in GPU resources, especially for GPUTexture. Or we may simply drop support for GPUTexture and leave that to the user's compute shader code.

For example, in the WebNN/WebGPU video background blur sample, I implemented a WebGPU compute shader that preprocesses the input GPUTexture for data re-ordering and normalization (tensorization) and puts the result into a GPUBuffer, which is then used as the input of the following MLGraph execution for image segmentation.
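Conceptually, that preprocessing pass looks something like the following (a simplified sketch, not the sample's actual code; inputTexture, tensorBuffer, width, and height are assumed to exist, and the WGSL layout and normalization constants are illustrative):

```ts
// Tensorization: a compute shader reads the input texture, normalizes and
// re-orders the pixels into NCHW order, and writes a packed float32 tensor
// into a GPUBuffer that the following MLGraph execution consumes as input.
const tensorizeWGSL = `
  @group(0) @binding(0) var src : texture_2d<f32>;
  @group(0) @binding(1) var<storage, read_write> dst : array<f32>;

  @compute @workgroup_size(8, 8)
  fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
    let dims = textureDimensions(src);
    if (gid.x >= dims.x || gid.y >= dims.y) { return; }
    let rgba = textureLoad(src, vec2<i32>(gid.xy), 0);
    let plane = dims.x * dims.y;                     // one plane per channel (NCHW)
    let idx = gid.y * dims.x + gid.x;
    dst[idx]              = (rgba.r - 0.5) / 0.5;    // normalize to [-1, 1]
    dst[idx + plane]      = (rgba.g - 0.5) / 0.5;
    dst[idx + 2u * plane] = (rgba.b - 0.5) / 0.5;
  }`;

const pipeline = gpuDevice.createComputePipeline({
  layout: "auto",
  compute: {
    module: gpuDevice.createShaderModule({ code: tensorizeWGSL }),
    entryPoint: "main",
  },
});
const bindGroup = gpuDevice.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: inputTexture.createView() },
    { binding: 1, resource: { buffer: tensorBuffer } },  // later bound as the MLGraph input
  ],
});

const encoder = gpuDevice.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(width / 8), Math.ceil(height / 8));
pass.end();
gpuDevice.queue.submit([encoder.finish()]);
```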

bbernhar commented May 23, 2022

@huningxin

We could "map" resource usages between WebGPU <=> WebNN (ex. input binding == R/O uniform) and also define new texture layouts (+conversions via a "fat shader") for WebNN.

But the underlying issue is still there, resource states need to be correctly synchronized between WebNN and WebGPU or they will stomp over each other.

Your example works because it does not execute GPU resources outside of WebGPU, it has internal-only usage. The shared video resource being imported into WebGPU, was made mutually exclusive upon ImportExternal. WebGPU "rents" the video texture from video but under the strict condition it is never a normal GPU texture.
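For comparison, the GPUExternalTexture "rental" model referred to above (importExternalTexture() is real WebGPU API; pipeline and videoElement are assumed to exist, and the expiry semantics are paraphrased):

```ts
// A video frame is imported as a distinct, sample-only GPUExternalTexture
// rather than as a normal GPUTexture, and it expires after a short window,
// which is what makes the cross-API sharing mutually exclusive.
const externalTexture = gpuDevice.importExternalTexture({ source: videoElement });

const bindGroup = gpuDevice.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: externalTexture }],  // bound as texture_external in WGSL
});
```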

a-sully commented Jan 25, 2024

It's my understanding that this proposal has been superseded by MLBuffer (#482). Can we close this issue?

anssiko commented Jan 25, 2024

@a-sully thanks for flagging this as a possibly stale issue.

@bbernhar feel free to close this if you concur and bring any active ideas to the actively discussed issue.

bbernhar commented:

@a-sully SGTM.
