
[MLBuffer] Support interop with WebGPU #688

Closed
bbernhar opened this issue May 15, 2024 · 3 comments

Comments


bbernhar commented May 15, 2024

Purpose/Motivation

WebNN lacks an API to import video frames or run custom ML ops. ML use cases like semantic segmentation or real-time video processing could benefit by avoiding JS copies. WebGPU also lacks NPU support, which means use cases like Super Resolution cannot be further accelerated by WebNN ML ops. This is a sub-issue of #482.

Proposed Solution: direct buffer sharing

Export WebNN's MLBuffer data type and import it into WebGPU as a standard GPUBuffer, which can be directly bound in a WGSL compute shader. Any needed conversions and synchronization for the shared buffer are performed by the WebNN runtime.

  • After an MLBuffer is imported, it is considered "neutered": a validation error is generated if it is used by the WebNN context.
  • The GPUBuffer created upon import could be a copy of the MLBuffer's contents.
  • After GPUBuffer.destroy(), the MLBuffer is no longer "neutered" and un-expires so it can be re-used again.

JS example

const adapter = await navigator.gpu.requestAdapter();
const wgpuDevice = await adapter.requestDevice();
const ml_context = await navigator.ml.createContext(wgpuDevice);
const ml_buffer = ml_context.createBuffer({
  /* MLOperandDescriptor members, e.g. dataType and dimensions */
  usage: MLBufferUsage.WEBGPU_INTEROP,
});

// Import buffer to WebGPU (name TBD).
// Assumed WebGPU usages = GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC.
let gpu_buffer = wgpuDevice.importExternalBuffer(ml_buffer);

// ... encode compute work using `gpu_buffer` into `command_encoder` ...
wgpuDevice.queue.submit([command_encoder.finish()]);

// Export buffer back to WebNN.
gpu_buffer.destroy();

// Re-use MLBuffer in WebNN.
ml_context.dispatch(graph, inputs, {output: ml_buffer});

// Re-import the buffer to use it again.
gpu_buffer = wgpuDevice.importExternalBuffer(ml_buffer);
// ... render using `gpu_buffer` ...
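
For illustration only, here is a minimal sketch of what the elided "encode compute work using `gpu_buffer`" step might look like, assuming the imported buffer carries STORAGE usage as noted above. The shader, the clamp operation, and the element count are placeholders and not part of this proposal.

// Hypothetical custom op: clamp each element of the shared tensor to [0, 1].
const module = wgpuDevice.createShaderModule({
  code: `
    @group(0) @binding(0) var<storage, read_write> tensor : array<f32>;

    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) id : vec3<u32>) {
      if (id.x < arrayLength(&tensor)) {
        tensor[id.x] = clamp(tensor[id.x], 0.0, 1.0);
      }
    }`,
});

const pipeline = wgpuDevice.createComputePipeline({
  layout: 'auto',
  compute: { module, entryPoint: 'main' },
});

const bindGroup = wgpuDevice.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  // The imported buffer is bound like any other storage buffer.
  entries: [{ binding: 0, resource: { buffer: gpu_buffer } }],
});

const command_encoder = wgpuDevice.createCommandEncoder();
const pass = command_encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
const elementCount = 1024; // placeholder: number of f32 elements in ml_buffer
pass.dispatchWorkgroups(Math.ceil(elementCount / 64));
pass.end();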

FAQ

What happens if the web developer never calls GPUBuffer.Destroy()?

If an imported MLBuffer is dropped without being destroyed, the imported GPUBuffer object will stay alive until it is also dropped.

Why is there explicit handoff between WebGPU and WebNN?

This ensures the MLBuffer cannot be modified by WebNN once imported (see https://www.w3.org/TR/webgpu/#programming-model-resource-usages) and allows any necessary copies of its contents to be performed.

What are the synchronization guarantees between WebGPU's command queue and MLGraph?

WebNN ensures API access to an MLBuffer is mutually exclusive (no simultaneous access). MLGraph operations using an MLBuffer are submitted for execution before WebNN waits for completion of work on WebGPU's queues. Similarly, WebGPU queues cannot execute until WebNN operations are completed.
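
Illustratively (not normative wording), this means the web developer never inserts explicit fences around the handoff; the waits noted in the comments below would be performed by the runtime. Names are reused from the JS example above, and `graph`, `inputs`, and `outputs` are placeholders.

// 1. WebNN produces results into ml_buffer.
ml_context.dispatch(graph, inputs, {output: ml_buffer});

// 2. Import: WebGPU queue work using gpu_buffer cannot execute until the
//    WebNN operations above are completed.
let gpu_buffer = wgpuDevice.importExternalBuffer(ml_buffer);
wgpuDevice.queue.submit([/* commands reading/writing gpu_buffer */]);

// 3. Destroy (export): subsequent WebNN operations cannot execute until the
//    WebGPU queue work above is completed.
gpu_buffer.destroy();
ml_context.dispatch(graph, {input: ml_buffer}, outputs);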

Why not import MLBuffer as GPUExternalTexture?

Unlike textures, an MLBuffer cannot be sampled by the GPU, which prevents WebGPU shaders from performing color-space mapping and may require tensor-to-video format conversion. Since an imported MLBuffer has a linear layout matching GPUBuffer, the web developer can use it as a GPUBuffer.
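
For example, a minimal sketch of reading the linearly laid-out data directly in WGSL; the [height, width] shape, f32 element type, and width value are hypothetical.

// Hypothetical: ml_buffer holds a row-major [height, width] f32 tensor.
const wgsl = `
  @group(0) @binding(0) var<storage, read> tensor : array<f32>;

  const width : u32 = 640u; // hypothetical tensor width

  // Linear layout: element (row, col) is at row * width + col,
  // so no sampler, texture view, or format conversion is needed.
  fn load(row : u32, col : u32) -> f32 {
    return tensor[row * width + col];
  }
`;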

Can you interop with mixed WebNN and WebGPU devices?

Currently, this is out of scope for v1: only the same GPUDevice used to create the MLContext can be used. In the future, an explicit importExternalBuffer() could be added to MLContext, for example if zero-copy must be disallowed.

@RafaelCintron (Collaborator) commented

After MLBuffer is imported, it is considered "neutered". A validation error is generated if used by the WebNN context.
After GPUBuffer.Destroy(), the MLBuffer is no longer "neutered" and will un-expire to be re-used again.

"Neutering" is an existing concept that's part of the Transferrable Object section of the HTML 5 spec. In that spec, once you neuter an object, you can't bring it back like you described above. We need to come up with a different term to describe what we're doing. The term "renting" comes to mind but hopefully we can come up with a better one.

If an imported MLBuffer is dropped, without being destroyed, the imported GPUBuffer object will stay alive until it is also dropped.

I presume by "dropped", you mean garbage collected?

WebNN ensures command queue access of MLBuffer(s) will be mutually exclusive (no simultaneous access). MLGraph operations using MLBuffer are submitted to the queue before WebNN waits for completion. Similarly, WebGPU is blocked by WebNN operations until WebNN operations are completed.

WebGPU today has one queue but, in the future, it might have more than one. Since we haven't (yet) spec'd the concept of a "queue" in WebNN, your description is confusing because the reader doesn't know which queue you're referring to.

Should reword this to say that, upon import, WebNN operations on MLBuffers in flight on the WebNN device timeline will complete before work queued on a WebGPU queue commences. Similarly, on GPUBuffer destruction, work on WebGPU queues will complete before the WebNN device timeline reads the results of the operation. In other words, upon import, WebGPU reads will see previous WebNN writes; upon GPUBuffer destruction, WebNN reads will see previous WebGPU writes.

@bbernhar (Author) commented

Thanks, @RafaelCintron. Makes sense.

"Neutering" is an existing concept that's part of the Transferrable Object section of the HTML 5 spec.

OK. Perhaps expire and un-expire?

I presume you by "dropped", you mean garbage collected?

Yes, thanks for pointing this out.

Should reword this to say that, upon import, WebNN operations on MLBuffers in flight on the WebNN device timeline will complete before work queued on WebGPU queue commences.

WebNN does not define what a device timeline is, nor does it distinguish what operations can be enqueued on it. If you can comment on #529, then the actual interop spec will follow that.

@bbernhar (Author) commented

Closing as this proposal has been superseded by #754
