
Should restrict the sync APIs to only exist in Workers? #229

Closed
huningxin opened this issue Nov 16, 2021 · 11 comments

Comments

@huningxin
Contributor

The WebNN graph building (MLGraphBuilder.build) and execution (MLGraph.compute) APIs are synchronous. This is required to implement a backend for Wasm-based ML frameworks, such as the ONNX Runtime Execution Provider, TensorFlow Lite delegates, and the OpenCV.js DNN backend. These frameworks are written in C++ and expect to call synchronous APIs in their backend implementations. To avoid blocking the main thread, good practice is to call these synchronous APIs in a worker context.

The sync APIs are now exposed in both Window and DedicatedWorker. Should WebNN spec restrict the sync APIs to only exist in workers?

/cc @jyasskin
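For context, the frameworks' usage boils down to a build-once, compute-per-frame loop run inside a DedicatedWorker. A minimal sketch, assuming the 2021 draft API shape (sync build() and compute(); `builderFactory` is a stand-in for constructing an MLGraphBuilder from an MLContext, and the relu graph is purely illustrative):

```javascript
// Sketch of the sync path a Wasm backend would use inside a worker:
// build the graph once, then call the blocking compute() per inference.
// Names (input, relu, build, compute) follow the 2021 draft spec; the
// dependency-injected builderFactory is an assumption for illustration.
function runGraphSync(context, builderFactory, inputData, outputData) {
  const builder = builderFactory(context);
  // Describe a trivial one-op graph; op choice is illustrative.
  const x = builder.input('x', { type: 'float32', dimensions: [inputData.length] });
  const y = builder.relu(x);
  const graph = builder.build({ y });                  // sync: may validate/compile
  graph.compute({ x: inputData }, { y: outputData }); // sync: returns when results are ready
  return outputData;
}
```

Run from a worker's message handler, the blocking calls never touch the main thread; run on the main thread, each compute() would jank the page for the duration of the inference.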

@dontcallmedom
Contributor

see also w3ctag/design-principles#325

@domenic

domenic commented Dec 1, 2021

+1 to only exposing sync APIs in workers; that is the strong direction we've taken for other recent APIs such as file access handles, etc. From a priority of constituencies point of view, protecting the user from main-thread jank via developers using such APIs is a very high priority.

@RafaelCintron
Collaborator

RafaelCintron commented Dec 17, 2021

@domenic and @dontcallmedom Generally speaking, even though APIs like WebGL and WebGPU expose "sync"-sounding functions like drawIndexed, drawElements, and dispatch, those do not perform any meaningful work on the main JavaScript thread. Instead, they queue a command to the GPU process to do the drawing. Even when the commands arrive in the GPU process, they're queued (again) to the GPU, which does the actual drawing or dispatching.

Similarly for WebNN, the "sync"-sounding compute method doesn't actually do computing when the API runs with GPU contexts. If a web developer wants to read back the results of the WebNN computation to the CPU, there already exist promise-based APIs in WebGPU where they can do so without wedging the main JavaScript thread.
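The promise-based readback path being referred to looks roughly like this. mapAsync, getMappedRange, and unmap are real WebGPU methods; the staging-buffer setup (a MAP_READ | COPY_DST buffer the GPU result was copied into) is assumed to have happened elsewhere:

```javascript
// Non-blocking readback of GPU results, WebGPU-style. `stagingBuffer` is
// assumed to be a mappable staging buffer; `mapMode` would be
// GPUMapMode.READ in a real page (passed in here so the sketch stays
// self-contained).
async function readResult(stagingBuffer, mapMode) {
  await stagingBuffer.mapAsync(mapMode);        // resolves when the GPU is done
  // Copy out of the mapped range before unmapping invalidates it.
  const data = new Float32Array(stagingBuffer.getMappedRange().slice(0));
  stagingBuffer.unmap();                        // hand the buffer back to the GPU
  return data;
}
```

The key property is that the wait happens in a promise resolution, not in a blocking call on the JS thread.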

When WebNN runs on the CPU and all inputs are ArrayBuffers, I think it makes sense to force the API to run in a worker.

@dontcallmedom
Contributor

@RafaelCintron the definition of MLGraph.compute in the spec reads:

Return once the compute has completed and the results in MLNamedOutputs are ready to be consumed.

Since the function only returns after the work has happened (independently of where the work actually happens), it blocks any main thread processing as far as I understand.

There is no clear algorithmic definition of how MLGraphBuilder.build() operates at the moment, so it's hard to evaluate how sync vs async applies to it.

@RafaelCintron
Collaborator

@dontcallmedom perhaps we need to reword the definition of MLGraph.compute, then. When this runs on the GPU, calling compute does not, by itself, perform any computation beyond enqueuing a command to the GPU process to perform the actual computation on the GPU timeline. We can safely make this one "synchronous" from the perspective of the main JS thread. All of the draw commands for WebGL and WebGPU also work in this manner.

I agree MLGraphBuilder.build is more ambiguous. If, in practice, building a graph requires a substantial amount of validation such that it would affect the responsiveness of the main JS thread, we should make it asynchronous.

@dontcallmedom
Contributor

perhaps we need to reword the definition of MLGraph.compute, then. When this runs on the GPU, calling compute does not, by itself, perform any computation beyond enqueuing a command to the GPU process to perform the actual computation on the GPU timeline.

I understand this, but I still don't understand how one would determine that the computation is complete with a purely non-blocking synchronous call. A major distinction with WebGL draw commands is that there is no result to be awaited there. From what I can see, WebGPU has a queue system that can be asynchronously monitored to detect when enqueued operations have been executed.

There are 2 distinct aspects to consider in the sync/async discussions:

  • avoiding running CPU-intensive operations on the main thread - not a concern here, at least in cases where the operations are not done on the CPU
  • avoiding blocking the main thread while awaiting the result of an operation running off the main thread

I don't see how we can avoid the latter with purely synchronous calls running on the main thread.
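The asynchronous monitoring mentioned above can be sketched with WebGPU's queue API. queue.submit() and queue.onSubmittedWorkDone() are real WebGPU methods (submit enqueues without blocking; onSubmittedWorkDone returns a Promise that resolves once previously submitted work has finished on the GPU); the wrapper function is illustrative:

```javascript
// Detecting GPU completion without blocking the main thread: submit the
// recorded command buffers, then await the queue's completion promise.
async function submitAndWait(queue, commandBuffers) {
  queue.submit(commandBuffers);        // non-blocking: just enqueues the work
  await queue.onSubmittedWorkDone();   // resolves when the GPU has finished
}
```

A purely synchronous compute() has no equivalent: the only way for it to return results is to block until this completion signal has fired.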

@RafaelCintron
Collaborator

RafaelCintron commented Jan 5, 2022

A major distinction with WebGL draw commands is that there is no result to be awaited there. From what I can see, WebGPU has a queue system that can be asynchronously monitored to detect when enqueued operations have been executed.

@dontcallmedom when you're using WebNN in conjunction with WebGPU, the only way to get results to the CPU is to use WebGPU's asynchronous buffer mapping APIs. See 3.5.1. CPU-GPU Ownership Transfer from the WebGPU explainer.

Not all WebNN scenarios require readback to the CPU, however. If you're writing a teleconferencing website where you want to do background blur or "funny hats", the ML operation is one of a number of GPU operations you want to perform in a pipeline. The WebNN and WebGPU commands should all happen in the same JS callback. Forcing each step in the pipeline to be an async JS call would add latency and deliver a suboptimal experience.

@anssiko
Member

anssiko commented Jan 20, 2022

@pyu10055 @EmmaNingMS, we discussed in https://www.w3.org/2022/01/13-webmachinelearning-minutes.html#t03 whether the WebNN API should restrict the sync graph building (MLGraphBuilder.build) and execution (MLGraph.compute) APIs to workers. These APIs are currently exposed to both the main thread and worker context.

Before the WG makes a decision, we wanted to hear from you, ML framework authors, how this proposed change would impact your frameworks, in particular their Wasm backends. We want to make sure the solution works for you. It may be that it warrants changes to other Web APIs; we're happy to hear that feedback as well and bring it forward to the right folks.

Please let us know if you have any questions. We'll revisit this topic on our future call once we've received your feedback on the issue.

@wchao1115
Collaborator

perhaps we need to reword the definition of MLGraph.compute, then. When this runs on the GPU, calling compute does not, by itself, perform any computation beyond enqueuing a command to the GPU process to perform the actual computation on the GPU timeline.

I don't think that's the intent of MLGraph.compute. Compute on the GPU should record the dispatches on the command list, flush the command list to the command queue, wait until the GPU finishes executing all the dispatches in the queue with the compute results in the bound output buffers, then optionally read the results back from the GPU resources if needed.

@wchao1115
Collaborator

This is fixed by PR #257.

@anssiko
Member

anssiko commented Jun 16, 2022

Closing with a note that async context creation is discussed in its own issue #272.
