
Should restrict the sync APIs to only exist in Workers? #229

Closed
huningxin opened this issue Nov 16, 2021 · 11 comments

Comments

@huningxin
Contributor

The WebNN graph building (MLGraphBuilder.build) and execution (MLGraph.compute) APIs are synchronous. This is required to implement a backend for Wasm-based ML frameworks, such as the ONNX Runtime Execution Provider, TensorFlow Lite delegates, and the OpenCV.js DNN backend. These frameworks are written in C++ and expect to call synchronous APIs in their backend implementations. To avoid blocking the main thread, good practice is to call these synchronous APIs in a worker context.

The sync APIs are now exposed in both Window and DedicatedWorker. Should WebNN spec restrict the sync APIs to only exist in workers?

/cc @jyasskin
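For context, the frameworks' usage boils down to a build-once, compute-per-frame loop run inside a DedicatedWorker. A minimal sketch, assuming the 2021 draft API shape (sync build() and compute(); `builderFactory` is a stand-in for constructing an MLGraphBuilder from an MLContext, and the relu graph is purely illustrative):

```javascript
// Sketch of the sync path a Wasm backend would use inside a worker:
// build the graph once, then call the blocking compute() per inference.
// Names (input, relu, build, compute) follow the 2021 draft spec; the
// dependency-injected builderFactory is an assumption for illustration.
function runGraphSync(context, builderFactory, inputData, outputData) {
  const builder = builderFactory(context);
  // Describe a trivial one-op graph; op choice is illustrative.
  const x = builder.input('x', { type: 'float32', dimensions: [inputData.length] });
  const y = builder.relu(x);
  const graph = builder.build({ y });                  // sync: may validate/compile
  graph.compute({ x: inputData }, { y: outputData }); // sync: returns when results are ready
  return outputData;
}
```

Run from a worker's message handler, the blocking calls never touch the main thread; run on the main thread, each compute() would jank the page for the duration of the inference.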

@dontcallmedom
Contributor

see also w3ctag/design-principles#325

@domenic

domenic commented Dec 1, 2021

+1 to only exposing sync APIs in workers; that is the strong direction we've taken for other recent APIs such as file access handles, etc. From a priority of constituencies point of view, protecting the user from main-thread jank via developers using such APIs is a very high priority.

@RafaelCintron
Collaborator

RafaelCintron commented Dec 17, 2021

@domenic and @dontcallmedom Generally speaking, even though APIs like WebGL and WebGPU expose "sync"-sounding functions like drawIndexed, drawElements, and dispatch, those do not perform any meaningful work on the main JavaScript thread. Instead, they queue a command to the GPU process to do the drawing. Even when the commands arrive in the GPU process, they're queued (again) to the GPU, which does the actual drawing or dispatching.

Similarly for WebNN, the "sync"-sounding compute method doesn't actually do computing when the API runs with GPU contexts. If a web developer wants to read back the results of the WebNN computation to the CPU, there already exist promise-based APIs in WebGPU where they can do so without wedging the main JavaScript thread.
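The promise-based readback path being referred to looks roughly like this. mapAsync, getMappedRange, and unmap are real WebGPU methods; the staging-buffer setup (a MAP_READ | COPY_DST buffer the GPU result was copied into) is assumed to have happened elsewhere:

```javascript
// Non-blocking readback of GPU results, WebGPU-style. `stagingBuffer` is
// assumed to be a mappable staging buffer; `mapMode` would be
// GPUMapMode.READ in a real page (passed in here so the sketch stays
// self-contained).
async function readResult(stagingBuffer, mapMode) {
  await stagingBuffer.mapAsync(mapMode);        // resolves when the GPU is done
  // Copy out of the mapped range before unmapping invalidates it.
  const data = new Float32Array(stagingBuffer.getMappedRange().slice(0));
  stagingBuffer.unmap();                        // hand the buffer back to the GPU
  return data;
}
```

The key property is that the wait happens in a promise resolution, not in a blocking call on the JS thread.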

When WebNN runs on the CPU and all inputs are ArrayBuffers, I think it makes sense to force the API to run in a worker.

@dontcallmedom
Contributor

@RafaelCintron the definition of MLGraph.compute in the spec reads:

Return once the compute has completed and the results in MLNamedOutputs are ready to be consumed.

Since the function only returns after the work has happened (independently of where the work actually happens), it blocks any main thread processing as far as I understand.

There is no clear algorithmic definition of how MLGraphBuilder.build() operates at the moment, so it's hard to evaluate how sync vs async applies to it.

@RafaelCintron
Collaborator

@dontcallmedom perhaps we need to reword the definition of MLGraph.compute, then. When this runs on the GPU, calling compute does not, by itself, perform any computation beyond enqueuing a command to the GPU process to perform the actual computation on the GPU timeline. We can safely make this one "synchronous" from the perspective of the main JS thread. All of the draw commands for WebGL and WebGPU also work in this manner.

I agree MLGraphBuilder.build is more ambiguous. If, in practice, building a graph requires a substantial amount of validation such that it would affect the responsiveness of the main JS thread, we should make it asynchronous.

@dontcallmedom
Contributor

perhaps we need to reword the definition of MLGraph.compute, then. When this runs on the GPU, calling compute does not, by itself, perform any computation beyond enqueuing a command to the GPU process to perform the actual computation on the GPU timeline.

I understand this, but I still don't understand how one would determine that the computation is complete with a purely non-blocking synchronous call. A major distinction with WebGL draw commands is that there is no result to be awaited there. From what I can see, WebGPU has a queue system that can be asynchronously monitored to detect when enqueued operations have been executed.

There are 2 distinct aspects to consider in the sync/async discussions:

  • avoiding running CPU-intensive operations on the main thread - not a concern here, at least in cases where the operations are not done on the CPU
  • avoiding blocking the main thread while awaiting the result of an operation running off the main thread

I don't see how we can avoid the latter with purely synchronous calls running on the main thread.
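The asynchronous monitoring mentioned above can be sketched with WebGPU's queue API. queue.submit() and queue.onSubmittedWorkDone() are real WebGPU methods (submit enqueues without blocking; onSubmittedWorkDone returns a Promise that resolves once previously submitted work has finished on the GPU); the wrapper function is illustrative:

```javascript
// Detecting GPU completion without blocking the main thread: submit the
// recorded command buffers, then await the queue's completion promise.
async function submitAndWait(queue, commandBuffers) {
  queue.submit(commandBuffers);        // non-blocking: just enqueues the work
  await queue.onSubmittedWorkDone();   // resolves when the GPU has finished
}
```

A purely synchronous compute() has no equivalent: the only way for it to return results is to block until this completion signal has fired.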

@RafaelCintron
Collaborator

RafaelCintron commented Jan 5, 2022

A major distinction with WebGL draw commands is that there is no result to be awaited there. From what I can see, WebGPU has a queue system that can be asynchronously monitored to detect when enqueued operations have been executed.

@dontcallmedom when you're using WebNN in conjunction with WebGPU, the only way to get results to the CPU is to use WebGPU's asynchronous buffer mapping APIs. See 3.5.1. CPU-GPU Ownership Transfer from the WebGPU explainer.

Not all WebNN scenarios require readback to the CPU, however. If you're writing a teleconferencing website where you want to do background blur or "funny hats", the ML operation is one of a number of GPU operations you want to perform in a pipeline. The WebNN and WebGPU commands should all happen in the same JS callback. Forcing each step in the pipeline to be an async JS call would add latency and deliver a suboptimal experience.

@anssiko
Member

anssiko commented Jan 20, 2022

@pyu10055 @EmmaNingMS, we discussed in https://www.w3.org/2022/01/13-webmachinelearning-minutes.html#t03 whether the WebNN API should restrict the sync graph building (MLGraphBuilder.build) and execution (MLGraph.compute) APIs to workers. These APIs are currently exposed to both the main thread and worker context.

Before the WG makes a decision, we wanted to hear from you, ML framework authors, how this proposed change would impact your frameworks, in particular their Wasm backends. We want to make sure the solution works for you. It may be that it warrants changes to other Web APIs; we're happy to hear that feedback as well and bring it forward to the right folks.

Please let us know if you have any questions. We'll revisit this topic on our future call once we've received your feedback on the issue.

@wchao1115
Collaborator

perhaps we need to reword the definition of MLGraph.compute, then. When this runs on the GPU, calling compute does not, by itself, perform any computation beyond enqueuing a command to the GPU process to perform the actual computation on the GPU timeline.

I don't think that's the intent of MLGraph.compute. Compute on the GPU should record the dispatches on the command list, flush the command list to the command queue, wait until the GPU finishes executing all the dispatches in the queue with the compute results in the bound output buffers, then optionally read the results back from the GPU resources if needed.

@wchao1115
Collaborator

This is fixed by PR #257.

@anssiko
Member

anssiko commented Jun 16, 2022

Closing with a note that async context creation is discussed in its own issue #272.
