API simplification: context types, context options, createContext() #302

Open
zolkis opened this issue Dec 1, 2022 · 11 comments

Comments

@zolkis
Collaborator

zolkis commented Dec 1, 2022

Lifted from #298 for brevity.

Proposal

Provide a single context type as a mapped descriptor for the combination of resources used in the context, e.g. a valid combination of device(s). (Somewhat analogous to the adapter plus device(s) concept in WebGPU.)

enum MLContextType {
  "cpu",     // script-controlled context
  "gpu",     // script-controlled context
  "webgpu",  // managed by the user agent
  // later other context types may be defined, even using multiple devices, e.g. "cpu+npu" etc.
  // Note: in fact all these context types could be separate interface classes as well...
};

enum MLPowerPreference {  // a hint
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {  // not a hint
  MLContextType contextType = "cpu";
  MLPowerPreference powerPreference = "default";
  GPUDevice? gpuDevice = null;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});

  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(optional MLContextOptions options = {});

  // Internal slots
  // [[boolean managed]]  // `true` if the user agent controls the context (not really needed)
  // [[MLContextType contextType]] 
  // [[MLPowerPreference powerPreference]]
  // [[implementation]] // perhaps "adapter" would be better

  // further methods (and eventually properties) will follow
};
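
For illustration, a minimal usage sketch of this shape (assuming the ML interface is exposed as navigator.ml and the WebGPU device is obtained with the usual WebGPU calls; this sketch is not part of the proposal text):

// Default: a script-controlled CPU context.
const cpuContext = await navigator.ml.createContext();

// A context managed by the user agent on top of a previously selected
// WebGPU device (adapter/device selection stays on the WebGPU side).
const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
const webgpuContext = await navigator.ml.createContext({
  contextType: "webgpu",
  powerPreference: "high-performance",
  gpuDevice
});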

Rationale for change

Related to #303.

@anssiko
Member

anssiko commented Dec 16, 2022

This is considered a v2 feature per https://www.w3.org/2022/12/15-webmachinelearning-minutes.html

@zolkis
Collaborator Author

zolkis commented Jan 12, 2023

Following the proposals from #322, I have adapted the proposal. Web IDL:

enum MLContextType {  
  "cpu",   // script-controlled context
  "webgpu",  // managed by the user agent
  // later other context types may be defined, even using multiple devices, e.g. "cpu+npu" etc.
  // Note: in fact all these context types could be separate interface classes as well...
};

enum MLPowerPreference {  // a hint
  "default",
  "low-power"
};

dictionary MLContextOptions {  // not a hint
  MLContextType contextType = "cpu";
  MLPowerPreference powerPreference = "default";
  GPUDevice? gpuDevice = null;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});

  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(optional MLContextOptions options = {});

  // Internal slots
  // [[boolean managed]]  // `true` if the user agent controls the context (not really needed)
  // [[MLContextType contextType]] 
  // [[MLPowerPreference powerPreference]]
  // [[implementation]] // perhaps "adapter" would be better

  // further methods (and eventually properties) will follow
};

@wchao1115
Collaborator

Related to #322.

@zolkis I'm hesitant to fold gpuDevice into MLContextOptions because the options may contain settings that are not applicable to the specified GPU device, e.g. MLPowerPreference is a hint used to help locate the right hardware for the subsequent execution, while a GPU device is already the outcome of the GPU adapter selection process (on the WebGPU side). By exposing a dedicated createContext method for a gpuDevice we make it clear in the API that the backing device/adapter has already been selected and that WebNN plays no part in its selection.

Also note that from the implementation standpoint, a CPU-based and a GPU-based implementation differ in significant ways. Giving the impression at the API level that they are alike could be a bit misleading.
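
To make the contrast concrete, a rough sketch of the two shapes from script; the direct GPUDevice argument below illustrates the dedicated-method idea and is not a settled signature (gpuDevice is assumed to have been obtained via WebGPU beforehand):

// A: the GPU device folded into the options dictionary (this issue's proposal).
const contextA = await navigator.ml.createContext({ contextType: "webgpu", gpuDevice });

// B: a dedicated method/overload taking the pre-selected GPUDevice directly,
// making explicit that WebNN plays no part in adapter/device selection.
const contextB = await navigator.ml.createContext(gpuDevice);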

@zolkis
Collaborator Author

zolkis commented Jan 16, 2023

@wchao1115 OK, that is fine. In the spec we need to distinguish between GPU, CPU, etc. and write separate algorithms anyway, so I can see why separate declarations make sense. It's a design choice and you presented a rationale; that can be logged in the explainer, and then I can close this issue.

One note about future extensions, though: with this design we'll need to add new adapter objects for the various future accelerators. With the design proposed here we'd just extend MLContextType / MLContextOptions, which I think is a cleaner and more web-ish approach.

@inexorabletash
Member

inexorabletash commented Feb 22, 2024

Picking up from discussion in #322

Maybe a good way to reboot this discussion is enumerating use cases for specifying (or not specifying) devices:

  1. Delegate everything to the UA/OS, trust them to make the right trade-offs
    • This should be the default, and not pre-suppose where execution occurs
  2. Interop with GPU, e.g. with optimized kernels or as part of a graphics pipeline
    • Handled today by passing a specific GPUDevice in; @wchao1115 outlines here and in Simplify MLContext creation #322 why this is the right approach (i.e. all resources are going to need to be associated with that device, etc)
  3. Specify hints/preferences which guide the UA/OS to selecting the right device(s); these include at least
    • power vs. performance (i.e. today's MLPowerPreference enum)
    • accuracy vs. performance? (would need a new enum, if we wanted this)
    • more?
  4. Coordinating ML and other workload across devices
    • Apple's Core ML has MLComputeUnits which allows specifying any combination of CPU, GPU, or ANE. But notably its guidance is: "Use all to allow the OS to select the best processing unit to use (including the neural engine, if available). Use MLComputeUnits.cpuOnly to restrict the model to the CPU, if your app might run in the background or runs other GPU intensive tasks." That matches what we (Google Chrome) have heard from multiple teams: that one advantage of a TPU/NPU/ANE (etc) is that work can be offloaded from the GPU. So the use case is not "please use CPU and TPU"; it's "please don't use GPU" instead!

IMHO, the last use case is the only one where something like the current MLDeviceType is plausible, but it's not a good fit as-is. I'm not sure how to express (4) well - pass the GPUDevice to avoid?!?!

I'd prefer to remove the deviceType option now (as was proposed in #322), consider additional enums/values (per 3) as use cases are confirmed, and explore how to express (4) separately, to avoid baking in the current design.
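
For reference, a sketch of what (1) and (3) could look like if the device type were dropped and only hints remained (option names follow the current draft):

// (1) Delegate device selection entirely to the UA/OS.
const defaultContext = await navigator.ml.createContext();

// (3) Guide selection with hints only; the UA/OS still picks the device(s).
const lowPowerContext = await navigator.ml.createContext({ powerPreference: "low-power" });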

@huningxin
Contributor

@inexorabletash, thanks for revisiting this issue!

  4. Coordinating ML and other workload across devices

+1 to explore this use case. I know there is interest from platform vendors in scheduling workloads across multiple devices.

I'd prefer to remove the deviceType option now

Before we have an alternative solution, I'm concerned that removing the deviceType option would prevent us from testing and comparing CPU and GPU execution in the current Chromium prototype. We may also want to prototype NPU execution by extending deviceType with an "npu" value in the near future.

In #322, we also discussed whether to introduce a "system" device type, which would allow use case (1) and the "all" devices case of (4).

For the other cases of (4), maybe we can introduce combinations similar to MLComputeUnits, something like "cpu_and_gpu" or "cpu_and_npu"?

Another idea is to change deviceType to bit values so that developers can combine them, for example MLDeviceType.CPU | MLDeviceType.NPU.
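
A rough sketch of the bit-value idea from script; the MLDeviceType constants below are hypothetical and do not exist in the current spec:

// Hypothetical: deviceType as a combinable bit mask instead of an enum string.
const context = await navigator.ml.createContext({
  deviceType: MLDeviceType.CPU | MLDeviceType.NPU  // i.e. "please don't use the GPU"
});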

@inexorabletash
Member

Before we have an alternative solution, I'm concerned that removing the deviceType option would prevent us from testing and comparing CPU and GPU execution in the current Chromium prototype.

Fair point. We may want to flag in the spec (with an Issue: ... banner pointing here) that this part of the API is in flux. While that's technically true for the spec generally, it could help reviewers focus their attention.

We may also want to prototype NPU execution by extending deviceType with an "npu" value in the near future.

Now we just have to pick a neutral term. 😉

@wchao1115
Collaborator

wchao1115 commented Mar 3, 2024

@inexorabletash

  1. The "delegating it to the OS" approach sounds good until you realize that the OS can make mistakes you cannot correct so be careful what you wish for. More often than not, the OS doesn't have enough context to make a decision on behalf of the app, and if it has to make one, it will have no idea that the decision it makes now is consistent with what it did in the past. In fact, the whole notion of "hints" was invented as a way to soften the expectation from the app (in case a mistake occurs). Usually the most robust approach, and the one with the highest degree of predictability and forward compatibility to the user is one that puts the decision in the hands of the app (or sometimes the framework), with the OS helping to carry out that decision. In our case, it calls for an explicit device type on WebNN.

  2. Simplify MLContext creation #322 wants to streamline GPU usage in WebNN by saying that only a WebGPU context owns the GPU device, to avoid a situation where WebGPU and WebNN each own their own device (which could both point to the same underlying physical adapter, or to two different adapters in the same machine, e.g. one integrated, another discrete). I still think it's the right approach to not allow WebNN to hide another GPU device under the hood without the app's or WebGPU's knowledge. Hiding a device would make predictable cross-adapter resource sharing very hard to achieve.

  3. A device type npu looks to be a good abstraction until you realize that it is fundamentally different from a gpu or cpu device type: while both the GPU and CPU can execute any arbitrary operator (current or future) thanks to their programmability, the NPU can't, as NPUs are mostly non-programmable, i.e. they support only a certain set of operators, and often only specific variants of those. There are essentially two ways to deal with this difference (see the sketch below) -- assume that any operator can fail with an unsupported failure, or pair the NPU up with a programmable fallback device (either a GPU or CPU). The former will likely increase the complexity of graph compilation and make it much harder for the app to successfully "use" the NPU, while the latter shifts that complexity down to the implementation of WebNN, either in the backend or in the OS. I actually prefer the latter, as it's likely to produce a better result with higher efficiency, at the cost of some predictability.
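
To illustrate the difference from the app's point of view, a sketch assuming a hypothetical "npu" device type and roughly today's MLGraphBuilder shape (descriptor member names may differ from the latest draft):

// Approach 1: NPU-only context; graph compilation can fail on unsupported operators.
const npuContext = await navigator.ml.createContext({ deviceType: "npu" });  // "npu" is hypothetical
const builder = new MLGraphBuilder(npuContext);
const input = builder.input("input", { dataType: "float32", dimensions: [1, 4] });
const output = builder.relu(input);
try {
  const graph = await builder.build({ output });
} catch (e) {
  // The build fails because some operator (or a specific variant of it) isn't
  // supported by the NPU; the app must then recreate the graph on another device type.
}

// Approach 2: the implementation pairs the NPU with a programmable fallback device
// (GPU or CPU), so build() succeeds and unsupported operators run on the fallback.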

@inexorabletash
Member

Thanks for explaining further @wchao1115 - to ensure I'm understanding, let me attempt to restate the proposal: if you want WebNN to use GPU, you need to pass a GPUDevice, and otherwise WebNN should avoid using a system's GPU; the alternative is surprising behavior.

Re: (3) and "pairing up" an NPU with a programmable fallback device - again, thank you for walking through this. Our current understanding is that DML either uses the GPU or the NPU and doesn't contain a CPU fallback path. In contrast, TFLite and CoreML will fall back to the CPU if a GPU or NPU delegate can't cover an op. How do you envision a WebNN implementation with a DML backend supporting this fallback? For DML, would the fallback become the UA's responsibility?

(Also, please don't read my comments/questions as arguing for an alternative approach - just trying to understand!)

@wchao1115
Collaborator

wchao1115 commented Mar 6, 2024

Thank you @inexorabletash for your comment. Yes, re: #322. Setting the gpu enum value for the device type replicates the WebGPU workflow for device creation, its lifetime and resource domain, e.g. how to deal with device removal, etc. Not starting another GPU life cycle from within a WebNN context seems like the right thing to do.

CPU fallback could be highly expensive when it involves breaking up fully pipelined GPU resources and drawing down a sizeable amount of data from GPU memory to DDR system memory with lower memory bandwidth. In many cases this overhead could be worse than just running the whole graph on the CPU, especially in cases that pull the data across the memory bus. The cost is far less prominent in a system with a unified memory architecture, where all memory is shared and equally accessible to all integrated processors.

DML today doesn't support CPU fallback on its own and still relies on the framework to do so, although it is technically possible through a WARP device. For WebNN, there is, as we know, a dedicated CPU backend in addition to the DML backend. Pairing an NPU with an integrated processor, whether an integrated GPU or a CPU, should avoid the high cost of pipeline switching and memory readback while still providing a decent runtime experience to users when fallback is needed.

@wchao1115
Collaborator

wchao1115 commented Mar 7, 2024

BTW, @RafaelCintron just gave me some relevant bits of info: WebGPU currently still lacks support for discrete GPUs. I think it brings up a good point about the inherent risk of taking a dependency on an external spec, i.e. there is a chance that #322 could be a logically correct but physically wrong design, considering that any limitation or constraint on the GPU device lifetime on the WebGPU side would also impose the same limitation on WebNN. That is an argument for keeping the current design with the "gpu" device type and not taking up #322.

We could also add a new device type "npu" along with a new fallback device type with the supported values { "gpu", "none" }, where "none" would mean "no fallback needed", i.e. compilation would simply fail.
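
A sketch of how that could look from script; the "npu" value and the fallbackDeviceType option name are hypothetical:

// Hypothetical: NPU device type with a GPU fallback for unsupported operators.
const context = await navigator.ml.createContext({
  deviceType: "npu",
  fallbackDeviceType: "gpu"  // or "none": no fallback, compilation simply fails
});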
