API simplification: context types, context options, createContext() #302

Open
zolkis opened this issue Dec 1, 2022 · 11 comments

Comments

@zolkis
Collaborator

zolkis commented Dec 1, 2022

Lifted from #298 for brevity.

Proposal

Provide a single context type as a mapped descriptor for the combination of resources used in the context, e.g. a valid combination of device(s). (Somewhat analogous to the adapter plus device(s) concept in WebGPU.)

enum MLContextType {
  "cpu",     // script-controlled context
  "gpu",     // script-controlled context
  "webgpu",  // managed by the user agent
  // later other context types may be defined, even using multiple devices, e.g. "cpu+npu" etc.
  // Note: in fact all these context types could be separate interface classes as well...
};

enum MLPowerPreference {  // a hint
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {  // not a hint
  MLContextType contextType = "cpu";
  MLPowerPreference powerPreference = "default";
  GPUDevice? gpuDevice = null;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});

  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(optional MLContextOptions options = {});

  // Internal slots
  // [[boolean managed]]  // `true` if the user agent controls the context (not really needed)
  // [[MLContextType contextType]] 
  // [[MLPowerPreference powerPreference]]
  // [[implementation]] // perhaps "adapter" would be better

  // further methods (and eventually properties) will follow
};
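
For illustration, a minimal usage sketch of this shape (assuming the ML interface is exposed as navigator.ml and the WebGPU device is obtained with the usual WebGPU calls; this sketch is not part of the proposal text):

// Default: a script-controlled CPU context.
const cpuContext = await navigator.ml.createContext();

// A context managed by the user agent on top of a previously selected
// WebGPU device (adapter/device selection stays on the WebGPU side).
const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
const webgpuContext = await navigator.ml.createContext({
  contextType: "webgpu",
  powerPreference: "high-performance",
  gpuDevice
});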

Rationale for change

Related to #303.

@anssiko
Member

anssiko commented Dec 16, 2022

This is considered a v2 feature per https://www.w3.org/2022/12/15-webmachinelearning-minutes.html

@zolkis
Collaborator Author

zolkis commented Jan 12, 2023

Following the proposals from #322, I have adapted the proposal. Web IDL:

enum MLContextType {  
  "cpu",   // script-controlled context
  "webgpu",  // managed by the user agent
  // later other context types may be defined, even using multiple devices, e.g. "cpu+npu" etc.
  // Note: in fact all these context types could be separate interface classes as well...
};

enum MLPowerPreference {  // a hint
  "default",
  "low-power"
};

dictionary MLContextOptions {  // not a hint
  MLContextType contextType = "cpu";
  MLPowerPreference powerPreference = "default";
  GPUDevice? gpuDevice = null;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});

  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(optional MLContextOptions options = {});

  // Internal slots
  // [[boolean managed]]  // `true` if the user agent controls the context (not really needed)
  // [[MLContextType contextType]] 
  // [[MLPowerPreference powerPreference]]
  // [[implementation]] // perhaps "adapter" would be better

  // further methods (and eventually properties) will follow
};

@wchao1115
Collaborator

Related to #322.

@zolkis I'm hesitant to fold gpuDevice into MLContextOptions because the options may contain settings that are not applicable to the specified GPU device, e.g. MLPowerPreference is a hint used to help locate the right hardware for the subsequent execution, while a GPU device is already the outcome of the GPU adapter selection process (on the WebGPU side). By exposing a dedicated createContext method for a gpuDevice we make it clear in the API that the backing device/adapter has already been selected and that WebNN plays no part in its selection.

Also note that from the implementation standpoint, a CPU-based and a GPU-based implementation differ in significant ways. Giving the impression at the API level that they are alike could be a bit misleading.
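
To make the contrast concrete, a rough sketch of the two shapes from script; the direct GPUDevice argument below illustrates the dedicated-method idea and is not a settled signature (gpuDevice is assumed to have been obtained via WebGPU beforehand):

// A: the GPU device folded into the options dictionary (this issue's proposal).
const contextA = await navigator.ml.createContext({ contextType: "webgpu", gpuDevice });

// B: a dedicated method/overload taking the pre-selected GPUDevice directly,
// making explicit that WebNN plays no part in adapter/device selection.
const contextB = await navigator.ml.createContext(gpuDevice);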

@zolkis
Collaborator Author

zolkis commented Jan 16, 2023

@wchao1115 OK, that is fine. In the spec we need to distinguish between GPU, CPU, etc. and write separate algorithms anyway, so I can see why separate declarations make sense. It's a design choice and you presented a rationale; that can be logged in the explainer, and then I can close this issue.

One note about future extensions, though: with this design we'll need to add new adapter objects for the various future accelerators. With the design proposed here we'd just extend MLContextType / MLContextOptions, which I think is a cleaner and more web-ish approach.

@inexorabletash
Member

inexorabletash commented Feb 22, 2024

Picking up from discussion in #322

Maybe a good way to reboot this discussion is enumerating use cases for specifying (or not specifying) devices:

  1. Delegate everything to the UA/OS, trust them to make the right trade-offs
    • This should be the default, and not pre-suppose where execution occurs
  2. Interop with GPU, e.g. with optimized kernels or as part of a graphics pipeline
    • Handled today by passing a specific GPUDevice in; @wchao1115 outlines here and in Simplify MLContext creation #322 why this is the right approach (i.e. all resources are going to need to be associated with that device, etc)
  3. Specify hints/preferences which guide the UA/OS to selecting the right device(s); these include at least
    • power vs. performance (i.e. today's MLPowerPreference enum)
    • accuracy vs. performance? (would need a new enum, if we wanted this)
    • more?
  4. Coordinating ML and other workload across devices
    • Apple's Core ML has MLComputeUnits which allows specifying any combination of CPU, GPU, or ANE. But notably its guidance is: "Use all to allow the OS to select the best processing unit to use (including the neural engine, if available). Use MLComputeUnits.cpuOnly to restrict the model to the CPU, if your app might run in the background or runs other GPU intensive tasks." That matches what we (Google Chrome) have heard from multiple teams: that one advantage of a TPU/NPU/ANE (etc) is that work can be offloaded from the GPU. So the use case is not "please use CPU and TPU"; it's "please don't use GPU" instead!

IMHO, the last use case is the only one where something like the current MLDeviceType is plausible, but it's not a good fit as-is. I'm not sure how to express (4) well - pass the GPUDevice to avoid?!?!

I'd prefer to remove the deviceType option now (as was proposed in #322), consider additional enums/values (per 3) as use cases are confirmed, and explore how to express (4) separately, to avoid baking in the current design.
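
For reference, a sketch of what (1) and (3) could look like if the device type were dropped and only hints remained (option names follow the current draft):

// (1) Delegate device selection entirely to the UA/OS.
const defaultContext = await navigator.ml.createContext();

// (3) Guide selection with hints only; the UA/OS still picks the device(s).
const lowPowerContext = await navigator.ml.createContext({ powerPreference: "low-power" });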

@huningxin
Contributor

@inexorabletash, thanks for revisiting this issue!

  4. Coordinating ML and other workload across devices

+1 to explore this use case. I know there is interest from platform vendors in scheduling workloads across multiple devices.

I'd prefer to remove the deviceType option now

Before we have an alternative solution, I'm concerned that removing the deviceType option would prevent us from testing and comparing CPU and GPU execution in the current Chromium prototype. We may also want to prototype NPU execution by extending deviceType with an "npu" value in the near future.

In #322, we also discussed whether to introduce a "system" device type, which would allow use case (1) and the "all" devices case of (4).

For the other cases of (4), maybe we can introduce combinations similar to MLComputeUnits, something like "cpu_and_gpu" or "cpu_and_npu"?

Another idea is to change deviceType to bit values so that developers can combine them, for example MLDeviceType.CPU | MLDeviceType.NPU.
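
A rough sketch of the bit-value idea from script; the MLDeviceType constants below are hypothetical and do not exist in the current spec:

// Hypothetical: deviceType as a combinable bit mask instead of an enum string.
const context = await navigator.ml.createContext({
  deviceType: MLDeviceType.CPU | MLDeviceType.NPU  // i.e. "please don't use the GPU"
});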

@inexorabletash
Member

Before we have an alternative solution, I'm concerned that removing the deviceType option would prevent us from testing and comparing CPU and GPU execution in the current Chromium prototype.

Fair point. We may want to flag in the spec (with an Issue: ... banner pointing here) that this part of the API is in flux. While that's technically true for the spec generally, it could help reviewers focus their attention.

We may also want to prototype NPU execution by extending deviceType with an "npu" value in the near future.

Now we just have to pick a neutral term. 😉

@wchao1115
Collaborator

wchao1115 commented Mar 3, 2024

@inexorabletash

  1. The "delegating it to the OS" approach sounds good until you realize that the OS can make mistakes you cannot correct so be careful what you wish for. More often than not, the OS doesn't have enough context to make a decision on behalf of the app, and if it has to make one, it will have no idea that the decision it makes now is consistent with what it did in the past. In fact, the whole notion of "hints" was invented as a way to soften the expectation from the app (in case a mistake occurs). Usually the most robust approach, and the one with the highest degree of predictability and forward compatibility to the user is one that puts the decision in the hands of the app (or sometimes the framework), with the OS helping to carry out that decision. In our case, it calls for an explicit device type on WebNN.

  2. Simplify MLContext creation #322 wants to streamline GPU usage in WebNN by saying that only a WebGPU context owns the GPU device, to avoid a situation where WebGPU and WebNN each own their own device (which could both point to the same underlying physical adapter, or to two different adapters in the same machine, e.g. one integrated, another discrete). I still think it's the right approach to not allow WebNN to hide another GPU device under the hood without the app's or WebGPU's knowledge. Hiding a device would make predictable cross-adapter resource sharing very hard to achieve.

  3. A device type npu looks to be a good abstraction until you realize that it is fundamentally different from a gpu or cpu device type: while both the GPU and CPU can execute any arbitrary operator (current or future) thanks to their programmability, the NPU can't, as NPUs are mostly non-programmable, i.e. they support only a certain set of operators, and often only specific variants of those. There are essentially two ways to deal with this difference (see the sketch below) -- assume that any operator can fail with an unsupported failure, or pair the NPU up with a programmable fallback device (either a GPU or CPU). The former will likely increase the complexity of graph compilation and make it much harder for the app to successfully "use" the NPU, while the latter shifts that complexity down to the implementation of WebNN, either in the backend or in the OS. I actually prefer the latter, as it's likely to produce a better result with higher efficiency, at the cost of some predictability.
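
To illustrate the difference from the app's point of view, a sketch assuming a hypothetical "npu" device type and roughly today's MLGraphBuilder shape (descriptor member names may differ from the latest draft):

// Approach 1: NPU-only context; graph compilation can fail on unsupported operators.
const npuContext = await navigator.ml.createContext({ deviceType: "npu" });  // "npu" is hypothetical
const builder = new MLGraphBuilder(npuContext);
const input = builder.input("input", { dataType: "float32", dimensions: [1, 4] });
const output = builder.relu(input);
try {
  const graph = await builder.build({ output });
} catch (e) {
  // The build fails because some operator (or a specific variant of it) isn't
  // supported by the NPU; the app must then recreate the graph on another device type.
}

// Approach 2: the implementation pairs the NPU with a programmable fallback device
// (GPU or CPU), so build() succeeds and unsupported operators run on the fallback.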

@inexorabletash
Member

Thanks for explaining further @wchao1115 - to ensure I'm understanding, let me attempt to restate the proposal: if you want WebNN to use GPU, you need to pass a GPUDevice, and otherwise WebNN should avoid using a system's GPU; the alternative is surprising behavior.

Re: (3) and "pairing up" an NPU with a programmable fallback device - again, thank you for walking through this. Our current understanding is that DML either uses the GPU or the NPU and doesn't contain a CPU fallback path. In contrast, TFLite and CoreML will fall back to the CPU if a GPU or NPU delegate can't cover an op. How do you envision a WebNN implementation with a DML backend supporting this fallback? For DML, would the fallback become the UA's responsibility?

(Also, please don't read my comments/questions as arguing for an alternative approach - just trying to understand!)

@wchao1115
Collaborator

wchao1115 commented Mar 6, 2024

Thank you @inexorabletash for your comment. Yes, re: #322. Setting the gpu enum value for the device type replicates the WebGPU workflow for device creation, its lifetime and resource domain, e.g. how to deal with device removal, etc. Not starting another GPU life cycle from within a WebNN context seems like the right thing to do.

CPU fallback could be highly expensive when it involves breaking up fully pipelined GPU resources and drawing down a sizeable amount of data from GPU memory to DDR system memory with lower memory bandwidth. In many cases this overhead could be worse than just running the whole graph on the CPU, especially in cases that pull the data across the memory bus. The cost is far less prominent in a system with a unified memory architecture, where all memory is shared and equally accessible to all integrated processors.

DML today doesn't support CPU fallback on its own and still relies on the framework to do so, although it is technically possible through a WARP device. For WebNN, there is, as we know, a dedicated CPU backend in addition to the DML backend. Pairing an NPU with an integrated processor, whether an integrated GPU or a CPU, should avoid the high cost of pipeline switching and memory readback while still providing a decent runtime experience to users when fallback is needed.

@wchao1115
Collaborator

wchao1115 commented Mar 7, 2024

BTW, @RafaelCintron just gave me some relevant bits of info: WebGPU currently still lacks support for discrete GPUs. I think it brings up a good point about the inherent risk of taking a dependency on an external spec, i.e. there is a chance that #322 could be a logically correct but physically wrong design, considering that any limitation or constraint on the GPU device lifetime on the WebGPU side would also impose the same limitation on WebNN. That is an argument for keeping the current design with the "gpu" device type and not taking up #322.

We could also add a new device type "npu" along with a new fallback device type with the supported values { "gpu", "none" }, where "none" would mean "no fallback needed", i.e. compilation would simply fail.
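
A sketch of how that could look from script; the "npu" value and the fallbackDeviceType option name are hypothetical:

// Hypothetical: NPU device type with a GPU fallback for unsupported operators.
const context = await navigator.ml.createContext({
  deviceType: "npu",
  fallbackDeviceType: "gpu"  // or "none": no fallback, compilation simply fails
});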
