Commit 758e0c3

Update explainer with new proposal for simple accelerator mapping (#884)

* Add design rationale, background and examples for the `accelerated` hint introduced in #895
* Add a design proposal for the proposed future enhancement, a CPU fallback hint
* Document new requirements for post-compile query

Signed-off-by: Zoltan Kis <zoltan.kis@intel.com>

1 parent 0ce9f32

1 file changed: device-selection-explainer.md (+56 −8 lines)
@@ -63,7 +63,9 @@ Possible means:

- identify hints/constraints that require a feedback (error) if not supported, for instance "avoid CPU fallback" or "need low power and low latency acceleration".

### 3. Post-compile query of inference details

**Requirement**:
- Query a compiled graph for details on how it may be run (subject to being overridden by the platform).
- Query if CPU fallback is active for a context.

This is being discussed in [Get devices used for a graph after graph compilation #836](https://github.com/webmachinelearning/webnn/issues/836)
and being explored in PR [#854 (define graph.devices)](https://github.com/webmachinelearning/webnn/pull/854).
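
As a rough illustration only, the post-compile query explored in PR #854 could be consumed as sketched below. `graph.devices` is a proposed, not yet standardized, attribute, and the helper function is invented here; the only assumption is that the query yields an array of device-type strings.

```javascript
// Sketch: interpreting a hypothetical post-compile device list.
// `graph.devices` is the attribute explored in PR #854 (not standardized);
// this helper only assumes an array of strings such as ['gpu', 'cpu'].
function hasCpuFallback(devices) {
  return devices.includes('cpu');
}

// With the proposed API, usage might look like (not runnable today):
//   const graph = await builder.build({ output });
//   if (hasCpuFallback(graph.devices)) { /* e.g., pick another backend */ }

// Illustration with plausible device lists:
console.log(hasCpuFallback(['npu']));        // false: likely fully accelerated
console.log(hasCpuFallback(['gpu', 'cpu'])); // true: partial CPU fallback
```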
@@ -73,19 +75,23 @@ Initially, the proposal was to obtain the list/combination of devices usable for

Design decisions may take the following into account:

1. Allow the underlying platform to ultimately choose the appropriate compute device(s).

2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, high performance (throughput), low latency, stable sustained performance, accuracy, etc.

3. Allow an easy way to create a context with a GPU device, i.e., without specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) (e.g., via `powerPreference`).

4. Allow selection from available GPU devices, for instance, by allowing specification of an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) obtained from available [GPUAdapters](https://gpuweb.github.io/gpuweb/#gpuadapter) using [WebGPU](https://gpuweb.github.io/gpuweb) mechanisms via [GPURequestAdapterOptions](https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions), such as feature level or power preference.

5. Allow selection from various available AI accelerators, including NPUs, GPUs, or a combination of accelerators. This may happen using a (to-be-specified) algorithmic mapping from context options. Alternatively, allow web apps to hint a preferred fallback order for the given context, or fallbacks to avoid (if that is supported). (Related to discussions in Issue #815.)

6. Add a context creation option/hint expressing an app's preference for simply being ["accelerated"](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2658627753), meaning NPU, GPU, or both.

7. Allow enumeration of [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary) before creating a context so that web apps can select the best device that would work with the intended model. This needs more developer input and examples. (Related to discussions in Issue #815.)

8. As a corollary to 6, allow creating a context using options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). (Related to discussions in Issue #815.)

9. Expose a context property (or event) to tell whether CPU fallback is active (or likely active) for the context.

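To make the "algorithmic mapping from context options" in the design decisions above more concrete, here is a purely hypothetical sketch. Neither the function nor the concrete device orderings appear in any spec; they only illustrate what a to-be-specified mapping could produce.

```javascript
// Hypothetical mapping from context options to a device preference order.
// The orderings below are invented for illustration; a real mapping would
// need to be specified and standardized.
function preferredDeviceOrder({ accelerated = true, powerPreference = 'default' } = {}) {
  if (!accelerated) return ['cpu'];                   // app explicitly prefers CPU inference
  switch (powerPreference) {
    case 'low-power':        return ['npu', 'cpu'];   // NPU first, CPU as default fallback
    case 'high-performance': return ['gpu', 'npu', 'cpu'];
    default:                 return ['npu', 'gpu', 'cpu'];
  }
}

console.log(preferredDeviceOrder({ powerPreference: 'low-power' }));
// [ 'npu', 'cpu' ]
```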
## Scenarios, examples, design discussion
@@ -102,6 +108,22 @@ context = await navigator.ml.createContext({powerPreference: 'low-power'});

// create a context that will likely map to GPU
context = await navigator.ml.createContext({powerPreference: 'high-performance'});

// create a context that should use massively parallel processing (e.g. GPU/NPU)
context = await navigator.ml.createContext({accelerated: true});
if (context.accelerated) {
  // the context will mostly use GPU/NPU, but CPU fallback may happen
} else {
  // the platform signals it likely cannot provide an NPU or GPU, so try something else
}

// create a context that should preferably use the NPU
context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
if (context.accelerated) {
  // NPU is likely used -- further requirements could be set via opSupportLimitsPerDevice
} else {
  // NPU is likely not available, and since the GPU needs high power, it is not used
}

// enumerate devices and limits (as allowed by policy/implementation)
// and select one of them to create a context
const limitsMap = await navigator.ml.opSupportLimitsPerDevice();
@@ -122,7 +144,7 @@ const context = await navigator.ml.createContext({ fallback: ['npu', 'cpu'] });

## Open questions

- WebGPU provides a way to select a GPU device via [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should WebNN expose a similar adapter API for NPUs? The current take is not to expose explicit adapters.

- How should WebNN extend the context options? What exactly is best to pass as context options? Operator support limits? Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others?
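
To illustrate the operator-support-limits question above, an app could pick the first device whose limits cover every operator its model needs. The sketch below invents the map shape `{ deviceType: { opName: limits } }`; `opSupportLimitsPerDevice()` is a proposal under discussion, not a standardized API.

```javascript
// Sketch with an invented map shape: { deviceType: { opName: limits } }.
// Only illustrates the selection idea behind the open question above.
function pickDeviceFor(requiredOps, limitsMap) {
  for (const [device, opLimits] of Object.entries(limitsMap)) {
    if (requiredOps.every((op) => op in opLimits)) return device;
  }
  return null; // no single device supports all required ops
}

// Mock data standing in for a result of the proposed opSupportLimitsPerDevice():
const mockLimits = {
  npu: { conv2d: {}, matmul: {} },
  gpu: { conv2d: {}, matmul: {}, softmax: {} },
};
console.log(pickDeviceFor(['conv2d', 'matmul'], mockLimits));  // 'npu'
console.log(pickDeviceFor(['conv2d', 'softmax'], mockLimits)); // 'gpu'
```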

@@ -164,7 +186,7 @@ A WebNN application may have specific device preferences for model execution. Th

* *Description*: The application developer hints that the model execution should contribute as little as possible to the overall system power draw. This is a broader consideration than just the model's own efficiency, potentially influencing scheduling and resource allocation across the system. The implementation may choose any device ("where JS and Wasm execute," "where WebGL and WebGPU programs execute," or "other") that best achieves this goal.

## Minimum Viable Solution (MVS, completed)

Based on the discussion above, the best starting point was a simple solution that can be extended and refined later. A first contribution could include the following changes:
- Remove `MLDeviceType` (see [CRD 20250131](https://www.w3.org/TR/2025/CRD-webnn-20250131/#enumdef-mldevicetype)) as an explicit [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions).
@@ -179,7 +201,7 @@ Besides, the following topics have been discussed:

- Document the valid use cases for requesting a certain device type or combination of devices, and under what error conditions. Currently, after these changes, there remains explicit support for a GPU-only context when an `MLContext` is created from a `GPUDevice` in `createContext()`.
- Discuss option #3 from [Considered alternatives](#considered-alternatives).

## Next discussion phase after MVS

In [Remove MLDeviceType #809](https://github.com/webmachinelearning/webnn/pull/809), this [comment](https://github.com/webmachinelearning/webnn/pull/809#discussion_r1936856070) raised a new use case:

@@ -210,6 +232,32 @@ Given the discussion in Issue #815 ([comment](https://github.com/webmachinelearn

- If yes, then in some cases (e.g., CoreML), the model needs to be dispatched before knowing for sure whether it can be executed on the GPU. For that, a new API is needed, as discussed in [Get devices used for a graph after graph compilation #836](https://github.com/webmachinelearning/webnn/issues/836) and being explored in PR [#854 (define graph.devices)](https://github.com/webmachinelearning/webnn/pull/854).

Based on the answer, the developer may choose an option other than WebNN. Besides that, the feature permits gathering data on typical graph allocations (note: fingerprintable), which might help the specification work on the device selection API.

## Simple accelerator mapping solution

The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-3198261369) gained support for a simple accelerator mapping solution (before using the previously discussed fine-grained constraints):
- Expose a context property (or event) to tell whether CPU fallback is active (or likely active).
- Add a context creation option/hint (e.g. `accelerated: true`) expressing an app's preference for NPU- and/or GPU-accelerated ["massively parallel"](https://en.wikipedia.org/wiki/Massively_parallel) processing (MPP).
  Note that in [certain use cases](https://www.w3.org/2025/09/25-webmachinelearning-minutes.html) applications might prefer CPU inference; therefore, specifying `accelerated: false` has legitimate use cases as well.
- Add a context property named `"accelerated"` with possible values `false` (likely no support for either GPU or NPU) and `true` (execution is fully controlled by the underlying platform, which makes a best effort for MPP, yet CPU fallback may occur).

The following Web IDL changes are proposed:

```js
partial dictionary MLContextOptions {
  boolean accelerated = true;
};

partial interface MLContext {
  readonly attribute boolean accelerated;
};
```

The behavior of [createContext()](https://webmachinelearning.github.io/webnn/#dom-ml-createcontext) is proposed to follow this policy:
- Set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing but it is not available at the moment. Applications may poll this property.

In the future, more policy options could be considered, for instance:
- Return an error [in step 4](https://webmachinelearning.github.io/webnn/#create-a-context) if the context option `accelerated` has been set to `true`, but the platform cannot provide massively parallel processing at all.

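Since the policy above allows applications to poll the `accelerated` property, a bounded polling helper might look like the sketch below. The retry scheme is invented for illustration, and a mock object stands in for a real `MLContext`, since the property and its dynamic behavior are still proposals.

```javascript
// Sketch: poll the proposed `accelerated` property with bounded retries.
// The helper is an invented illustration, demonstrated on a mock context.
async function waitForAcceleration(context, { retries = 3, delayMs = 1000 } = {}) {
  for (let i = 0; i < retries; i++) {
    if (context.accelerated) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return context.accelerated;
}

// Mock standing in for `await navigator.ml.createContext({ accelerated: true })`;
// MPP "becomes available" after 20 ms.
const mockContext = { accelerated: false };
setTimeout(() => { mockContext.accelerated = true; }, 20);
waitForAcceleration(mockContext, { retries: 5, delayMs: 10 })
  .then((accelerated) => console.log(accelerated)); // true once MPP becomes available
```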
## History
Previous discussion covered the following main topics:
