Update explainer with new proposal for simple accelerator mapping (#884)
* Add design rationale, background and examples for the `accelerated` hint introduced in #895
* Add a design proposal for the proposed future enhancement, a CPU fallback hint
* Document new requirements for post-compile query
Signed-off-by: Zoltan Kis <zoltan.kis@intel.com>
File changed: device-selection-explainer.md
Possible means:
- identify hints/constraints that require feedback (an error) if not supported, for instance "avoid CPU fallback" or "need low power and low latency acceleration".
### 3. Post-compile query of inference details
**Requirement**:
- Query a compiled graph for details on how it may be run (subject to being overridden by the platform).
- Query if CPU fallback is active for a context.
This is being discussed in [Get devices used for a graph after graph compilation #836](https://github.com/webmachinelearning/webnn/issues/836)
and being explored in PR [#854 (define graph.devices)](https://github.com/webmachinelearning/webnn/pull/854).
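A minimal sketch of how an app might consume such a post-compile query, assuming the `devices` attribute explored in PR #854 (not a shipped API; a plain mock object stands in for a real compiled `MLGraph`):

```javascript
// Illustrative only: `graph.devices` is being explored in PR #854 and is
// not part of the WebNN specification. A mock object stands in for a real
// compiled MLGraph here.
const mockGraph = {
  devices: ["npu", "cpu"], // devices the platform chose for this graph
};

// An app could check whether CPU fallback is in play and react, e.g. by
// switching to a smaller model or a different execution path.
function usesCpuFallback(graph) {
  return graph.devices.includes("cpu");
}

console.log(usesCpuFallback(mockGraph)); // true for this mock
```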
Design decisions may take the following into account:
1. Allow the underlying platform to ultimately choose the appropriate compute device(s).
2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, high performance (throughput), low latency, stable sustained performance, accuracy, etc.
3. Allow an easy way to create a context with a GPU device, i.e., without specifying an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) (e.g., via `powerPreference`).
4. Allow selection from available GPU devices, for instance, by allowing specification of an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) obtained from available [GPUAdapters](https://gpuweb.github.io/gpuweb/#gpuadapter) using [WebGPU](https://gpuweb.github.io/gpuweb) mechanisms via [GPURequestAdapterOptions](https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions), such as feature level or power preference.
5. Allow selection from various available AI accelerators, including NPUs and GPUs, or a combination of accelerators. This may happen via a (to-be-specified) algorithmic mapping from context options. Alternatively, allow web apps to hint a preferred fallback order for the given context, or fallbacks to avoid (if supported). (Related to discussions in Issue #815).
6. Add a context creation option/hint that lets the app express a preference for simply being ["accelerated"](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2658627753), meaning NPU, GPU, or both.
7. Allow enumeration of [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary) before creating a context so that web apps can select the best device that would work with the intended model. This needs more developer input and examples. (Related to discussions in Issue #815).
8. As a corollary to 7, allow creating a context using options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). (Related to discussions in Issue #815).
9. Expose a context property (or event) to tell whether CPU fallback is active (or likely active) for the context.
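To make items 2, 5, and 6 above concrete, here is a hypothetical sketch of what such context options might look like; apart from `powerPreference`, none of these options exist in the current specification:

```javascript
// Hypothetical context-option shapes for the ideas above. Only
// `powerPreference` exists in the current specification; `accelerated`
// and `fallback` are illustrative proposals.
const hypotheticalOptions = {
  powerPreference: "low-power", // existing hint (item 2)
  accelerated: true,            // proposed acceleration hint (item 6)
  fallback: ["npu", "cpu"],     // hypothetical fallback-order hint (item 5)
};

// Item 5 notes that supported fallback orders would need standardization.
// A sketch of the validation an implementation might perform:
const allowedFallbackOrders = [["npu"], ["npu", "cpu"], ["gpu", "cpu"]];
function isSupportedFallbackOrder(order) {
  return allowedFallbackOrders.some(
    (allowed) =>
      allowed.length === order.length &&
      allowed.every((device, i) => device === order[i])
  );
}

console.log(isSupportedFallbackOrder(hypotheticalOptions.fallback)); // true
```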
- WebGPU provides a way to select a GPU device via [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should WebNN expose a similar adapter API for NPUs? The current position is not to expose explicit adapters.
- How should WebNN extend the context options? What exactly is best to pass as context options? Operator support limits? Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others?
A WebNN application may have specific device preferences for model execution.
**Description**: The application developer hints that the model execution should contribute as little as possible to the overall system power draw. This is a broader consideration than just the model's own efficiency, potentially influencing scheduling and resource allocation across the system. The implementation may choose any device ("where JS and Wasm execute," "where WebGL and WebGPU programs execute," or "other") that best achieves this goal.
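For example, using the existing `powerPreference` context option (the option name is from the current specification; a minimal mock of `navigator.ml` keeps the sketch self-contained outside a browser):

```javascript
// Request a context that minimizes contribution to overall system power
// draw. `powerPreference: "low-power"` is an existing MLContextOptions
// member; the mock `ml` object below stands in for `navigator.ml`.
async function createLowPowerContext(ml) {
  return ml.createContext({ powerPreference: "low-power" });
}

// Minimal stand-in for `navigator.ml` so the sketch runs anywhere:
const mockMl = {
  createContext: async (options) => ({ options }),
};

createLowPowerContext(mockMl).then((context) => {
  console.log(context.options.powerPreference); // "low-power"
});
```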
## Minimum Viable Solution (MVS, completed)
Based on the discussion above, the best starting point was a simple solution that can be extended and refined later. A first contribution could include the following changes:
- Remove `MLDeviceType` (see [CRD 20250131](https://www.w3.org/TR/2025/CRD-webnn-20250131/#enumdef-mldevicetype)) as an explicit [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions).
In addition, the following topics have been discussed:
- Document the valid use cases for requesting a certain device type or combination of devices, and the error conditions that apply. Currently, after these changes, there remains explicit support for a GPU-only context when an `MLContext` is created from a `GPUDevice` in `createContext()`.
- Discuss option #3 from [Considered alternatives](#considered-alternatives).
## Next discussion phase after MVS
In [Remove MLDeviceType #809](https://github.com/webmachinelearning/webnn/pull/809), this [comment](https://github.com/webmachinelearning/webnn/pull/809#discussion_r1936856070) raised a new use case:
- If yes, then in some cases (e.g., CoreML), the model needs to be dispatched before knowing for sure whether it can be executed on the GPU. For that, a new API is needed, as discussed in [Get devices used for a graph after graph compilation #836](https://github.com/webmachinelearning/webnn/issues/836) and being explored in PR [#854 (define graph.devices)](https://github.com/webmachinelearning/webnn/pull/854).
Based on the answer, the developer may choose an option other than WebNN. Besides that, the feature permits gathering data on typical graph allocations (note: fingerprintable), which might help the specification work on the device selection API.
## Simple accelerator mapping solution
The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-3198261369) gained support for a simple accelerator mapping solution (to be used before the previously discussed fine-grained constraints):
- Expose a context property (or event) to tell whether CPU fallback is active (or likely active).
- Add a context creation option/hint (e.g. `accelerated: true`) for expressing the app's preference for NPU- and/or GPU-accelerated ["massively parallel"](https://en.wikipedia.org/wiki/Massively_parallel) processing (MPP).
Note that in [certain use cases](https://www.w3.org/2025/09/25-webmachinelearning-minutes.html) applications might prefer CPU inference; therefore, specifying `accelerated: false` has legitimate use cases as well.
- Add a context property named `"accelerated"` with possible values: `false` (likely no support for either GPU or NPU) and `true` (e.g. fully controlled by the underlying platform, which makes a best effort for MPP, though CPU fallback may still occur).
The following Web IDL changes are proposed:
```webidl
partial dictionary MLContextOptions {
  boolean accelerated = true;
};

partial interface MLContext {
  readonly attribute boolean accelerated;
};
```
The behavior of [createContext()](https://webmachinelearning.github.io/webnn/#dom-ml-createcontext) is proposed to follow this policy:
- Set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing, but it may not be available at the moment. Applications may poll this property.
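Under this policy, an application could poll the property; here is a sketch assuming the proposed `accelerated` attribute (not a shipped API; a mock object stands in for a real `MLContext`):

```javascript
// Illustrative only: `MLContext.accelerated` is a proposal from Issue
// #815, not a shipped API. Poll the property until acceleration is
// reported or the attempts run out.
async function waitForAcceleration(context, attempts = 5, intervalMs = 100) {
  for (let i = 0; i < attempts; i++) {
    if (context.accelerated) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Mock context where acceleration becomes available on the third poll:
const mockContext = (() => {
  let polls = 0;
  return { get accelerated() { return ++polls >= 3; } };
})();

waitForAcceleration(mockContext, 5, 0).then((accelerated) => {
  console.log(accelerated); // true: the mock eventually reports acceleration
});
```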
In the future, more policy options could be considered, for instance:
- Return an error [in step 4](https://webmachinelearning.github.io/webnn/#create-a-context) if the context option `accelerated` was set to `true` but the platform cannot provide massively parallel processing at all.
## History
Previous discussion covered the following main topics: