Define the maximum number of operand dimensions (maximum rank) #456

Open
huningxin opened this issue Aug 24, 2023 · 17 comments

@huningxin
Contributor

Regarding the current definition of MLOperandDescriptor:

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;

  // The dimensions field is only required for tensor operands.
  sequence<unsigned long> dimensions;
};

there is no definition in the spec of the maximum length of the dimensions sequence.
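For illustration, nothing in the IDL as quoted prevents a caller from requesting an arbitrarily high-rank operand; whether it works is left entirely to the backend (a sketch using the descriptor shape above):

```js
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// Valid per the current IDL, but rank 9 exceeds what most native
// backends support (typically 5-8), so this may only fail much later.
const x = builder.input('x', {
  type: 'float32',
  dimensions: [1, 2, 3, 4, 5, 6, 7, 8, 9],
});
```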

However, native ML APIs usually impose a maximum supported rank in their implementations. For example:

| Constant | Value | Description |
| --- | --- | --- |
| DML_TENSOR_DIMENSION_COUNT_MAX | 5 | DirectML tensors support a maximum of 5 dimensions for DML_TARGET_VERSION < DML_FEATURE_LEVEL_3_0. |
| DML_TENSOR_DIMENSION_COUNT_MAX1 | 8 | DirectML tensors support a maximum of 8 dimensions for DML_TARGET_VERSION >= DML_FEATURE_LEVEL_3_0. |

There may also be per-operator limits; for example, the convolution operator's maximum dimension count is 5, while the element-wise add operator's is 8. Thanks @fdwr for sharing this information!

In a Chromium CL review, @RafaelCintron (thanks!) mentioned:

we should have a better solution for web developers to know ahead of time which operator parameters are expected to fail and which will not so they can know which models to use. Requiring JS error string parsing for each browser vendor is not a great solution.

Rafael also shared that WebGPU solved a similar problem with "limits": https://gpuweb.github.io/gpuweb/#limits.
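For reference, WebGPU surfaces those limits on the adapter and device so callers can check capabilities up front; a minimal sketch of that existing pattern:

```js
// WebGPU's "limits" pattern, which this issue suggests as a model.
const adapter = await navigator.gpu.requestAdapter();
console.log(adapter.limits.maxTextureDimension2D);
console.log(adapter.limits.maxBufferSize);
```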

@fdwr
Collaborator

fdwr commented Aug 24, 2023

we should have a better solution for web developers to know ahead of time which operator parameters are expected to fail

Yeah, we really could use better diagnosability of graph creation failures related to backend limitations, at one of these points:

  • (a) Very early, during client-side graph construction, such as a graphBuilder.add(x, y) call. This would be the most immediately diagnosable, but it would require a priori knowledge of what could possibly fail later, which could be challenging considering the matrix of selected backend, OS version (which affects which operators exist), and GPU/NPU data type support.
  • (b) Still early, but during graphBuilder.build(). This too would need a priori handshaking with the backend. If reported via exception, the exception should carry enough info to diagnose which node failed (operator type at least) and the cause (missing data type support, dimension count, other invalid parameter) so the caller stands a chance of reliably recovering and retrying with different parameters; see the sketch after this list.
  • (c) Later, during mlContext.compute(). This wouldn't need a priori knowledge, but like (b) it would also need enough node info to be diagnosable by the caller. Even if we go with (a)/(b) and attempt to forecast possible failures, I could see some unexpected failures remaining unknown until (c), such as a particular execution taking too long on the GPU and timing out.
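To make (b) concrete, today a caller can only catch an opaque error from build(); the structured fields below are hypothetical, sketching what "enough info to diagnose" might look like:

```js
try {
  const graph = await builder.build({ output });
} catch (e) {
  // Today: per-browser message string parsing, as Rafael notes.
  console.error(e.message);
  // Desired (hypothetical fields): e.operatorType === 'conv2d',
  // e.cause === 'rank-unsupported', so callers can recover and retry.
}
```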

@wacky6

wacky6 commented Aug 24, 2023

Do we know the max dims required by the models we definitely want to support (e.g. MediaPipe, MobileNet, diffusion/LLM)? Perhaps that's a good middle ground for now.

I guess most models are fine with max dims < 5?

Exposing limits to clients early is okay; we could expose the limit on MLContext (before graph building). Per the current spec, an MLContext is required to construct an MLGraphBuilder, so I think it's a reasonable place.


Taking a step back, I'd prefer we limit max_dims to the "greatest common divisor" across the backends we want to support (referring to Google's feedback on the API in #453: XNNPACK, DML < 3.0, Apple M1/M2, upcoming Intel VPU).

Progressively adding features (i.e. higher dims) is much easier than asking API users to feature-detect / handle failures from the beginning.

If we don't want to limit max_dims in the spec, can we at least provide a guideline (e.g. "for best interoperability, don't use more than X dimensions") based on our survey?


Speaking as a naive developer: knowing whether a model can run on the backend before downloading a multi-GB weight file is useful (don't waste bandwidth on something that can't be used).

@fdwr
Collaborator

fdwr commented Aug 24, 2023

However, for implementation, the native ML APIs usually have the maximum supported size.

📚 @huningxin: Adding a few more references:

| Max dimensions | API | Notes |
| --- | --- | --- |
| 6D | XNNPACK: XNN_MAX_TENSOR_DIMS | *Marat Dukhan at Google says below that 8D will be supported :) |
| 8D | DirectML: DML_TENSOR_DIMENSION_COUNT_MAX1 | DML_TARGET_VERSION >= DML_FEATURE_LEVEL_3_0 |
| 8D | Nvidia cuDNN: CUDNN_DIM_MAX | "a number of dimensions from 3 to 8" |
| 8D | Apple BNNS: BNNS.DataLayout.tensor8DFirstMajor | |
| 12D | Intel oneDNN: dnnl_format_tag_t | "These names use letters from a to l to denote logical dimension from 1 to 12" |
| 16D | Apple Metal Performance Shaders: MPSNDArraySizes, MPSNDArrayDescriptor | |
| 32D | NumPy: NPY_MAXDIMS | "maximum supported dimension for an ndarray is 32" |

5 for DML_TARGET_VERSION < DML_FEATURE_LEVEL_3_0

I would ignore older DML versions before 3.0, because WebNN needs the DML_GRAPH anyway.

Do we know the max dims required ... I guess most models are fine with max dims < 5?

🌍 @wacky6: The largest "real world" models we've seen have 7 dimensions, a few have 6, many have 5, and of course the rest have 4 (I have no idea what model would use 12 or even 32 dimensions, which seems excessive given that many tensor dimensions would yield a rapidly exponential element count). Given these models, and considering GPU architecture (where a natural vector size for many GPUs is 4 x 32 bits), a reasonable upper limit would be 8D, which fits nicely as two uint32_t4's, and which coincidentally is what DirectML, cuDNN, and BNNS settled on.

limit max_dims to the "greatest common divisor" across the backends

🤔 Note that backend limitations need not completely constrain the frontend, though. Backend limits will differ, but because WebNN does not support arbitrary tensor strides and all elements are contiguous in memory, one can fold higher-dimensional input into lower-dimensional input. For example, any elementwise ND operation (add, relu, elementwiseIf...) can be treated as one large 1D array, and similar folding logic applies to nearly every other class of operator.

Pure elementwise operators can be flattened to a simple 1D array:

[[A,B,C],   ->  [A,B,C,D,E,F,G,H,I,J,K,L]
 [D,E,F],
 [G,H,I],
 [J,K,L]]
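In WebNN terms, that folding is just reshapes around the call; a minimal sketch (the helper name is mine), assuming the standard MLGraphBuilder reshape and add methods:

```js
// Fold an N-D elementwise add into 1-D: reshape both inputs to
// [elementCount], add, then restore the original shape. For contiguous
// tensors this is metadata-only; no memory is copied.
function addViaFlatten(builder, a, b, shape) {
  const count = shape.reduce((n, d) => n * d, 1);
  return builder.reshape(
    builder.add(builder.reshape(a, [count]), builder.reshape(b, [count])),
    shape);
}
```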

Operators taking a single axis can be collapsed into [collapsed left side, middle axis, collapsed right side]:

axis=2, sizes=[2,3,(4),5,6]  ->  axis=1, sizes=[6,4,30]

Operators with split points can flatten all dimensions into two partitions before and after the axis split point:

axis=3, sizes=[2,3,4,|5]  ->  axis=1, sizes=[24,|5]
axis=2, sizes=[2,3,|4,5]  ->  axis=1, sizes=[6,|20]
axis=0, sizes=[|2,3,4,5]  ->  axis=0, sizes=[|120]

Operators with multiple axes can coalesce adjacent axes:

axes=[1,2], sizes=[2,3,4,5]  ->  axes=[1], sizes=[2,12,5]
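A sketch of the single-axis collapse computation above (the helper name is mine):

```js
// Collapse dimensions into [product(left of axis), axis, product(right
// of axis)], so a rank-limited backend can still run single-axis ops.
function collapseAroundAxis(sizes, axis) {
  const product = (dims) => dims.reduce((n, d) => n * d, 1);
  return {
    axis: 1,
    sizes: [product(sizes.slice(0, axis)),
            sizes[axis],
            product(sizes.slice(axis + 1))],
  };
}

collapseAroundAxis([2, 3, 4, 5, 6], 2); // => { axis: 1, sizes: [6, 4, 30] }
```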

Then there are operators where some dimensions are fixed, but all the rest can be flattened (e.g. BatchNorm locks the first two axes but flattens all other dimensions to the right, whereas GEMM and Softmax lock the right two but flatten all the leading batch dimensions to the left).

All of these are just reshapes and operator description adjustments, no tensor memory copy needed.

Now, some operators, like a 7D Tile or Resample with non-collapsible repetition values, are not achievable via a single simple reshape from say 7D down to 4D, but they can still be collapsed and implemented via lower-dimensional calls, needing just two 4D calls. Then pretty much everything else (except potentially ND gather and scatter 🤨) can be implemented in terms of a transpose plus more than one call of that operator.

Some background experience... Originally, because the earliest versions of DirectML were limited to only 4 dimensions (and 5 in the rare case of 3D convolution), we needed to implement this dimension-collapsing logic in the TensorFlow fork atop DirectML. Later, this kind of logic moved directly into DirectML so that any API caller could get up to 8D. Interestingly, XNNPack at 6D is only 2 dimensions away from most of the PACK (BNNS, DML, cuDNN 😉), but XNNPack technically already supports elementwise operators on contiguous-memory tensors of 32 dimensions, if one just reinterprets the tensor as a 1D array before calling XNNPack 😎.

Exposing limits early to clients is okay

@wacky6: So that callers can avoid wasting time and energy (or, as you mention, download cost) building a graph that is doomed to fail later anyway, I would expose dimensionality limits as early as possible, per MLContext (similar to WebGPU's limits concept). There are still cases where an individual operator may not support the general maximum (for example, convolution is typically 4D or 5D even if the overall operator limit is 8D), meaning failure could still occur later; so reporting limits early doesn't completely obviate the need for good error reporting after graph construction/execution. Additionally (although not the topic of this post), other matters like absent data type support could cause a failure. So I suppose my (a)/(b)/(c) above are not strictly orthogonal.
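A hypothetical shape for such per-context limits (maxOperandRank is my invention, not a spec proposal; URLs are illustrative):

```js
// Hypothetical: consult a context-wide rank limit before committing to
// a multi-GB weight download.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });
const modelUrl = context.limits?.maxOperandRank >= 6
    ? 'model-6d.bin'         // variant needing 6-D operands
    : 'model-4d-folded.bin'; // pre-folded fallback variant
const weights = await fetch(modelUrl);
```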

@Maratyszcza

FYI, we plan to increase XNN_MAX_TENSOR_DIMS in XNNPack to 8

fdwr added a commit that referenced this issue Feb 21, 2024
* Bug fix: Fix MLGraphBuilder.input()'s handling of scalars. Fixes #502

MLOperandDescriptor was updated in c320472 to always have dimensions,
defaulted to an empty list for scalars. That makes the current prose
for input() incorrect. Issue #502 already tracked correcting it, so
let's simplify - just change the logic for "is a scalar?" and drop the
bogus assert.

* Remove conditional and fix check dimensions to allow 0
* Factor byte length check into checking dimensions
* Oops, this should go too.
* type -> dataType
* (1) simplify (2) move into MLOperandDescriptor (3) link to #456
* Restore dimension check and reorder byte length to last
* Fix build for missing dimensions

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
@fdwr
Collaborator

fdwr commented Mar 14, 2024

FYI, we plan to increase XNN_MAX_TENSOR_DIMS in XNNPack to 8

@Maratyszcza: Is this still in mind? Spelunking all the usage sites (https://github.com/search?type=code&q=XNN_MAX_TENSOR_DIMS+repo%3Agoogle%2FXNNPack), it appears increasing 6->8 would have little collateral impact on the code, mainly increasing the size of various local variables (e.g. std::array<size_t, XNN_MAX_TENSOR_DIMS> input1_strides;).

@Maratyszcza

I no longer work on XNNPack; inviting @alankelly & @fbarchard to answer.

@alankelly

We are planning on adding support for more dimensions this year. We are working on various runtime changes now, and this support may be integrated as part of that work.

@inexorabletash
Member

@philloooo, @a-sully, and I had an internal discussion about this. Over in issue #463, @philloooo proposes that each op could express rank limits in the context's opSupportLimits() output, e.g. {rankRange: [0, 4]}. If we do that, documenting or exposing a global maximum rank may not be useful for frameworks: frameworks would check the rank for each op and either adapt or fail early. Inputs, constants, and outputs would have special entries and could similarly provide the maximum rank.

Feedback from frameworks (like ONNX Runtime Web) would be helpful here!
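A sketch of such a framework check under the #463 proposal (rankRange is the proposed field, not yet in the spec):

```js
// Proposed: per-op rank limits in opSupportLimits() output.
const limits = context.opSupportLimits();
const [minRank, maxRank] = limits.conv2d.input.rankRange; // hypothetical field
if (modelConvRank > maxRank) {
  // Adapt (e.g. fold dimensions) or fail early / fall back.
}
```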

@fdwr
Collaborator

fdwr commented Jul 2, 2024

proposes that each op could express rank limits in the context's opSupportLimits() output, e.g. {rankRange: [0,4]}

@inexorabletash I like this granularity, because callers can make informed decisions to branch (fall back) to another executor per operator. @Honry, thoughts from the WebNN EP side too?

Btw @philloooo, I'm interested in the rank limits for CoreML if you have learned more, as I'm missing them in the table above. For DML, I can provide them all in a single JSON file that should be easy to auto-generate/translate (easier than scraping them from MSLearn webpages). For TF, I don't know where to begin (but I bet Austin/Phillis would).

exposing a global maximum rank may not be useful for frameworks

It's true that if you have the per-operator rank, then knowing a global rank is technically redundant, but beware that it still requires downloading some version of the model before that question can even be asked for each operator, whereas if the implementation reported a global maximum rank, the caller might choose to download an entirely different model to begin with. For example, if the backend is CoreML and it doesn't support 6D reshaping, the caller could download a different version of ESRGAN up front. Now, if model topology were stored separately from the weights (thus a lighter download), this would be less costly, but weights and topology are typically stored in the same file.

documenting ... a global maximum rank

I still think it's useful to recommend a general maximum rank in the spec, even if the WebNN API does not report one to callers, just so backend implementers know what is likely to come their way (and can try to fold higher dimensions into whatever their backend supports), and also what is unlikely to come their way (and not worry about those cases).

Note too, there are ways to fold multiple dimensions into lower-rank spaces so that more limited backends can handle ND (such as very old versions of DML that only handled 4D). For example, all elementwise operators can be treated as 1D no matter how many actual dimensions there are, and a single 6D transpose can be achieved via two 4D transposes. The logic varies per operator category, but I'm happy to share the logic I've used in the past. That means that even if CoreML's add were itself limited to 4D (I don't know whether it actually is), the WebNN backend might still be able to report an add rank of [0,8].

@Honry
Contributor

Honry commented Jul 2, 2024

proposes that each op could express rank limits in the context's opSupportLimits() output, e.g. {rankRange: [0,4]}

@inexorabletash I like this granularity because callers can make informed decisions to branch (fall back) to another executor per operator. @Honry for WebNN EP thoughts too.

This should be helpful if the rank limits are inconsistent among different backends.

@a-sully
Contributor

a-sully commented Jul 2, 2024

Core ML has a global maximum rank of 5 (citations: (1) personal experience and (2) apple/coremltools#1723 (comment))

Some operators have additional rank constraints, of course. But if we were to add a rankRange to all operators, then many (such as all the element-wise ops) would just mirror this global limit.

That's probably fine? Is it safe for a framework like ORT Web to assume that the rank constraints of e.g. add are representative of the global rank constraints?

@Honry
Contributor

Honry commented Jul 3, 2024

That's probably fine? Is it safe for a framework like ORT Web to assume that the rank constraints of e.g. add are representative of the global rank constraints?

That may be acceptable from the ORT Web perspective. But then we would have to maintain a list of op categories in the WebNN EP, e.g. an element-wise ops list.

@a-sully
Contributor

a-sully commented Jul 3, 2024

Hmm why would you need to maintain a list of op categories if every WebNN op has a rankRange?

If ORT Web wants to ask "what is the maximum rank this backend supports" then you could either:

  • query an op you think is representative of the global maximum, such as add, or
  • iterate over all the ops to find the max. There are fewer than 100 ops, so this should be trivial.

Is there something I'm missing?
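For what it's worth, the second option is only a few lines (a sketch assuming the proposed rankRange from #463 appears on each op's input entry):

```js
// Derive an effective global max rank from the per-op limits.
const limits = context.opSupportLimits();
const globalMaxRank = Math.max(
  ...Object.values(limits)
    .filter((entry) => entry?.input?.rankRange) // skip non-op entries
    .map((entry) => entry.input.rankRange[1]));
```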

@Honry
Contributor

Honry commented Jul 4, 2024

Hmm why would you need to maintain a list of op categories if every WebNN op has a rankRange?

Aha, sorry, I misunderstood your last comment; I thought you meant Chromium would only provide global rank constraints.

  • query an op you think is representative of the global maximum, such as add, or
  • iterate over all the ops to find the max. There are fewer than 100 ops, so this should be trivial.

Is there something I'm missing?

It's all clear now, thanks!

@a-sully
Contributor

a-sully commented Jul 7, 2024

Thanks for the clarification! In that case, can we close this issue by folding it into #463?

@huningxin
Contributor Author

can we close this issue by folding it into #463?

done!

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Aug 21, 2024
We likely eventually want this to be part of MLOpSupportsLimits, but
for now this allows us to replace some checked-casts in favor of
static-casts, and not need to worry about tensors with absurd ranks

See webmachinelearning/webnn#456

Bug: 329482489
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
Change-Id: I021d3b30ea1b8f5f3bef1725130fd2e4c569f494
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5639505
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Reviewed-by: Phillis Tang <phillis@chromium.org>
Reviewed-by: Koji Ishii <kojii@chromium.org>
Commit-Queue: Austin Sullivan <asully@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1344800}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 23, 2024
…stonly

Automatic update from web-platform-tests
webnn: Set a max operand rank of 8

We likely eventually want this to be part of MLOpSupportsLimits, but
for now this allows us to replace some checked-casts in favor of
static-casts, and not need to worry about tensors with absurd ranks

See webmachinelearning/webnn#456

Bug: 329482489
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
Change-Id: I021d3b30ea1b8f5f3bef1725130fd2e4c569f494
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5639505
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Reviewed-by: Phillis Tang <phillis@chromium.org>
Reviewed-by: Koji Ishii <kojii@chromium.org>
Commit-Queue: Austin Sullivan <asully@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1344800}

--

wpt-commits: 3897cad3ad9f133db0c92269004109b95bec59ee
wpt-pr: 47711
@huningxin
Contributor Author

Reactivating this issue, focusing on rank range support.

huningxin reopened this Oct 17, 2024