Do we need an `MLConstantOperand`? #668
Comments
FYI, these operands (constants only) are used by the NSNet2 (noise suppression) example: https://github.com/webmachinelearning/webnn-samples/blob/master/nsnet2/nsnet2.js#L51
My apologies for the error. I've updated the respective rows (they're also all "Constant only").
Turns out XNNPACK also requires some tensors to be constants - for example |
It would be nice to make progress here; as @a-sully notes, "it's much easier to relax these restrictions than to try to impose them later". I'm a fan of the `MLConstantOperand` approach. I see a 👍 from @huningxin above; @fdwr, have you had a chance to think about this?
Sketched out a PR for this.
Indeed, and I like the concept of an operand having a "constness" property, whether it be via a separate class (`MLConstantOperand`) or a property on `MLOperand`.
Or is the reverse true, that mandatory constness is a limitation in a framework, and there are useful reasons to pass dynamic tensors, such as for custom weights or as outputs from other operators? DirectML has a means (via DML_GRAPH_NODE_TYPE_CONSTANT) to specify that a tensor is a constant, which can indeed afford optimization opportunities, but it's not required. Accepting dynamic inputs can be useful for cases like convTranspose weights being inflated from a dequantized tensor at runtime (saving significant model space).

It sounds like I should scan through all the models on my hard drive to find any cases where dynamic tensors are fed into an operator which CoreML requires to be constant. ⏳ Skimming through the MIL ops, some decisions are unclear to me.

Note we're not necessarily blocked for all the ops above when inputs have dynamic values, as multiple operators like normalization.batch_norm can be decomposed into operators that do not require constness, like {add, mul, div, sqrt}. When those input operands are constant, then the built-in CoreML operator can be used directly.
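A minimal sketch of that decomposition, assuming an NCHW input, 1-D per-channel parameters, and current `MLGraphBuilder` method names; the helper name and descriptor fields are illustrative, not spec text:

```js
// Sketch: batchNormalization decomposed into ops that accept dynamic inputs.
// y = (x - mean) / sqrt(variance + epsilon) * scale + bias
function decomposedBatchNorm(builder, input, mean, variance, scale, bias,
                             channels, epsilon = 1e-5) {
  const perChannel = [1, channels, 1, 1];  // broadcastable against [N, C, H, W]
  const eps = builder.constant(
      {dataType: 'float32', shape: []}, new Float32Array([epsilon]));
  const centered = builder.sub(input, builder.reshape(mean, perChannel));
  const stdDev = builder.sqrt(
      builder.add(builder.reshape(variance, perChannel), eps));
  let result = builder.div(centered, stdDev);
  if (scale) result = builder.mul(result, builder.reshape(scale, perChannel));
  if (bias) result = builder.add(result, builder.reshape(bias, perChannel));
  return result;
}
```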
I was initially skeptical of this approach, as it seems likely that developers would develop and test on one platform (e.g. Chromium/Windows/DML) and only discover incompatibilities when users with different systems ran into failures. But if the vast majority of models that we've seen use constant weights anyway, then this will be rare in practice. How smoothly would probing `opSupportLimits()` here and/or responding to a sync build failure integrate into ORT? Is this an expected limitation of EPs already, or would we just rely on errors to propagate up from `build()`?
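For illustration, a probe along these lines is roughly what a framework could run up front. `opSupportLimits()` does exist on `MLContext`, but the `mustBeConstant` field shown here is purely hypothetical and stands in for whatever shape a constness hint might take:

```js
// Sketch: feature-detect a (hypothetical) constness requirement before building.
async function convBiasNeedsConstant() {
  const context = await navigator.ml.createContext();
  const limits = context.opSupportLimits();
  // 'mustBeConstant' is NOT in the spec; limits.conv2d.bias is, but it only
  // reports supported data types today.
  return limits?.conv2d?.bias?.mustBeConstant === true;
}
```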
This seems useful. Are all of the use cases that need non-constant weights deriving these weights from actual constants? If all dynamic weights are derived from constants, we can solve this by running a pre-processing subgraph to generate these constants. The pre-processing step can be within WebNN logic; in that case we would still need to expose the requirement that these weights need to be constant or derived from constants, and WebNN would detect when a weight is not constant and run a pre-processing subgraph to generate the constants. The pre-processing step can also be done outside of WebNN; in that case we can expose the constness requirement through `opSupportLimits()`.
Cons:
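A rough sketch of the variant where the caller does the pre-processing: a small helper graph computes the derived weights once, and the materialized values are then re-uploaded to the main graph via `constant()`. Names like `packedWeightData` and `mainBuilder` are placeholders, and the `compute()` call follows the `MLContext.compute()` shape from earlier spec drafts (newer drafts use `MLTensor`/`dispatch()`):

```js
// Sketch: compute derived weights once with a helper graph, then feed the
// materialized values to the main graph as a true constant().
const pre = new MLGraphBuilder(context);
const packed = pre.input('packed', {dataType: 'float32', shape: [64, 64]});
const derived = pre.transpose(pre.reshape(packed, [64, 8, 8]));
const preGraph = await pre.build({derived});
const result = await context.compute(
    preGraph, {packed: packedWeightData}, {derived: new Float32Array(64 * 8 * 8)});
// The main graph now sees an ordinary constant, satisfying a constness requirement.
const weights = mainBuilder.constant(
    {dataType: 'float32', shape: [8, 8, 64]}, result.outputs.derived);
```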
Breaking this into a couple of sub-problems.

Specify a constant operand by MLConstantOperand or an isConstant property
We seem to have consensus that it's useful to express such a concept.

What is a constant operand
I'd like to propose that a constant operand could be either the output of `constant()` or the output of operators whose inputs are all constant operands. This allows us to apply constant folding for use cases like inflating from dequantized weights, or doing any kind of reshape/split/transpose on the weights when there is a format incompatibility.

Define MLConstantOperand or MLOperand::isConstant()
I think `MLConstantOperand` is more explicit for the spec. If we want to declare that some weights need to be constant, using `MLConstantOperand` in the operator signature makes that explicit.

Should we enforce weights to be MLConstantOperand?
If we define `MLConstantOperand` to be either straight from `constant()` or derived from constants, then the dynamic-weight use cases above are still covered.
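A sketch of that proposed definition, using a hypothetical internal node shape (`kind`, `inputs`), since `MLOperand` itself doesn't expose its producers to script:

```js
// Sketch: an operand is "constant" if it comes from constant() or from
// operators whose inputs are all constant (so the backend can fold it).
function isConstantOperand(node) {
  if (node.kind === 'constant') return true;   // direct constant() output
  if (node.kind === 'input') return false;     // runtime-bound input()
  // e.g. dequantize/reshape/split/transpose applied only to constants
  return node.inputs.every(isConstantOperand);
}
```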
I noticed that iOS 17 conv_transpose allows a dynamic filter.
gru and lstm can also be emulated if non-constant weights are provided. If we assume conv/convTranspose weights can be dynamic, then a non-constant bias can be emulated by an element-wise add operation too.
Could we leave constant-tensor optimization to the implementation rather than making it an API requirement?
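As a concrete illustration of the bias emulation mentioned above (names like `input`, `filter`, `bias`, `options`, and `channels` are assumed to be in scope; NCHW layout assumed):

```js
// Sketch: when `bias` is not a constant, skip the fused bias and add it explicitly.
const convOutput = builder.conv2d(input, filter, options);  // options without `bias`
const biased = builder.add(
    convOutput, builder.reshape(bias, [1, channels, 1, 1]));
```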
Adding emulation when inputs are not constants for these kinds of operations:
So if emulation is necessary, it would be better to be done on the client side, IMO?
@huningxin: Indeed, good to see that iOS relaxed that constness requirement later. Mingxian Yang mentions here that some StyleGAN models needed dynamic weights (for conv at least, not convTranspose). I still owe you all an analysis scanning through my local model collection to find any dynamic cases (hopefully before tomorrow's meeting).
Model Scan Results
So I told Phillis and Reilly in the last meeting that I would scan my model cache for any of the above operators with dynamic inputs (I actually did right afterward, but it took a while longer to collate the results... 😓), and I found enough cases that I request the Chromium CoreML backend support these cases via decomposition:
They can be organized into these categories:
(1) Interestingly, some weights were upcast to float32 for compute but stored as float16 in the weights (perhaps to save model space and weight upload time; see the sketch after this list). Similarly, weights were quantized to save space but inflated at runtime.

(2) This conv bias is technically overridable/bindable at runtime. I could see some other models having similar dynamic overrides at runtime for fancier LoRA-like behavior.

(3) There were several silly cases with pointless Identity's (into the Conv bias in this case), but I'm not too worried about these because a WebNN backend could forward the tensor such that if an identity's input is constant, its output is treated as constant too.

(4) Then there were more complex cases which weren't trivial identity-elimination opportunities. Maybe such cases could be resolved into something simpler and even a direct final tensor, but merely seeing that these cases exist, I have no confidence that there aren't other patterns I haven't seen. For CoreML, Prelu's slope was another such case.
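A minimal sketch of category (1), assuming a packed float16 weights buffer, an in-scope `input`/`bias`/`filterShape`, and current builder method names; the point is that the filter is the output of `cast()`, not of `constant()` directly:

```js
// Sketch: weights stored as float16, upcast at runtime before conv2d, so the
// filter operand is not a direct constant() output.
const filterFp16 = builder.constant(
    {dataType: 'float16', shape: filterShape}, packedFp16FilterData);
const filterFp32 = builder.cast(filterFp16, 'float32');
const output = builder.conv2d(input, filterFp32, {bias});
```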
Some questions

Replies
Except for GRU and LSTM (which do add complexity), it should add only a little more complexity? e.g. for layernorm and conv, it's an extra element-wise mul/add.
Isn't it only less efficient for the uncommon case, while the common case executes at full efficiency? Higher-layer developers won't be able to implement the dynamic case as efficiently as the backend because (a) the backend operates closer to the metal with fewer layers, (b) if a future CoreML relaxes the constant constraints, the backend can adopt that directly, whereas callers would not know and would continue to artificially emulate, and (c) callers would unnecessarily decompose on backends that already support dynamic inputs directly if the input was defined as `MLConstantOperand`.
Supporting this in WebNN (rather than by clients) benefits all callers, and callers do not need to repeat code that they could potentially get wrong. Only the backend knows whether the underlying API truly requires decomposition, and it is best poised to make that decision.
Well, it certainly is more in-your-face by being visible in the code definition, but then other tensor requirements like data types and rank are not baked into the type name. Joshua's pending operator table for the spec could include a third row for constness (beyond data type and rank). Though, if we can handle dynamic cases in the backend fully (defaulting to the direct operator when constant and decomposing when dynamic), then maybe we don't even need this type/property externally (only internally for fast-path testing).

When deciding between properties vs. polymorphism, notice properties honor orthogonality better than polymorphism. Right now we're just talking about MLConstantOperand, but imagine we have a future MLAddressSwizzledOperand (a theoretical optimized operand in a tiled memory layout). Will we then need every permutation of them (MLOperand, MLConstantOperand, MLAddressSwizzledOperand, MLConstantSwizzledTiledOperand, ...)? Similarly, if we had (we don't, but if we did) baked data types into operands (MLFloat16Operand, MLFloat32Operand, MLInt8Operand, ...), then it would become unwieldy quite quickly (MLFloat16Operand, MLConstantFloat16Operand, MLFloat32Operand, MLConstantFloat32Operand, ...). Then if you wanted to extract orthogonal properties (given those 4 distinct classes above: MLOperand, MLConstantOperand, MLAddressSwizzledOperand, MLConstantSwizzledTiledOperand), would you need to test all 4 cases to extract the 2 properties?

Properties enable concrete types in function signatures. It's common to have debugging utility functions (like one I wrote to print all the MLOperand attributes to debug Stable Diffusion and other early operators), and it feels cleaner to pass a concrete MLOperand to a TypeScript function rather than using "is a" dynamic type checks; and since WebIDL can technically be called from C++ too (a strongly typed language), without even JS in the picture, it's sensible to pass a concrete type and avoid RTTI.
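To make the contrast concrete, these are the two checks a utility or backend shim would end up writing; both API shapes (`MLConstantOperand` and a `constant` attribute) are hypothetical at this point in the discussion:

```js
// Sketch (a): polymorphism - one "is a" check per capability subclass.
function isConstantViaSubclass(operand) {
  return typeof MLConstantOperand !== 'undefined' &&
         operand instanceof MLConstantOperand;
}

// Sketch (b): property - orthogonal flags compose on the one MLOperand type.
function isConstantViaProperty(operand) {
  return operand.constant === true;
}
```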
Closing this issue according to the discussion at TPAC here: https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#08b6 Since all CoreML operators which require
The `constant()` operator is special. Constants are effectively inputs that are known at the time of graph compilation (i.e. `build()`). You may ask, why do we need a separate method when we could just pass this constant data as an `input()`?

Most discussions I've seen so far about `constant()` (#614 (comment), this thread about the since-abandoned "fill sequence" overload, etc.) have been concerned with optimizing performance. There are a number of compile-time optimizations a backend may perform if it knows that some data is constant. Our experience with CoreML has given us another reason:
The backend requires that a parameter to an operator must be constant
Take `conv2d()`, for example. It's defined for CoreML here. The `bias` parameter is defined as: `bias: const tensor<[C_out],T>`

Meanwhile, WebNN allows this to be any `MLOperand`, as long as it's of the appropriate shape and dtype. This appears to be a straightforward plumbing-through of DML's interface, which does not require the `BiasTensor` to be a constant. Neither does the corresponding operator for TFLite. From what I can tell, this seems to be because these frameworks don't have a way to express that some input tensor must be const. The options are either to pass the parameter as the framework's generic representation of a Tensor - which would in practice always(?) be created from a `constant()` - or to pass the parameter as a 1D array directly. If the parameters may be large (and perhaps unbounded), the former is the more appropriate choice.

To get a sense for whether this is a reasonable hypothesis, I've inspected all† uses of the affected operators in the WebNN Samples repo:
operator.param (usage in the WebNN Samples is "Constant only" for every row):
- `batchNormalization.mean`
- `batchNormalization.variance`
- `batchNormalization.scale`
- `batchNormalization.bias`
- `conv2d.bias`
- `convTranspose2d.bias`
- `gru.weight`
- `gru.recurrentWeight`
- `gru.bias`
- `gru.recurrentBias`
- `instanceNormalization.scale`
- `instanceNormalization.bias`
- `layerNormalization.scale`
- `layerNormalization.bias`
- `lstm.weight`
- `lstm.recurrentWeight`
- `lstm.bias`
- `lstm.recurrentBias`
- `lstm.peepholeWeight`
- `prelu.slope`
†This list only includes WebNN operators which trivially map to CoreML operators. WebNN operators which need to be implemented in terms of other CoreML operators will be subject to the restrictions of those respective CoreML operators. For example, CoreML doesn't have operators for `gruCell` or `lstmCell`, so these operators will need to be implemented in terms of `gru` and `lstm`, respectively. These operators will in turn need many of their parameters to be const, as well.

††One caller passes the result of a `reshape`... but that's only because the sample was written before `constant()` took an `MLOperandDescriptor`. The `reshape` is just assigning dimensions to a `constant()`. Nowadays we'd just pass the `constant()` directly.

Remarkably, in every single instance where one of these params is used in the WebNN Samples, it was created from a `constant()`. Cool!

Of course, this is not close to a comprehensive list of all the models we hope to run with WebNN. That being said, if there are no significant known use cases for passing any of these parameters as non-constant tensors - if their non-constness is simply a limitation in the framework and there are no useful reasons to pass non-const tensors - I think there's a reasonable argument that WebNN should require these parameters to be constants. @fdwr could you perhaps provide some more color here? :)
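For reference, the pattern seen throughout the samples looks roughly like this (shapes and variable names such as `filterWeights`, `biasWeights`, and `input` are illustrative; descriptor fields follow the current spec draft):

```js
// Sketch: filter and bias both come straight from constant(), matching
// CoreML's `bias: const tensor<[C_out],T>` expectation.
const filter = builder.constant(
    {dataType: 'float32', shape: [32, 3, 3, 3]}, filterWeights);
const bias = builder.constant(
    {dataType: 'float32', shape: [32]}, biasWeights);
const output = builder.conv2d(input, filter, {bias});
```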
It seems that we have the following options to support each of these operators on CoreML:

1. Specify that `operator.param` must be a constant `MLOperand` (my tentative preference)
   - Create an `MLConstantOperand` interface which extends `MLOperand`, and specify that `param` takes an `MLConstantOperand`
   - Specify that an `MLOperand` has a "kind", as the Chromium implementation already does, and throw a `TypeError` if the operand is not of a "constant" kind. This may be confusing to developers
2. Specify `operator.param` as a `sequence<MLNumber>`
3. Fail if `operator.param` is not a constant, only on CoreML
   - This failure would surface from `build()` on Chromium, though we could conceivably make this a synchronous check on the respective builder method, especially if we have defined procedures for querying for backend-specific support (see "Allow checking whether operators/types are supported for a backend before creating a graph" #463)

Thoughts?
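For comparison, option 2 would make call sites look something like the first line below; the sequence overload is hypothetical, while today's `MLConv2dOptions.bias` only accepts an `MLOperand`:

```js
// Hypothetical (option 2): bias passed as a plain number sequence.
const out = builder.conv2d(input, filter, {bias: [0.1, 0.0, -0.2, 0.05]});

// Spec as written today: bias must be an MLOperand of shape [C_out].
const biasOperand = builder.constant(
    {dataType: 'float32', shape: [4]}, new Float32Array([0.1, 0.0, -0.2, 0.05]));
const out2 = builder.conv2d(input, filter, {bias: biasOperand});
```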