OperandType of gemm / matmul return #86
Comments
Thanks for your comment. To ensure this detailed spec feedback is addressed appropriately, I've transferred the issue to the WebNN API specification repo where the API design work happens.
@kpu this issue has previously been discussed in webmachinelearning/webnn#44. I will be refactoring quantization-related procedural data from the […]
@wchao1115 The issue you referenced, webmachinelearning/webnn#44, is about how the quantization scaling factor and zero point should be included in […]. As the title of this issue says, this one is about the `OperandType` of the value returned by `gemm` and `matmul`. This has nothing to do with how the scaling factor is encoded in […].
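To make the distinction concrete, here is a rough sketch (TypeScript; the interface names and fields are illustrative assumptions, not the spec's actual IDL) of the two separate questions:

```typescript
// webmachinelearning/webnn#44 is about these per-operand quantization
// parameters and how they attach to an operand's descriptor.
interface QuantParams {
  scale: number;     // S, so that real ≈ S * (q - zeroPoint)
  zeroPoint: number; // Z
}

// This issue (#86) is about a different field entirely: given two
// 'tensor-quant8-asymm' inputs, what `type` does the Operand returned
// by gemm/matmul carry?
interface Operand {
  type: 'tensor-float32' | 'tensor-int32' | 'tensor-quant8-asymm';
  quant?: QuantParams; // orthogonal to the return-type question
}
```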
The spec says `gemm` returns "an Operand" (and the same thing for `matmul`). If both arguments are `tensor-quant8-asymm`, what is the `OperandType` of the return? I can see use cases for `tensor-int32`, which is how it will actually be generated by existing hardware; `tensor-quant8-asymm`, for a fully quantized model; or even `tensor-float32`, for people who have only partly quantized their model.

This matters because the spec doesn't appear to have e.g. a requantization operator to convert int32 to int8, and in any case one would need the ability to set the scaling factor it uses, determined by running the model in advance to measure an appropriate value.