OperandType of gemm / matmul return #86

Closed
kpu opened this issue Aug 27, 2020 · 4 comments

Comments

kpu commented Aug 27, 2020

The spec says gemm returns "an Operand" (and the same thing for matmul).

If both arguments are tensor-quant8-asymm, what is the OperandType of the return? I can see use cases for tensor-int32, which is how existing hardware will actually generate it; tensor-quant8-asymm, for a fully quantized model; or even tensor-float32, for people who have only partly quantized their model.

This matters because the spec doesn't appear to have, e.g., a requantization operator to convert int32 to int8, and in any case one would need the ability to set the output scaling factor, which is typically measured in advance by running the model.
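
To make this concrete, here is a minimal sketch in plain TypeScript of the arithmetic involved; this is not the WebNN API, and every name in it (QuantParams, matmulQuant8ToInt32, requantize) is hypothetical. An int8 × int8 matmul naturally accumulates into int32 with an effective scale of a.scale * b.scale, and producing a quant8-asymm result requires a requantization step whose output scale has to be chosen ahead of time:

```ts
// Hypothetical illustration of quantized matmul arithmetic; not the WebNN API.
// A quantized value q represents the real value scale * (q - zeroPoint).
// 8-bit storage is shown here as unsigned with a zero point; signed works the same way.
interface QuantParams {
  scale: number;
  zeroPoint: number;
}

// 8-bit x 8-bit matmul: the natural accumulator type is int32 (the "tensor-int32"
// return option). The effective scale of the accumulator is aq.scale * bq.scale.
function matmulQuant8ToInt32(
  a: Uint8Array, aq: QuantParams,
  b: Uint8Array, bq: QuantParams,
  m: number, k: number, n: number
): Int32Array {
  const out = new Int32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let acc = 0;
      for (let p = 0; p < k; p++) {
        acc += (a[i * k + p] - aq.zeroPoint) * (b[p * n + j] - bq.zeroPoint);
      }
      out[i * n + j] = acc;
    }
  }
  return out;
}

// Requantization: int32 -> quant8-asymm (the "tensor-quant8-asymm" return option).
// outQ must be chosen in advance, e.g. by running the model on calibration data
// to measure an appropriate output scale.
function requantize(acc: Int32Array, accScale: number, outQ: QuantParams): Uint8Array {
  const out = new Uint8Array(acc.length);
  for (let i = 0; i < acc.length; i++) {
    const q = Math.round(acc[i] * (accScale / outQ.scale)) + outQ.zeroPoint;
    out[i] = Math.min(255, Math.max(0, q));
  }
  return out;
}
```

The outQ parameter above is exactly the scaling factor one would need to be able to set in advance.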

anssiko commented Aug 27, 2020

Thanks for your comment. To ensure this detailed spec feedback is addressed appropriately, I've transferred the issue to the WebNN API specification repo where the API design work happens:
webmachinelearning/webnn#84

anssiko closed this as completed Aug 27, 2020

wchao1115 commented

@kpu this issue has previously been discussed in webmachinelearning/webnn#44. I will be refactoring quantization-related procedural data from the OperandDescriptor type as we incorporate aspects of quantization work into the operator API.

kpu commented Sep 8, 2020

@wchao1115 The issue you referenced, webmachinelearning/webnn#44, is about how the quantization scaling factor and zero point should be included in OperandDescriptor.

As the title of this issue says, this is about the OperandType of the return value from matmul. Should multiplying int8 by int8 return float32, return int32, or take a scaling factor and return int8?

This has nothing to do with how the scaling factor is encoded in OperandDescriptor (and your suggestion that it not be).
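
For concreteness, here is a small hypothetical sketch (plain TypeScript, not the WebNN API) of what each candidate return type means for a single int32 accumulator value, given assumed input scales sA and sB and a pre-chosen output scale/zero point sOut and zOut:

```ts
// Three ways a quant8 x quant8 matmul result could be surfaced, shown for a
// single int32 accumulator value. All names here are hypothetical.
function interpretAccumulator(
  acc: number,                 // int32 accumulator from the matmul inner product
  sA: number, sB: number,      // input quantization scales
  sOut: number, zOut: number   // output scale / zero point, chosen in advance
) {
  const asInt32 = acc;                         // tensor-int32: effective scale is sA * sB
  const asFloat32 = acc * sA * sB;             // tensor-float32: dequantized result
  const asQuant8 = Math.min(255, Math.max(0,   // tensor-quant8-asymm: requantized result
    Math.round(acc * (sA * sB) / sOut) + zOut));
  return { asInt32, asFloat32, asQuant8 };
}
```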

wchao1115 commented

@kpu you are right that they are not the same issue. I only meant to point out that the issue around how to properly support quantization is not fully resolved, and that #44 is related to that whole conversation. I didn't mean to suggest that they are the same issue.
