
OperandType of gemm / matmul return #84

Closed
anssiko opened this issue Aug 27, 2020 · 2 comments

anssiko (Member) commented Aug 27, 2020

[This issue was originally posted at https://github.com/w3c/machine-learning-workshop/issues/86 ]

@kpu wrote:

The spec says gemm returns "an Operand" (and the same thing for matmul).

If both arguments are tensor-quant8-asymm, what is the OperandType of the return? I can see use cases for tensor-int32, which is how it will actually be generated by existing hardware; tensor-quant8-asymm, for a fully quantized model; or even tensor-float32, for people who have only partly quantized their model.

This matters because the spec doesn't appear to have, for example, a requantization operator to convert int32 to int8, and in any case one would need the ability to set the scaling factor, which is typically determined by running the model in advance and measuring an appropriate value.
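
For illustration only, a minimal numpy sketch (not WebNN API code) of the three possible return types described above for a quant8-asymm matmul; all scales, zero points, and shapes here are made-up values:

```python
import numpy as np

a = np.random.randint(0, 255, (2, 3), dtype=np.uint8)   # tensor-quant8-asymm
b = np.random.randint(0, 255, (3, 4), dtype=np.uint8)
a_scale, a_zp = 0.02, 128   # hypothetical quantization parameters
b_scale, b_zp = 0.05, 110

# 1) tensor-int32: the raw accumulator, as existing hardware produces it.
acc = (a.astype(np.int32) - a_zp) @ (b.astype(np.int32) - b_zp)

# 2) tensor-float32: dequantized result, for a partly quantized model.
y_f32 = (acc * (a_scale * b_scale)).astype(np.float32)

# 3) tensor-quant8-asymm: requantized result, which needs an output scale and
#    zero point chosen in advance (e.g. measured on calibration data).
y_scale, y_zp = 0.1, 120
y_u8 = np.clip(np.rint(acc * (a_scale * b_scale / y_scale)) + y_zp,
               0, 255).astype(np.uint8)
```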

kpu commented Sep 25, 2020

Proposal: follow ONNX and define separate operators:

  • MatMul that doesn't allow int8
  • MatMulInteger that does int8 * int8 -> int32
  • QLinearMatMul that does int8 * int8 -> int8 using a rescaling factor

This is consistent with #17.
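
As a rough sketch of the proposed split, the ONNX operator semantics can be emulated with numpy roughly as below (this is not the ONNX reference implementation; zero points and scales are per-tensor for simplicity):

```python
import numpy as np

def matmul_integer(a, a_zero_point, b, b_zero_point):
    # MatMulInteger: int8/uint8 inputs, int32 output, no rescaling.
    return ((a.astype(np.int32) - a_zero_point)
            @ (b.astype(np.int32) - b_zero_point))

def qlinear_matmul(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp):
    # QLinearMatMul: int8/uint8 inputs and output; the caller supplies the
    # output scale and zero point used for requantization.
    acc = matmul_integer(a, a_zp, b, b_zp)
    q = np.rint(acc * (a_scale * b_scale / y_scale)) + y_zp
    return np.clip(q, 0, 255).astype(np.uint8)
```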

inexorabletash (Member) commented
This seems obsolete or fixed. Can we close, @anssiko?

anssiko closed this as completed Feb 26, 2024