Add QuantizeLinear and DequantizeLinear for mixed precision #93

kpu · 2020-09-25T11:41:28Z

The current proposal supports quantized types such as tensor-quant8-asymm, and some operators accept them. Many networks run in mixed precision, e.g. a quantized output matrix multiply followed by logsoftmax in float32.

I propose adding DequantizeLinear (https://github.com/onnx/onnx/blob/master/docs/Operators.md#DequantizeLinear) and QuantizeLinear (https://github.com/onnx/onnx/blob/master/docs/Operators.md#QuantizeLinear) so that the quantized operators are actually usable for many models.
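For readers unfamiliar with the two ONNX operators, here is a minimal sketch of their per-tensor arithmetic in plain Python, followed by the mixed-precision pattern described above (dequantize, then logsoftmax in float). The function names, the uint8 range, and the sample scale and zero point are illustrative assumptions, not part of any WebNN proposal.

```python
import math

def quantize_linear(x, scale, zero_point=0, qmin=0, qmax=255):
    """ONNX-style: y = saturate(round(x / scale) + zero_point).
    The uint8 range [0, 255] is an assumed default for this sketch."""
    out = []
    for v in x:
        q = round(v / scale) + zero_point  # Python's round() rounds half to even
        out.append(max(qmin, min(qmax, q)))  # saturate to the integer range
    return out

def dequantize_linear(q, scale, zero_point=0):
    """ONNX-style: y = (x - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]

def log_softmax(x):
    """Numerically stable log-softmax computed in float."""
    m = max(x)
    lse = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - lse for v in x]

# Mixed precision: quantized matmul output -> float -> logsoftmax.
logits_q = quantize_linear([1.0, 2.0, -0.5], scale=0.5, zero_point=128)
logits_f = dequantize_linear(logits_q, scale=0.5, zero_point=128)
log_probs = log_softmax(logits_f)
```

Without a dequantize step there is no way to hand the integer output of the matrix multiply to a float-only operator such as logsoftmax, which is the gap this issue describes.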


fdwr · 2024-08-16T06:43:48Z

(4 years later 😲) quantizeLinear and dequantizeLinear are proposed here: #375 (comment). The block size still needs some thought, given DequantizeLinear-21's new attribute (whether to do something similar, different, more generic, or more limited...), but the proposal has momentum.
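To make the block size question concrete, here is a rough 1-D sketch of blocked dequantization in the spirit of the `block_size` attribute in ONNX DequantizeLinear-21, where each run of `block_size` consecutive elements shares one scale and one zero point. The function name and the flat-list layout are assumptions for illustration; the real operator applies blocking along a chosen axis of an N-D tensor.

```python
def dequantize_linear_blocked(q, scales, zero_points, block_size):
    """Dequantize a 1-D list where element i uses the scale and zero point
    of block i // block_size (hypothetical helper, 1-D only)."""
    out = []
    for i, v in enumerate(q):
        b = i // block_size  # index of the block this element belongs to
        out.append((v - zero_points[b]) * scales[b])
    return out

# Two blocks of two elements, each with its own scale and zero point.
values = dequantize_linear_blocked([10, 12, 3, 5],
                                   scales=[0.5, 2.0],
                                   zero_points=[10, 3],
                                   block_size=2)
```

Per-tensor and per-axis quantization fall out as special cases (one block covering everything, or a block size of 1 along the axis), which is part of why a more generic design is tempting.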

Also related:

wchao1115 mentioned this issue Dec 14, 2020

WebNN should support int8 quantized models #128

Closed

dontcallmedom added the opset label Mar 3, 2023

inexorabletash added the feature request label Feb 1, 2024

inexorabletash mentioned this issue Jul 12, 2024

WebML WG - TPAC 2024 agenda webmachinelearning/meetings#25

Open
