Remove quantization-specific params from OperandDescriptor #44
Comments
The alternative you appear to be proposing, where operators take extra quantization arguments, is worse. You end up with MXNet's 6-9 tensors just to call a matrix multiplication (https://github.com/apache/incubator-mxnet/blob/master/src/operator/quantization/quantized_fully_connected.cc), which is very hard to keep track of, and users end up doing their own scaling calculations, which are easy to get wrong. Carrying the quantization information with the tensor is much better. A base class without quantization information plus an inherited class that adds it would also be acceptable.
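To make the contrast concrete, here is a minimal TypeScript sketch of the two API shapes under discussion. The names (quantizedMatmul, QuantizedOperand, and the parameter lists) are hypothetical illustrations, not part of any existing spec.

```typescript
// Shape 1: quantization parameters passed as extra operator arguments.
// Every quantized call carries its own scales and zero points, so a single
// matmul balloons into many arguments (the MXNet-style problem noted above).
interface Operand { /* opaque handle to tensor data */ }

declare function quantizedMatmul(
  a: Operand, b: Operand,
  aScale: number, aZeroPoint: number,
  bScale: number, bZeroPoint: number,
  outScale: number, outZeroPoint: number
): Operand;

// Shape 2: quantization parameters carried on the operand itself.
// The operator signature stays small because each operand already knows
// how its integer data maps back to real values.
interface QuantizedOperand extends Operand {
  scale: number;
  zeroPoint: number;
}

declare function matmul(
  a: Operand | QuantizedOperand,
  b: Operand | QuantizedOperand
): Operand;
```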
The key issues in my mind with coupling quantization data with the tensor data are, first, that the quantization process itself may produce new tensors, and second, that there are other kinds of quantization methods besides the linear function supported by most frameworks today. The artifact from the first issue is precisely why, as you pointed out, more tensors are needed to compute a quantized matmul operation. The scale factors and zero-point adjustments are just the artifacts of the linear quantization function; other functions would produce a different set of artifacts.

By folding these specialized artifacts from a given quantization process into the notion of a tensor, which is intentionally specific and rather rudimentary, we assert that any tensor can be thought of as a bag of properties that may also carry a set of compute-specific transformation data. Every operator, unless somehow exempted, must then reconcile its own computation requirements with the transforms presented in one of its tensor operands. That is an unsustainable situation for API longevity.

It would be much more manageable to instead dedicate additional operators that deal with a certain type of quantization function in their calculations, while keeping the notion of a tensor as simple as it was originally intended to be from the data structure standpoint. ONNX took this approach rather successfully in its support for linear quantization. Certainly, more tensor arguments go into the linear-quantized version of matmul, but that is just a natural consequence of a process that genuinely needs more tensors to compute its result.

I would also point out that in the majority of post-training quantized models, not every operator needs to be aware of quantized data; only a few do. It is common, and much more efficient, for a quantized model to route the scale and zero-point tensors separately from the main quantized data, so that the final dequantization (where float values are reconstituted) can take place at the very end of the graph with all the quantized tensors combined, rather than passing them throughout the graph and risking unintended copies.
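For reference, the linear quantization function discussed above maps a real value x to an integer q via q = round(x / scale) + zeroPoint, and back via x ≈ (q - zeroPoint) * scale. Below is a small TypeScript sketch of those two steps; the function names and the uint8 range are illustrative assumptions, not part of any API.

```typescript
// Linear (affine) quantization: the scale and zero point are the only
// "artifacts" this particular scheme produces; other schemes produce others.

// Quantize a real value to an 8-bit integer (assumed uint8 range [0, 255]).
function quantizeLinear(x: number, scale: number, zeroPoint: number): number {
  const q = Math.round(x / scale) + zeroPoint;
  return Math.min(255, Math.max(0, q)); // clamp to the integer range
}

// Dequantize back to a real value. Because this mapping is affine, it can be
// deferred to the very end of the graph, as long as the scale and zero point
// are routed alongside (not inside) the quantized data.
function dequantizeLinear(q: number, scale: number, zeroPoint: number): number {
  return (q - zeroPoint) * scale;
}
```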
Per PR #94.
Keep OperandDescriptor scoped and versioning-friendly by removing quantization-specific params, e.g. scale and zeroPoint, while keeping the OperandType enum values as a straight data-type enum. Quantization-specific params could instead be made arguments of a new Operand-making overload.
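A minimal TypeScript sketch of the proposed split, assuming a hypothetical builder method (quantizedConstant) purely for illustration; the actual overload name and shape would be decided in the spec:

```typescript
// OperandDescriptor stays a plain data-type + dimensions record,
// with no quantization-specific fields.
interface OperandDescriptor {
  type: "float32" | "float16" | "int32" | "uint8"; // straight data-type enum
  dimensions?: number[];
}

interface Operand { /* opaque handle returned by the graph builder */ }

// Hypothetical builder surface: the quantization params move into a
// dedicated operand-making overload instead of living on the descriptor.
interface GraphBuilder {
  constant(desc: OperandDescriptor, data: ArrayBufferView): Operand;
  quantizedConstant(
    desc: OperandDescriptor,
    data: ArrayBufferView,
    scale: number,
    zeroPoint: number
  ): Operand;
}
```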