[Quantization] Any way to simulate asymmetric quantization? #665
Hi @masahi,
No, not if you want to keep using the library's int8 primitives directly. If you are fine with emulating the int8 computations using floating-point operations, the approach you suggested should work. The only pitfall is that the result might be slightly different if rounding happens during the computations. The alternative approach would be to do the computation in two steps (assuming that the nontrivial zero point is applied to the source data tensor only):
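In plain C++ terms, a minimal sketch of the two-step idea (not the actual DNNL primitives; a 1-D, single-channel convolution with illustrative values): run the regular convolution on the raw int8 data first, then subtract the zero-point contribution, which is the zero point times the sum of the weights over the receptive field.

```cpp
// Minimal sketch (not the oneDNN API): emulate conv(src - src_zp, wei)
// in two steps for a 1-D, single-channel convolution.
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<int8_t> src = {12, -3, 7, 40, -25, 9};
    std::vector<int8_t> wei = {2, -1, 3};
    int32_t src_zp = 10; // non-trivial source zero point (illustrative)

    const size_t out_len = src.size() - wei.size() + 1;
    std::vector<int32_t> out(out_len, 0);

    // Step 1: regular convolution on the raw (unshifted) int8 data.
    for (size_t o = 0; o < out_len; ++o)
        for (size_t k = 0; k < wei.size(); ++k)
            out[o] += int32_t(src[o + k]) * int32_t(wei[k]);

    // Step 2: subtract the zero-point contribution,
    // src_zp * (sum of weights in the receptive field), from every output.
    int32_t wei_sum = 0;
    for (int8_t w : wei) wei_sum += w;
    for (size_t o = 0; o < out_len; ++o)
        out[o] -= src_zp * wei_sum;

    // The result equals conv(src - src_zp, wei) computed directly.
    for (int32_t v : out) std::cout << v << " ";
    std::cout << "\n";
    return 0;
}
```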
This is conceptually how the library would implement the convolution with non-trivial zero points. However, this way is much more intrusive and actually quite inefficient (the slowdown is more than 2x compared to the convolution without a zero point). Given also the complex API the library has, I would suggest avoiding this route :)
P.S. A nice explanation of how an implementation could handle the zero points efficiently can be found in the gemmlowp docs.
@emfomenk Thanks very much for the detailed answer. My use case is converting quantized PyTorch models to TVM and running them on more backends, so using the PyTorch implementation is not an option.

The TVM community is developing a mechanism to easily plug external libraries like TensorRT and DNNL into its compilation pipeline. See my PR apache/tvm#4741 for example, where I demonstrate using DNNL's fused conv op from TVM. My next step is to do the same exercise for quantized ops, and for that I need to handle asymmetry. Since this is mostly for demo purposes and having a reasonably reliable "ground truth" is more important, I don't care about performance for now.

The gemmlowp approach of decomposing qconv into 4 terms is also how TVM handles asymmetry. See https://github.com/apache/incubator-tvm/blob/0755e4a58897c64d6a7ffc86bab3df45554bac7e/src/relay/qnn/op/convolution.cc#L512-L580. Decomposing and executing the decomposed ops with DNNL seems like a good plan, but it would be nicer if the library could handle it automatically. Since both PyTorch and TensorFlow generate non-zero zero points, I think there are good use cases. (Of course, users should be aware that it would be slower than symmetric quantization.)
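For reference, a minimal plain-C++ sketch of that four-term decomposition (illustrative values, not TVM or DNNL code), checked against the direct computation with the zero points subtracted up front:

```cpp
// Sketch of the gemmlowp-style four-term decomposition of an asymmetric
// quantized 1-D convolution; plain C++, illustrative values only.
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<int8_t> data = {12, -3, 7, 40, -25, 9};
    std::vector<int8_t> wei  = {2, -1, 3};
    int32_t data_zp = 10, wei_zp = -2; // illustrative non-zero zero points

    const size_t K = wei.size();
    const size_t out_len = data.size() - K + 1;

    int32_t wei_sum = 0;
    for (int8_t w : wei) wei_sum += w;

    for (size_t o = 0; o < out_len; ++o) {
        int32_t term1 = 0;    // conv(data, wei) on raw values
        int32_t data_sum = 0; // sum of data over this receptive field
        for (size_t k = 0; k < K; ++k) {
            term1 += int32_t(data[o + k]) * int32_t(wei[k]);
            data_sum += data[o + k];
        }
        int32_t term2 = wei_zp * data_sum;             // weight zero point x data
        int32_t term3 = data_zp * wei_sum;             // data zero point x weights
        int32_t term4 = data_zp * wei_zp * int32_t(K); // both zero points
        int32_t out = term1 - term2 - term3 + term4;

        // Reference: subtract the zero points first, then convolve.
        int32_t ref = 0;
        for (size_t k = 0; k < K; ++k)
            ref += (int32_t(data[o + k]) - data_zp) * (int32_t(wei[k]) - wei_zp);

        std::cout << out << (out == ref ? " (matches)" : " (MISMATCH)") << "\n";
    }
    return 0;
}
```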
Hi, from the doc https://intel.github.io/mkl-dnn/dev_guide_attributes_quantization.html, it seems DNNL quantized convolution supports only symmetric quantization. But I have a use case where I want to execute quantized conv op that comes from PyTorch, and there are some non-zero zero points.
Is there a way to simulate quantized convolution with non-zero zero points using DNNL? Performance is not too important for me right now.
Is emulating the int8 computation with floating-point operations a good approach?
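A minimal sketch of the kind of floating-point emulation I have in mind (plain C++; the scales and zero points are made up for illustration, and the final rounding may differ slightly from a true int8 implementation):

```cpp
// Sketch: emulate an asymmetric quantized 1-D convolution with float math.
// Requires C++17 for std::clamp. All quantization parameters are illustrative.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<int8_t> data = {12, -3, 7, 40, -25, 9};
    std::vector<int8_t> wei  = {2, -1, 3};
    // Hypothetical per-tensor quantization parameters (scale, zero point).
    float data_scale = 0.1f, wei_scale = 0.05f, out_scale = 0.2f;
    int32_t data_zp = 10, wei_zp = 0, out_zp = -5;

    const size_t K = wei.size();
    const size_t out_len = data.size() - K + 1;
    std::vector<int8_t> out(out_len);

    for (size_t o = 0; o < out_len; ++o) {
        // Dequantize on the fly and accumulate in float.
        float acc = 0.f;
        for (size_t k = 0; k < K; ++k)
            acc += (float(data[o + k]) - data_zp) * data_scale
                 * (float(wei[k]) - wei_zp) * wei_scale;
        // Requantize: rescale, round, shift by the output zero point, saturate.
        int32_t q = int32_t(std::lround(acc / out_scale)) + out_zp;
        out[o] = int8_t(std::clamp(q, -128, 127));
    }

    for (int8_t v : out) std::cout << int(v) << " ";
    std::cout << "\n";
    return 0;
}
```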