Asking for a guide on the quantization process using SNPE after applying AIMET QAT/PTQ #3438
Comments
@quic-akinlawo, can you help respond to this?
Interested in this also.
Hi @chewry, can you share more details about the quantization options you used in AIMET? Also, what was the performance of the simulated model in AIMET (before you ran the SNPE converter)? For your reference, this is the guide to snpe-dlc-quantize: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tools.html?product=1601111740010412#snpe-dlc-quantize
Hello, @quic-akinlawo. Thank you for your attention. I got advice that add …
Hi @chewry, have you evaluated the accuracy of the …
Hello authors,
Thank you for your excellent work.
I've tried using AIMET to resolve a severe accuracy degradation caused by quantization in the SNPE toolchain, but I've encountered the same problem with AIMET. I would like to seek advice or opinions on this matter.
Here is the previous SNPE-only workflow:
A trained Torch model (full precision) was converted to ONNX (using torch.onnx.export), converted to DLC (using snpe-onnx-to-dlc), and then quantized (using snpe-dlc-quantize). This works well for simple models such as MobileNet but fails for deeper models. We have confirmed experimentally that keeping the model's activations at 16-bit preserves its accuracy; however, to achieve the speed of a w8a8 model, I decided to apply AIMET. A sketch of the export step follows.
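For reference, the export step looks like this minimal sketch (MyModel, the checkpoint path, the input shape, and the opset version are placeholders for my actual setup):

```python
import torch

# Minimal sketch of the FP32 export step; `MyModel`, the checkpoint path,
# the input shape, and the opset version are placeholders.
model = MyModel()
model.load_state_dict(torch.load("model_fp32.pth"))
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```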
Here is the AIMET -> SNPE workflow. I tried to even out the network's activation ranges by applying CLE (PTQ), or to adapt the model and its parameters for quantization by applying QAT. Exporting the AIMET QuantizationSimModel produced the ONNX and encodings files, which were combined by snpe-onnx-to-dlc to create a DLC; a quantized DLC was then created with snpe-dlc-quantize (a sketch of the AIMET side follows below). However, even with AIMET applied in this way, the same problem remained: accuracy was preserved before quantization but dropped significantly after it.
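For concreteness, here is a minimal sketch of the AIMET side, assuming the aimet_torch 1.x API; `model` and `calibration_loader` are placeholders, and the bit-widths are the w8a8 setting described above:

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

# Placeholders: `model` is the trained FP32 torch model and
# `calibration_loader` yields representative inputs.
input_shape = (1, 3, 224, 224)
dummy_input = torch.randn(*input_shape)

# PTQ: apply cross-layer equalization to the FP32 model in place.
equalize_model(model, input_shape)

# Simulate w8a8 quantization (8-bit parameters and activations).
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,
    default_output_bw=8,
)

def forward_pass(model, _):
    model.eval()
    with torch.no_grad():
        for images, _ in calibration_loader:
            model(images)

# Calibrate activation ranges; evaluating sim.model afterwards gives the
# off-target accuracy that the quantized DLC should roughly match.
sim.compute_encodings(forward_pass, forward_pass_callback_args=None)

# (For QAT, fine-tune sim.model with the usual training loop here.)

# Produces model.onnx plus the .encodings files consumed downstream.
sim.export(path="./export", filename_prefix="model", dummy_input=dummy_input)
```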
While analyzing the cause of this degradation, I couldn't find any official guide for converting to DLC after QuantizationSimModel.export, so I'm unsure whether my SNPE conversion process is even correct. Any guidance or opinions on going from the ONNX model (or other AIMET byproducts) to a quantized DLC would be greatly appreciated; the conversion step as I currently understand it is sketched below.
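Here is that sketch. File names are placeholders, and as far as I can tell --quantization_overrides (on snpe-onnx-to-dlc) and --override_params (on snpe-dlc-quantize) are the options meant to consume AIMET-generated encodings, but please correct me if that is not the intended flow:

```python
import subprocess

# Placeholder file names; the flags reflect my understanding of how SNPE
# consumes AIMET encodings -- please verify against the documentation for
# your SNPE version.
subprocess.run(
    [
        "snpe-onnx-to-dlc",
        "--input_network", "export/model.onnx",
        # Pass the AIMET encodings as external quantization overrides.
        "--quantization_overrides", "export/model.encodings",
        "--output_path", "model.dlc",
    ],
    check=True,
)

subprocess.run(
    [
        "snpe-dlc-quantize",
        "--input_dlc", "model.dlc",
        "--input_list", "input_list.txt",
        # Use the override encodings carried in the DLC instead of only
        # re-computing ranges from the calibration inputs.
        "--override_params",
        "--output_dlc", "model_w8a8.dlc",
    ],
    check=True,
)
```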
Also, apart from the SNPE step, I followed the same process as in the AIMET examples. If there is anything else you would like to point out, I welcome any feedback.