Asking for a guide on the quantization process using SNPE after applying AIMET QAT/PTQ #3438

Open
chewry opened this issue Oct 28, 2024 · 5 comments
Labels
aimet-torch (New feature or bug fix for AIMET PyTorch) · QNN (Issues related to QNN) · question (Further information is requested)

Comments


chewry commented Oct 28, 2024

Hello authors,

Thank you for your excellent work.

I've been using AIMET to address a severe performance degradation caused by quantization in our SNPE flow, but I've encountered the same problem with AIMET. I would like to seek advice or opinions on this matter.

Here is the previous SNPE-only workflow.

## SNPE workflow
1. torch.onnx.export(full_precision_torch_model)
2. snpe-onnx-to-dlc --input_network full_precision_onnx_model
3. snpe-dlc-quantize --input_dlc full_precision_dlc_model

A trained full-precision Torch model was converted to ONNX (using torch.onnx.export), converted to DLC (using snpe-onnx-to-dlc), and then quantized (using snpe-dlc-quantize). This works well for simple models (MobileNet) but fails for deeper models. We confirmed experimentally that keeping the model's activations at 16-bit preserves its performance, but to reach the speed of a w8a8 model I decided to apply AIMET.
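For reference, step 1 looks roughly like the sketch below (MobileNet, the input shape, the tensor names and the output path are only stand-ins for our actual model and settings, and the opset version is an assumption):

```python
import torch
from torchvision.models import mobilenet_v2

# Stand-in for the trained full-precision network (MobileNet used only as an example)
model = mobilenet_v2(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Step 1 of the SNPE-only workflow: export the FP32 model to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "full_precision_model.onnx",
    input_names=["input_01"],
    output_names=["output"],
    opset_version=11,
)
```

The exported ONNX file is then fed to snpe-onnx-to-dlc (step 2) and the resulting DLC to snpe-dlc-quantize (step 3). Here is the AIMET -> SNPE workflow.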

## AIMET -> SNPE workflow
1. (QAT) prepare_model -> compute_encodings -> train -> QuantizationSimModel.export(AIMET_sim_model) 
or (PTQ, CLE) prepare_model -> equalize_model -> compute_encodings -> QuantizationSimModel.export(AIMET_sim_model) 
2. snpe-onnx-to-dlc --input_network AIMET_onnx_model --quantization_overrides AIMET.encodings
3. snpe-dlc-quantize --input_dlc AIMET_dlc_model

I tried to redistribute the network's activation ranges by applying CLE (PTQ), or to adapt the model and its parameters to quantization by applying QAT. The AIMET QuantizationSimModel export produced ONNX and encodings files, which were combined in snpe-onnx-to-dlc to create a DLC, and a quantized DLC was then created with snpe-dlc-quantize. However, even with AIMET applied this way, performance is preserved before quantization but drops significantly after quantization. A sketch of the CLE (PTQ) path is shown below for reference.
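This is roughly how I understand the CLE (PTQ) path from the AIMET examples (a minimal sketch only: MobileNet, the input shape, the calibration callback and the export paths are placeholders for our actual pipeline, and the quant scheme and bitwidths are assumptions):

```python
import os
import torch
from torchvision.models import mobilenet_v2

from aimet_common.defs import QuantScheme
from aimet_torch.model_preparer import prepare_model
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

model = mobilenet_v2(weights=None).eval()   # stand-in for the real trained network
dummy_input = torch.randn(1, 3, 224, 224)

def pass_calibration_data(sim_model, _):
    # Placeholder calibration pass; in practice iterate over a representative dataset
    sim_model(dummy_input)

# Rewrite the model graph into a quantization-friendly form
model = prepare_model(model)

# Cross-Layer Equalization rebalances weight ranges across adjacent layers
equalize_model(model, input_shapes=(1, 3, 224, 224))

# w8a8 quantization simulation, then calibrate the encodings
sim = QuantizationSimModel(
    model=model,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    dummy_input=dummy_input,
    default_output_bw=8,
    default_param_bw=8,
)
sim.compute_encodings(
    forward_pass_callback=pass_calibration_data,
    forward_pass_callback_args=None,
)

# Export the ONNX model plus the .encodings file consumed by snpe-onnx-to-dlc
os.makedirs("./export", exist_ok=True)
sim.export(path="./export", filename_prefix="model_cle", dummy_input=dummy_input)
```

The QAT path differs only in that sim.model is fine-tuned after compute_encodings and before export.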

While analyzing the cause of this performance degradation, I couldn't find any official guide for converting to DLC after QuantizationSimModel.export, so I am not sure whether my SNPE conversion process is correct. If someone could provide any guidance or opinions on going from ONNX (or the other AIMET byproducts) to a quantized DLC, it would be greatly appreciated.

Also, apart from the SNPE step, I followed the same process as in the AIMET examples. If there is anything else you would like to point out, I welcome any feedback.

@quic-mangal
Contributor

@quic-akinlawo, can you help respond to this?

@NikilXYZ

interested in this also

@quic-akinlawo
Contributor

Hi @chewry, can you share more details about the quantization options you used in AIMET? Also, what was the performance of the simulated model in AIMET (before you used the snpe converter)?

For your reference, this is the guide to the snpe-dlc-quantizer: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tools.html?product=1601111740010412#snpe-dlc-quantize

@chewry
Author

chewry commented Nov 13, 2024

> Hi @chewry, can you share more details about the quantization options you used in AIMET? Also, what was the performance of the simulated model in AIMET (before you used the snpe converter)?
>
> For your reference, this is the guide to the snpe-dlc-quantizer: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tools.html?product=1601111740010412#snpe-dlc-quantize

Hello, @quic-akinlawo. Thank you for your attention.

  1. I briefly wrote the AIMET quantization code for our model; it is shown below.
  2. When I tested the exported ONNX model (after QAT), it showed almost the same performance as before QAT.

I was advised to add the --override_params option to the snpe-dlc-quantize command. It seems to reduce the degradation, but quantization artifacts still remain, and I cannot tell whether they come from quantization itself or from a faulty command.
If there is an official guide, it would be very helpful.

```python
# Imports for the AIMET APIs used below (dummy_input, pass_calibration_data,
# use_cuda, trainer and args come from our training pipeline)
from aimet_common.defs import QuantScheme
from aimet_torch.model_preparer import prepare_model
from aimet_torch.quantsim import QuantizationSimModel

# Rewrite the model graph into a quantization-friendly form
model = prepare_model(model)

# w8a8 simulation with range-learning QAT, initialized from TF-style encodings
sim = QuantizationSimModel(
    model=model,
    quant_scheme=QuantScheme.training_range_learning_with_tf_init,
    dummy_input=dummy_input,
    default_output_bw=8,
    default_param_bw=8,
)

# Calibrate initial encodings before fine-tuning
sim.compute_encodings(
    forward_pass_callback=pass_calibration_data,
    forward_pass_callback_args=use_cuda,
)

# Quantization-aware fine-tuning of the simulated model
sim.model = trainer.train(sim.model)

# Export the ONNX model and the .encodings file for the SNPE converter
sim.export(
    path=f"./{args.save_folder}/",
    filename_prefix=f"{args.model_name}",
    dummy_input=dummy_input,
    onnx_export_args={
        "input_names": ["input_01",],
        "output_names": ["output",],
    },
    # use_embedded_encodings=True,
)
```
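
A quick way to inspect what was actually exported is to load the .encodings JSON that sim.export writes next to the ONNX file (a sketch only; the path is a placeholder, and the activation_encodings/param_encodings keys are assumed from the AIMET 1.x JSON layout):

```python
import json

# Hypothetical path to the encodings file produced by sim.export()
with open("./export/model.encodings") as f:
    encodings = json.load(f)

# Number of activation / parameter tensors carrying quantization overrides;
# these tensor names should line up with the exported ONNX graph.
print("activation encodings:", len(encodings.get("activation_encodings", {})))
print("param encodings:", len(encodings.get("param_encodings", {})))
```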

@quic-mtuttle
Contributor

Hi @chewry, have you evaluated the accuracy of the sim.model object in pytorch? That should give you a good idea of the performance degradation due to the quantization. The exported onnx model itself does not contain any quantization nodes (without use_embedded_encodings=True at least), which is likely why you see very close to FP performance here.
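
For example, something along these lines (a sketch only; the dataloader, device handling and accuracy metric are placeholders for your own evaluation pipeline):

```python
import torch

@torch.no_grad()
def evaluate_quantsim_accuracy(sim, dataloader, device="cuda"):
    """Run the QuantizationSimModel's wrapped model over a labelled dataloader
    and return top-1 accuracy. This measures simulated-quantization accuracy
    in PyTorch, before any SNPE conversion happens."""
    sim.model.eval()
    correct = total = 0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        preds = sim.model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```

Comparing this number with your FP32 accuracy isolates the degradation caused by quantization simulation itself, separately from the SNPE conversion step.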

@quic-hitameht added the aimet-torch, question and QNN labels on Dec 12, 2024