Asking for a guide on the quantization process using SNPE after applying AIMET QAT/PTQ #3438
Comments
@quic-akinlawo, can you help respond to this?
Interested in this also.
Hi @chewry, can you share more details about the quantization options you used in AIMET? Also, what was the performance of the simulated model in AIMET (before you ran the SNPE converter)? For your reference, this is the guide to snpe-dlc-quantize: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tools.html?product=1601111740010412#snpe-dlc-quantize
Hello, @quic-akinlawo. Thank you for your attention. I got advice that add …
Hi @chewry, have you evaluated the accuracy of the …
Hello authors,
Thank you for your excellent work.
I've tried using AIMET to resolve a severe accuracy degradation caused by quantization in the SNPE toolchain, but I've encountered the same problem with AIMET. I would like to seek advice or opinions on this matter.
Here is the previous SNPE-only workflow:
A trained Torch model (full precision) was converted to ONNX (using torch.onnx.export), converted to DLC (using snpe-onnx-to-dlc), and then quantized (using snpe-dlc-quantize). This works well for simple models such as MobileNet but fails for deeper models. We have confirmed experimentally that keeping the model's activations at 16-bit preserves its accuracy; however, to achieve the speed of a w8a8 model, I decided to apply AIMET. A sketch of the export step follows.
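For reference, the export step looks like this minimal sketch (MyModel, the checkpoint path, the input shape, and the opset version are placeholders for my actual setup):

```python
import torch

# Minimal sketch of the FP32 export step; `MyModel`, the checkpoint path,
# the input shape, and the opset version are placeholders.
model = MyModel()
model.load_state_dict(torch.load("model_fp32.pth"))
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```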
Here is the AIMET -> SNPE workflow. I tried to even out the network's activation ranges by applying CLE (PTQ), or to adapt the model and its parameters for quantization by applying QAT. Exporting the AIMET QuantizationSimModel produced the ONNX and encodings files, which were combined by snpe-onnx-to-dlc to create a DLC; a quantized DLC was then created with snpe-dlc-quantize (a sketch of the AIMET side follows below). However, even with AIMET applied in this way, the same problem remained: accuracy was preserved before quantization but dropped significantly after it.
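For concreteness, here is a minimal sketch of the AIMET side, assuming the aimet_torch 1.x API; `model` and `calibration_loader` are placeholders, and the bit-widths are the w8a8 setting described above:

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

# Placeholders: `model` is the trained FP32 torch model and
# `calibration_loader` yields representative inputs.
input_shape = (1, 3, 224, 224)
dummy_input = torch.randn(*input_shape)

# PTQ: apply cross-layer equalization to the FP32 model in place.
equalize_model(model, input_shape)

# Simulate w8a8 quantization (8-bit parameters and activations).
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,
    default_output_bw=8,
)

def forward_pass(model, _):
    model.eval()
    with torch.no_grad():
        for images, _ in calibration_loader:
            model(images)

# Calibrate activation ranges; evaluating sim.model afterwards gives the
# off-target accuracy that the quantized DLC should roughly match.
sim.compute_encodings(forward_pass, forward_pass_callback_args=None)

# (For QAT, fine-tune sim.model with the usual training loop here.)

# Produces model.onnx plus the .encodings files consumed downstream.
sim.export(path="./export", filename_prefix="model", dummy_input=dummy_input)
```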
While analyzing the cause of this degradation, I couldn't find any official guide for converting to DLC after QuantizationSimModel.export, so I'm unsure whether my SNPE conversion process is even correct. Any guidance or opinions on going from the ONNX model (or other AIMET byproducts) to a quantized DLC would be greatly appreciated; the conversion step as I currently understand it is sketched below.
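Here is that sketch. File names are placeholders, and as far as I can tell --quantization_overrides (on snpe-onnx-to-dlc) and --override_params (on snpe-dlc-quantize) are the options meant to consume AIMET-generated encodings, but please correct me if that is not the intended flow:

```python
import subprocess

# Placeholder file names; the flags reflect my understanding of how SNPE
# consumes AIMET encodings -- please verify against the documentation for
# your SNPE version.
subprocess.run(
    [
        "snpe-onnx-to-dlc",
        "--input_network", "export/model.onnx",
        # Pass the AIMET encodings as external quantization overrides.
        "--quantization_overrides", "export/model.encodings",
        "--output_path", "model.dlc",
    ],
    check=True,
)

subprocess.run(
    [
        "snpe-dlc-quantize",
        "--input_dlc", "model.dlc",
        "--input_list", "input_list.txt",
        # Use the override encodings carried in the DLC instead of only
        # re-computing ranges from the calibration inputs.
        "--override_params",
        "--output_dlc", "model_w8a8.dlc",
    ],
    check=True,
)
```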
Also, apart from the SNPE step, I followed the same process as in the AIMET examples. If there is anything else you would like to point out, I welcome any feedback.