Details about LPBQ Quantization in QNN Backend #16488
I'm currently learning about QNN and its LPBQ quantization in ExecuTorch. My environment is a Snapdragon SM8850 chipset with HTP architecture v81 and QNN version 2.41.0.251128. When I use the low-level API (qnn_interface.graphAddNode()) to add a FullyConnected node, QNN reports the error "FullyConnected: Block expansion encoding not supported.", even though op validation succeeds. I also observed that in ExecuTorch, all linear layers are rewritten as conv2d layers. This raises the question: is LPBQ quantization only supported for Conv2D and not for Linear layers, or could this be a limitation specific to my QNN version?
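For context on the rewrite I observed: a Linear layer is mathematically equivalent to a 1x1 Conv2d applied to the input reshaped to NCHW with H = W = 1, which is presumably why the backend can swap one for the other. A minimal PyTorch sketch of that equivalence (illustrative only, not the actual ExecuTorch rewrite pass):

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 64)                       # (batch, in_features)
linear = torch.nn.Linear(64, 128)

# Reuse the linear parameters in a 1x1 conv: (out, in) -> (out, in, 1, 1)
conv = torch.nn.Conv2d(64, 128, kernel_size=1)
conv.weight.data = linear.weight.data.reshape(128, 64, 1, 1)
conv.bias.data = linear.bias.data

y_linear = linear(x)
y_conv = conv(x.reshape(8, 64, 1, 1)).reshape(8, 128)
assert torch.allclose(y_linear, y_conv, atol=1e-5)
```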
Replies: 3 comments 1 reply
@haowhsu-quic @winskuo-quic @shewu-quic thoughts on this?
Hi @chenghuaWang,
Based on the QNN documentation, LPBQ quantization supports both the Conv2D and Linear operations. You can apply the patch below to run the unit test for Linear with LPBQ.
Reproduce Command:
Patch:
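For readers unfamiliar with the encoding named in the error message: as I understand it, LPBQ stores block-wise scales as a per-channel float scale times a low-bit integer multiplier per block, and reconstructing the float block scales from that pair is the "block expansion" step. Below is a rough sketch of that two-level encoding, assuming a symmetric int4 weight scheme, input features divisible by the block size, and a hypothetical lpbq_encode helper. This is only my reading of the format, not the backend's actual implementation:

```python
import torch

def lpbq_encode(w, block_size=64, weight_bits=4, scale_bits=4):
    # w: float weight of shape (out_channels, in_features),
    # in_features assumed divisible by block_size.
    out_ch, in_ch = w.shape
    n_blocks = in_ch // block_size
    qmax = 2 ** (weight_bits - 1) - 1                 # 7 for symmetric int4
    blocks = w.reshape(out_ch, n_blocks, block_size)

    # 1. Ideal float scale per block.
    block_scale = blocks.abs().amax(dim=-1) / qmax    # (out_ch, n_blocks)

    # 2. Per-channel scale chosen so every block scale becomes a small
    #    integer multiple of it (the part that gets "expanded" at load time).
    ch_scale = block_scale.amax(dim=-1, keepdim=True) / (2 ** scale_bits)
    ch_scale = ch_scale.clamp(min=1e-9)               # guard all-zero channels
    int_scale = torch.clamp(torch.ceil(block_scale / ch_scale), 1, 2 ** scale_bits)

    # 3. Quantize weights against the expanded (reconstructed) block scales.
    expanded = (ch_scale * int_scale).unsqueeze(-1)   # float scale per block
    q = torch.clamp(torch.round(blocks / expanded), -qmax - 1, qmax)
    return q.to(torch.int8).reshape(out_ch, in_ch), ch_scale, int_scale.to(torch.uint8)

q, per_channel_scale, per_block_int_scale = lpbq_encode(torch.randn(128, 256))
```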
@shewu-quic Thanks! Your patch works fine for me. Is there a way to get QNN log info during ExecuTorch processing?
