Commit 6dcbf39

kimishpatel authored and facebook-github-bot committed
[QNNPACK, Sparsity] Added prepacking base aarch32 kernels (pytorch#50589)
Summary:
Pull Request resolved: pytorch#50589

Adds:
1. An input prepacking kernel.
2. Compute kernels that process the prepacked activation.

The hunch is that input prepacking will help with (1) cache locality and (2) avoiding many address-compute instructions. The cache-locality benefit mainly comes from the fact that we are using mr=8 and nr=4: with mr=8, cache-line evictions are likely because the cache associativity is likely 4. Laying out transposed activations blocked by mr=8 places the entire transposed activation in one contiguous block. The downside is that we now transpose all of the blocks regardless of whether they participate in compute; however, it is likely that the entire activation matrix participates in compute for some output block.

Also adds a benchmark.

Test Plan:
q8gemm-sparse-test
fully-connected-test-sparse

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D25925502

fbshipit-source-id: b2c36419a2c5d23b4a49f25f9ee41cee8397c3be
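To illustrate the prepacking idea from the summary, here is a minimal C sketch (not the actual aarch32 NEON assembly in this commit) of packing an activation matrix into contiguous mr=8 row blocks, transposed within each block so that the compute kernel reads one contiguous run per k index. The function name `pack_a` and the zero-padding of tail rows are assumptions for illustration, not the commit's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MR 8 /* row-block size; matches the mr=8 discussed above */

/* Hypothetical sketch: pack activation matrix a (m x k, row-major) so that
 * each MR-row block is stored transposed and contiguously. For a given
 * block, the MR values sharing one k index become adjacent in memory,
 * which is the layout the summary argues improves cache locality.
 * Rows past m are zero-padded. `packed` must hold ceil(m/MR)*MR*k bytes. */
static void pack_a(size_t m, size_t k, const uint8_t* a, uint8_t* packed) {
  for (size_t mb = 0; mb < m; mb += MR) {      /* each MR-row block */
    uint8_t* out = packed + mb * k;
    for (size_t ki = 0; ki < k; ki++) {        /* transposed order */
      for (size_t mr = 0; mr < MR; mr++) {
        const size_t row = mb + mr;
        *out++ = (row < m) ? a[row * k + ki] : 0; /* pad tail rows */
      }
    }
  }
}
```

With this layout, every block of MR transposed activations is one contiguous run, at the cost of transposing all blocks up front, which is the trade-off the summary describes.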
1 parent 47a6703 commit 6dcbf39

13 files changed: +1890 −110 lines

aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt (+10)
@@ -216,6 +216,8 @@ set(PYTORCH_QNNPACK_AARCH32_ASM_UKERNELS
     src/q8gemm/4x8-aarch32-neon.S
     src/q8gemm/4x8-dq-aarch32-neon.S
     src/q8gemm/4x8c2-xzp-aarch32-neon.S
+    src/q8gemm_sparse/8x4-packA-aarch32-neon.S
+    src/q8gemm_sparse/8x4c1x4-dq-packedA-aarch32-neon.S
     src/q8gemm_sparse/8x4c1x4-dq-aarch32-neon.S)

 set(PYTORCH_QNNPACK_AARCH64_ASM_UKERNELS
@@ -809,6 +811,14 @@ if(PYTORCH_QNNPACK_BUILD_BENCHMARKS)
   target_compile_definitions(q8gemm-bench PRIVATE pytorch_PYTORCH_QNNPACK_BENCHMARK_GEMMLOWP=0)
   target_link_libraries(q8gemm-bench PRIVATE pytorch_qnnpack cpuinfo fp16 benchmark)

+  add_executable(q8gemm-sparse-bench bench/q8gemm_sparse.cc)
+  set_target_properties(q8gemm-sparse-bench PROPERTIES
+    CXX_STANDARD 14
+    CXX_STANDARD_REQUIRED YES
+    CXX_EXTENSIONS NO)
+  target_include_directories(q8gemm-sparse-bench PRIVATE src)
+  target_link_libraries(q8gemm-sparse-bench PRIVATE pytorch_qnnpack cpuinfo fp16 benchmark)
+
   add_executable(hgemm-bench bench/hgemm.cc)
   set_target_properties(hgemm-bench PROPERTIES
     CXX_STANDARD 14