Prototype XNNPack gemm compiler.
Our existing GEMM templates are becoming unmaintainable and are preventing us from quickly adding support for new types and quantization schemes. They are also too restrictive in the shapes of the generated GEMMs. This new system generates assembly, and the shape is limited only by the number of SIMD registers.
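
To illustrate the register limit, here is a minimal Python sketch that checks whether an MR x NR f32 tile fits in a SIMD register file. The register counts and widths are architectural facts, but the budget model (one accumulator per row and column vector, one register per loaded weight vector, one for a broadcast input) and all names are assumptions for illustration, not taken from the generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdFile:
    """Size of a SIMD register file."""
    num_registers: int  # architectural vector registers
    f32_lanes: int      # float32 elements per register


# Illustrative names; the sizes are architectural.
AVX512F = SimdFile(num_registers=32, f32_lanes=16)  # zmm0-zmm31, 512-bit
NEONFMA = SimdFile(num_registers=32, f32_lanes=4)   # v0-v31, 128-bit


def registers_needed(mr: int, nr: int, simd: SimdFile) -> int:
    """Estimate registers for an mr x nr f32 tile (assumed budget model)."""
    if nr % simd.f32_lanes != 0:
        raise ValueError("nr must be a multiple of the vector width")
    nr_vectors = nr // simd.f32_lanes
    accumulators = mr * nr_vectors  # one per (row, column-vector) pair
    weight_vectors = nr_vectors     # one loaded row of the weights panel
    input_broadcast = 1             # one broadcast input element (assumption)
    return accumulators + weight_vectors + input_broadcast


def tile_fits(mr: int, nr: int, simd: SimdFile) -> bool:
    """True if the estimated register use fits in the register file."""
    return registers_needed(mr, nr, simd) <= simd.num_registers
```

Under this model, a 5x64 tile on avx512f needs 20 accumulators plus 5 working registers and fits, while an 8x64 tile does not.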

Arch: for example, x64 and aarch64
ISA: for example, neondot and avx512f

Each microkernel has an arch and an ISA associated with it. All shared scalar code belongs to the arch, and ISA-specific SIMD code belongs to the ISA. ISAs can inherit from each other. For example, stores are common between avx512f and avx512vnni, and between neonfma and neondot. This eliminates lots of code duplication.

Only the inner loops (and sometimes the outer loops) vary between GEMM microkernels on the same architecture. Most of the rest of the code is identical. Therefore, this system is modular, with each ISA inheriting from the preceding one, and only small snippets of assembly are required to add a new ISA.
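
The arch/ISA layering described above can be sketched with ordinary class inheritance. This is a minimal illustration, not the generator's actual code: the class names, method names, and register choices are hypothetical, and the emitted lines are not a working kernel. Only the instruction mnemonics (`vfmadd231ps`, `vpdpbusd`, `vmovups`) are real.

```python
from typing import List


class X64:
    """Arch layer (hypothetical): scalar code shared by all x64 kernels."""

    def advance_pointer(self, reg: str, nbytes: int) -> str:
        return f"add {reg}, {nbytes}"


class Avx512F(X64):
    """ISA layer (hypothetical): f32 SIMD snippets."""

    def multiply_accumulate(self, acc: str, a: str, b: str) -> str:
        return f"vfmadd231ps {acc}, {a}, {b}"

    def store(self, ptr: str, acc: str) -> str:
        return f"vmovups [{ptr}], {acc}"


class Avx512Vnni(Avx512F):
    """ISA layer (hypothetical): overrides only the inner-loop snippet."""

    def multiply_accumulate(self, acc: str, a: str, b: str) -> str:
        return f"vpdpbusd {acc}, {a}, {b}"

    # store() and advance_pointer() are inherited unchanged.


def emit_row(isa: Avx512F, acc: str) -> List[str]:
    """Emit one accumulate/store/advance sequence for a single row."""
    return [
        isa.multiply_accumulate(acc, "zmm2", "zmm7"),
        isa.store("r10", acc),
        isa.advance_pointer("r10", 64),  # 64 bytes = one zmm register
    ]


if __name__ == "__main__":
    for isa in (Avx512F(), Avx512Vnni()):
        print(type(isa).__name__, emit_row(isa, "zmm5"))
```

In this sketch, adding the VNNI variant takes a single overridden method; the store and pointer arithmetic come from the parent classes.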

ISAs supported in the initial prototype, by datatype:
F32: neonfma and avx512f
QD8-F32-QC8W: neondot and avx512vnni

Support for aarch32 will be added in a future change. I do not plan on supporting 32-bit x86 since it is irrelevant as an architecture and it has only 8 general-purpose and 8 SIMD registers. The lack of registers means that data would have to be repeatedly pushed to and popped from the stack, adding lots of complexity to the templates for little gain.

The generated assembly only compiles on Linux. However, only the function headers, footers and calling conventions differ between Windows and Linux. The actual assembly is identical. I manually modified the generated assembly and tested it with MSVC for both aarch64 and x64. Support for Windows will be added in a future version.
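
To make the calling-convention difference concrete for x64, the following Python sketch maps an integer or pointer argument index to its location at function entry under the System V AMD64 ABI (Linux) and the Microsoft x64 convention (Windows). The register lists and stack layout come from those ABIs; the function name and dictionary are hypothetical and are not part of the generator.

```python
# Integer/pointer argument registers, in order, for each x64 ABI.
INTEGER_ARG_REGISTERS = {
    "sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],  # Linux
    "win64": ["rcx", "rdx", "r8", "r9"],               # Windows
}

# Windows callers reserve 32 bytes of shadow space above the return address.
SHADOW_SPACE_BYTES = {"sysv": 0, "win64": 32}


def argument_location(abi: str, index: int) -> str:
    """Location of integer argument `index` (0-based) at function entry,
    before the callee pushes anything."""
    registers = INTEGER_ARG_REGISTERS[abi]
    if index < len(registers):
        return registers[index]
    # Remaining arguments are on the stack, above the 8-byte return
    # address and, on Windows, above the shadow space.
    stack_slot = index - len(registers)
    offset = 8 + SHADOW_SPACE_BYTES[abi] + 8 * stack_slot
    return f"[rsp + {offset}]"


if __name__ == "__main__":
    for i in range(7):
        print(i, argument_location("sysv", i), argument_location("win64", i))
```

Because only these locations (and the set of callee-saved registers) change, a generator can remap arguments in the function header and leave the loop bodies untouched.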

Intel syntax is used since it is portable between Linux and Windows and it is less crazy than AT&T syntax.

PiperOrigin-RevId: 702691549
alankelly authored and xnnpack-bot committed Dec 18, 2024
1 parent f012860 commit fc0853d
Showing 107 changed files with 21,091 additions and 3 deletions.
13 changes: 12 additions & 1 deletion CMakeLists.txt
@@ -44,7 +44,12 @@ SET(CMAKE_CXX_EXTENSIONS NO)
 # ---[ Options.
 SET(XNNPACK_LIBRARY_TYPE "default" CACHE STRING "Type of library (shared, static, or default) to build")
 SET_PROPERTY(CACHE XNNPACK_LIBRARY_TYPE PROPERTY STRINGS default static shared)
-OPTION(XNNPACK_ENABLE_ASSEMBLY "Build XNNPACK with assembly micro-kernels" ON)
+IF(CMAKE_C_COMPILER_ID STREQUAL "MSVC")
+  # Disable assembly when using MSVC until support is added.
+  OPTION(XNNPACK_ENABLE_ASSEMBLY "Build XNNPACK with assembly micro-kernels" OFF)
+ELSE()
+  OPTION(XNNPACK_ENABLE_ASSEMBLY "Build XNNPACK with assembly micro-kernels" ON)
+ENDIF()
 OPTION(XNNPACK_ENABLE_MEMOPT "Build XNNPACK with optimized memory allocation scheme" ON)
 OPTION(XNNPACK_ENABLE_SPARSE "Build XNNPACK with graph rewriting for sparse inference" ON)
 OPTION(XNNPACK_ENABLE_GEMM_M_SPECIALIZATION "Build XNNPACK with support for selecting microkernel with different MR" ON)
@@ -658,6 +663,9 @@ IF(XNNPACK_TARGET_PROCESSOR MATCHES "^x86(_64)?$")
 LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_F16C_MICROKERNEL_SRCS})
 LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_FMA3_MICROKERNEL_SRCS})
 LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_AVX2_MICROKERNEL_SRCS})
+IF(XNNPACK_ENABLE_ASSEMBLY)
+  LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_AMD64_ASM_MICROKERNEL_SRCS})
+ENDIF()
 IF(XNNPACK_ENABLE_AVX512AMX)
   LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_AVX512AMX_MICROKERNEL_SRCS})
 ENDIF()
@@ -705,6 +713,9 @@ IF(XNNPACK_TARGET_PROCESSOR MATCHES "^x86(_64)?$")
 LIST(APPEND NON_PROD_MICROKERNEL_SRCS ${NON_PROD_F16C_MICROKERNEL_SRCS})
 LIST(APPEND NON_PROD_MICROKERNEL_SRCS ${NON_PROD_FMA3_MICROKERNEL_SRCS})
 LIST(APPEND NON_PROD_MICROKERNEL_SRCS ${NON_PROD_AVX2_MICROKERNEL_SRCS})
+IF(XNNPACK_ENABLE_ASSEMBLY)
+  LIST(APPEND NON_PROD_MICROKERNEL_SRCS ${NON_PROD_AMD64_ASM_MICROKERNEL_SRCS})
+ENDIF()
 IF(XNNPACK_ENABLE_AVX512AMX)
   LIST(APPEND NON_PROD_MICROKERNEL_SRCS ${NON_PROD_AVX512AMX_MICROKERNEL_SRCS})
 ENDIF()