PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326

Open
wants to merge 69 commits into
base: master
1f4d7d8
ggml-qnn: add Qualcomm QNN backend for GGML
zhouwg Feb 14, 2025
ae16f7e
ggml-qnn: sanity check
zhouwg Feb 15, 2025
2279d70
ggml-qnn: update script build-run-android.sh to compare performance of…
zhouwg Feb 16, 2025
175298b
ggml-qnn: fix minor issue in test-backend-ops.cpp
zhouwg Feb 17, 2025
116fd01
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
zhouwg Feb 18, 2025
dbb4612
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 18, 2025
1cf48cb
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
zhouwg Feb 19, 2025
7f06ac7
ggml-qnn: remove redundant codes
zhouwg Feb 20, 2025
a1acb70
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
4aade6e
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
6c620cd
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 21, 2025
c29c102
ggml-qnn: add Qualcomm QNN backend for GGML
zhouwg Feb 14, 2025
fa8a731
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
zhouwg Feb 18, 2025
ea219c3
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 18, 2025
cd1c054
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
zhouwg Feb 19, 2025
845ae5e
ggml-qnn: remove redundant codes
zhouwg Feb 20, 2025
4db3ac1
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
d1ee631
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
54f1773
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 21, 2025
f46ad99
ggml-qnn: fix a minor typo in internal doc
zhouwg Feb 23, 2025
8a777da
ggml-qnn: refine function ggml_qnn_create_general_tensor() to avoid c…
zhouwg Feb 23, 2025
5b8dc7c
ggml-qnn: fix a minor typo in source code
zhouwg Feb 24, 2025
48c54d4
build: avoid ggml-qnn backend breaking other backend's builds
zhouwg Feb 24, 2025
3c68211
ggml-qnn: remove redundant codes to make PR reviewers happy
zhouwg Feb 25, 2025
7c75dbf
ggml-qnn: refine code format
zhouwg Feb 25, 2025
fce887c
ggml-qnn: offload quantized type mulmat to QNN backend
zhouwg Feb 26, 2025
f168654
ggml-qnn: refine source code structure to make code more clearly
zhouwg Feb 27, 2025
91a5ab8
ggml-qnn: enable release build with necessary logs to make reviewers …
zhouwg Feb 27, 2025
66be955
ggml-qnn: enable all quantize type with 2d mulmat
zhouwg Feb 27, 2025
7cf43e7
ggml-qnn: enable log output of GGMLQNN_LOG_INFO in command line mode …
zhouwg Feb 28, 2025
e6953ed
ggml-qnn: Windows port --- step2
zhouwg Feb 28, 2025
f5d588e
ggml-qnn: merge UT code and corresponding script from local dev branc…
zhouwg Mar 2, 2025
cfd9eff
ggml-qnn: merge ggml_qnn_mul_mat_4d from local dev branch to make wor…
zhouwg Mar 2, 2025
a2e236e
ggml-qnn: submit AI-assisted ggml_qnn_mul_mat_4d(not worked currently…
zhouwg Mar 2, 2025
50f2654
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step2
zhouwg Mar 2, 2025
00f3ca5
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step3
zhouwg Mar 2, 2025
2e57767
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step4
zhouwg Mar 2, 2025
338762c
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step5
zhouwg Mar 2, 2025
5ff927f
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step6
zhouwg Mar 2, 2025
f742a35
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step7
zhouwg Mar 2, 2025
ea5c68a
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step8
zhouwg Mar 2, 2025
5657e07
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- good in step9
zhouwg Mar 2, 2025
fa1b9f5
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
zhouwg Mar 2, 2025
5cb7a60
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step10
zhouwg Mar 2, 2025
fbe93c3
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
zhouwg Mar 2, 2025
19da362
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step11
zhouwg Mar 2, 2025
e84be4f
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- both ok in st…
zhouwg Mar 2, 2025
6afab4b
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 ---finalizing ver…
zhouwg Mar 2, 2025
7f14497
ggml-qnn: refine ggml_qnn_mul_mat and ggml_qnn_general_node according…
zhouwg Mar 2, 2025
e6a69d7
ggml-qnn: remove no-needed comments
zhouwg Mar 2, 2025
6a31839
ggml-qnn: Windows port --- step3
zhouwg Mar 3, 2025
43a115d
ggml-qnn: remove un-needed function
zhouwg Mar 4, 2025
a3475c0
ggml-qnn:rebase to upstream
zhouwg Mar 4, 2025
9240622
ggml-qnn: fix a minor issue during rebase to upstream
zhouwg Mar 4, 2025
311b969
ggml-qnn: update script according to https://github.com/ggml-org/llam…
zhouwg Mar 4, 2025
8a94cbc
ggml-qnn: fix a minor issue in ggmlqnn_create_general_tensor()
zhouwg Mar 4, 2025
47e1542
ggml-qnn: active member variable _device_id in class qnn_instance
zhouwg Mar 4, 2025
8129728
ggml-qnn: refine ggml_qnn_general_node and ggml_qnn_mul_mat to make c…
zhouwg Mar 4, 2025
adde5b8
ggml-qnn: Windows port --- step4
zhouwg Mar 6, 2025
175ff25
ggml-qnn: Windows port -- step5
zhouwg Mar 7, 2025
85655db
ggml-qnn: WoA(Windows on ARM) -- step6
zhouwg Mar 8, 2025
5bd8cef
ggml-qnn: rebase to upstream
zhouwg Mar 9, 2025
f8b1e7d
ggml-qnn: pr to upstream
zhouwg Mar 11, 2025
d0a2ff4
ggml-qnn: rebase to upstream
zhouwg Mar 18, 2025
0d503c7
ggml-qnn: self code-review
zhouwg Mar 18, 2025
2fbdee6
ggml-qnn: rebase upstream
zhouwg Mar 19, 2025
9fddc9e
ggml-qnn: add approach through Hexagon cDSP
zhouwg Mar 22, 2025
34452c8
ggml-qnn: refine general approach through Hexagon cDSP
zhouwg Mar 23, 2025
509c0fb
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
zhouwg Mar 24, 2025
11 changes: 11 additions & 0 deletions CMakeLists.txt
@@ -7,6 +7,16 @@ set(CMAKE_WARN_UNUSED_CLI YES)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
    set(TARGET_SNAPDRAGON8GEN3 ON)
    if(TARGET_SNAPDRAGON8GEN3)
        # Works on Snapdragon 8 Gen 3: 1.5x (45+ tokens/second) to 3x (70+ tokens/second) performance gain with the default ggml backend.
        # These are compiler flags, not preprocessor definitions, so add_compile_options() is used instead of add_definitions().
        add_compile_options(-march=armv8.7-a)
        add_compile_options(-mcpu=cortex-x1)
        add_compile_options(-mtune=cortex-x1)
    endif()
endif()

if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
@@ -119,6 +129,7 @@ llama_option_depr(WARNING LLAMA_RPC GGML_RPC)
llama_option_depr(WARNING LLAMA_SYCL GGML_SYCL)
llama_option_depr(WARNING LLAMA_SYCL_F16 GGML_SYCL_F16)
llama_option_depr(WARNING LLAMA_CANN GGML_CANN)
llama_option_depr(WARNING LLAMA_QNN GGML_QNN)

if (NOT MSVC)
if (LLAMA_SANITIZE_THREAD)
2 changes: 2 additions & 0 deletions ggml/CMakeLists.txt
@@ -198,6 +198,7 @@ option(GGML_OPENCL_EMBED_KERNELS "ggml: embed kernels"
option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adreno" ON)
set (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING
"ggml: OpenCL API version to target")
option(GGML_QNN "ggml: use QNN" OFF)

# toolchain for vulkan-shaders-gen
set (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")
@@ -263,6 +264,7 @@ set(GGML_PUBLIC_HEADERS
include/ggml-rpc.h
include/ggml-sycl.h
include/ggml-vulkan.h
include/ggml-qnn.h
include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
53 changes: 53 additions & 0 deletions ggml/include/ggml-qnn.h
@@ -0,0 +1,53 @@
/*
* Copyright (c) 2023-2024 The ggml authors
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#pragma once

#include "ggml.h"
#include "ggml-backend.h"

#ifdef __cplusplus
extern "C" {
#endif

#define GGML_QNN_MAX_DEVICES 3
#define GGML_QNN_BACKEND_NAME "qnn"

enum QNNBackend {
    QNN_BACKEND_CPU,
    QNN_BACKEND_GPU,
    QNN_BACKEND_NPU,
    QNN_BACKEND_GGML, // "fake" QNN backend, used to compare performance between the QNN backend and the CPU backend
};

GGML_BACKEND_API ggml_backend_t ggml_backend_qnn_init(size_t dev_num, const char * qnn_lib_path);

GGML_BACKEND_API bool ggml_backend_is_qnn(ggml_backend_t backend);

GGML_BACKEND_API int ggml_backend_qnn_get_device_count(void);

GGML_BACKEND_API ggml_backend_reg_t ggml_backend_qnn_reg(void);

GGML_BACKEND_API const char * ggml_backend_qnn_get_devname(size_t dev_num);

#ifdef __cplusplus
}
#endif
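
The header above declares four QNNBackend values but sets GGML_QNN_MAX_DEVICES to 3. A minimal sketch of one reading of that, assuming the "fake" QNN_BACKEND_GGML entry deliberately sits just past the real device range. The helpers qnn_backend_label() and list_qnn_devices() and the label strings are hypothetical, written for illustration only; they are not part of the PR and are not what ggml_backend_qnn_get_devname() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copied from the header above.
#define GGML_QNN_MAX_DEVICES 3

enum QNNBackend {
    QNN_BACKEND_CPU,
    QNN_BACKEND_GPU,
    QNN_BACKEND_NPU,
    QNN_BACKEND_GGML, // "fake" backend, used only for CPU-vs-QNN comparisons
};

// Assumption made explicit: the fake entry is the first index past the real devices.
static_assert(QNN_BACKEND_GGML == GGML_QNN_MAX_DEVICES,
              "QNN_BACKEND_GGML is expected to follow the real devices");

// Hypothetical helper (not in the PR): an illustrative label per enum value.
static const char * qnn_backend_label(QNNBackend b) {
    switch (b) {
        case QNN_BACKEND_CPU:  return "QNN-CPU";
        case QNN_BACKEND_GPU:  return "QNN-GPU";
        case QNN_BACKEND_NPU:  return "QNN-NPU";
        case QNN_BACKEND_GGML: return "ggml";
    }
    return "unknown";
}

// Hypothetical helper (not in the PR): list only the indices below
// GGML_QNN_MAX_DEVICES, which skips the fake QNN_BACKEND_GGML entry.
static std::vector<std::string> list_qnn_devices() {
    std::vector<std::string> names;
    for (std::size_t i = 0; i < GGML_QNN_MAX_DEVICES; ++i) {
        names.push_back(qnn_backend_label(static_cast<QNNBackend>(i)));
    }
    return names;
}
```

Under this assumption, a caller looping up to GGML_QNN_MAX_DEVICES never sees the comparison-only entry, while code that wants the CPU baseline can still name QNN_BACKEND_GGML explicitly.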
1 change: 1 addition & 0 deletions ggml/src/CMakeLists.txt
@@ -310,6 +310,7 @@ ggml_add_backend(RPC)
ggml_add_backend(SYCL)
ggml_add_backend(Vulkan)
ggml_add_backend(OpenCL)
ggml_add_backend(QNN)

foreach (target ggml-base ggml)
target_include_directories(${target} PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../include> $<INSTALL_INTERFACE:include>)
8 changes: 8 additions & 0 deletions ggml/src/ggml-backend-reg.cpp
@@ -65,6 +65,10 @@
#include "ggml-kompute.h"
#endif

#ifdef GGML_USE_QNN
#include "ggml-qnn.h"
#endif

// disable C++17 deprecation warning for std::codecvt_utf8
#if defined(__clang__)
# pragma clang diagnostic push
@@ -187,6 +191,9 @@ struct ggml_backend_registry {
#ifdef GGML_USE_KOMPUTE
register_backend(ggml_backend_kompute_reg());
#endif
#ifdef GGML_USE_QNN
register_backend(ggml_backend_qnn_reg());
#endif
#ifdef GGML_USE_CPU
register_backend(ggml_backend_cpu_reg());
#endif
@@ -577,6 +584,7 @@ void ggml_backend_load_all_from_path(const char * dir_path) {
ggml_backend_load_best("vulkan", silent, dir_path);
ggml_backend_load_best("opencl", silent, dir_path);
ggml_backend_load_best("musa", silent, dir_path);
ggml_backend_load_best("qnn", silent, dir_path);
ggml_backend_load_best("cpu", silent, dir_path);
// check the environment variable GGML_BACKEND_PATH to load an out-of-tree backend
const char * backend_path = std::getenv("GGML_BACKEND_PATH");
59 changes: 59 additions & 0 deletions ggml/src/ggml-qnn/CMakeLists.txt
@@ -0,0 +1,59 @@
message(STATUS "Using QNN backend")
message("CMAKE_SYSTEM_NAME : ${CMAKE_SYSTEM_NAME}")

if(NOT DEFINED QNN_SDK_PATH)
    message(FATAL_ERROR "QNN_SDK_PATH not defined")
endif()

if(NOT DEFINED HEXAGON_SDK_PATH)
    message(FATAL_ERROR "HEXAGON_SDK_PATH not defined")
endif()

message("QNN_SDK_PATH: ${QNN_SDK_PATH}")
message("HEXAGON_SDK_PATH: ${HEXAGON_SDK_PATH}")

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
    find_library(LOG_LIB log)

    add_library(cdsprpc
        SHARED
        IMPORTED)
    set_target_properties(cdsprpc
        PROPERTIES
        IMPORTED_LOCATION
        ${HEXAGON_SDK_PATH}/ipc/fastrpc/remote/ship/android_aarch64/libcdsprpc.so)

    set(QNN_LINK_LIBRARIES ${LOG_LIB} cdsprpc)
    set(QNN_DEFAULT_LIB_SEARCH_PATH "/data/local/tmp/" CACHE STRING "customized library search path for QNN backend")

    include_directories(${HEXAGON_SDK_PATH}/incs)
    include_directories(${HEXAGON_SDK_PATH}/incs/stddef)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/incs)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/rpcmem/inc)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/remote/ship/android_Debug_aarch64)
    include_directories(${HEXAGON_SDK_PATH}/incs/qnx)
    include_directories(${HEXAGON_SDK_PATH}/libs/common/qnx/ship/android_Debug_aarch64)
    include_directories(${HEXAGON_SDK_PATH}/utils/examples)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/rtld/ship/android_aarch64)
    include_directories(${HEXAGON_SDK_PATH}/libs/atomic/inc)
    include_directories(${HEXAGON_SDK_PATH}/libs/atomic/android_Debug_aarch64/ship)
    include_directories(${CMAKE_SOURCE_DIR}/ggml/src/ggml-qnn/)
    include_directories(${CMAKE_SOURCE_DIR}/ggml/src/ggml-qnn/kernels/)

elseif(CMAKE_SYSTEM_NAME STREQUAL "Windows")
    set(QNN_DEFAULT_LIB_SEARCH_PATH "C:\\" CACHE STRING "customized library search path for QNN backend")
else()
    message(FATAL_ERROR "QNN is currently only available on Android and Windows (Windows on ARM)")
endif()

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DGGML_USE_QNN")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3")

file(GLOB QNN_SOURCES "${CMAKE_CURRENT_LIST_DIR}/*.cpp" "${CMAKE_CURRENT_LIST_DIR}/kernels/ggmlop_ap_skel.c")
ggml_add_backend_library(ggml-qnn ${QNN_SOURCES})

target_include_directories(ggml-qnn PRIVATE ${QNN_SDK_PATH}/include/QNN ${HEXAGON_SDK_PATH} ${CMAKE_CURRENT_LIST_DIR})
target_link_libraries(ggml-qnn PRIVATE ${QNN_LINK_LIBRARIES})

string(REGEX REPLACE "/$" "" QNN_DEFAULT_LIB_SEARCH_PATH "${QNN_DEFAULT_LIB_SEARCH_PATH}")
target_compile_definitions(ggml-qnn PRIVATE QNN_DEFAULT_LIB_SEARCH_PATH="${QNN_DEFAULT_LIB_SEARCH_PATH}/")
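
The last two CMake lines normalize QNN_DEFAULT_LIB_SEARCH_PATH so the compiled-in value always ends with a "/": the regex removes one trailing "/" if present, and the compile definition appends one. A minimal C++ sketch of the same rule follows; normalize_lib_search_path() is a hypothetical helper written for illustration and does not appear in the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the two CMake lines above:
//   string(REGEX REPLACE "/$" "" ...)  -> drop one trailing '/' if there is one
//   ..._SEARCH_PATH="${...}/"          -> always append a single '/'
std::string normalize_lib_search_path(std::string path) {
    if (!path.empty() && path.back() == '/') {
        path.pop_back(); // only one slash is removed, as with the "/$" pattern
    }
    path += '/';
    return path;
}
```

With this rule, "/data/local/tmp" and "/data/local/tmp/" both become "/data/local/tmp/". A value that ends in a backslash is not matched by the "/$" pattern, so it is left as-is and simply gets a "/" appended.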