feat: op perf opt #38


Merged: 103 commits into dev-refactoring, Apr 21, 2025

Conversation

@chraac chraac (Owner) commented Apr 15, 2025

Related to #34

Passed test-backend-ops on a Snapdragon 8 Gen 2 (sd 8gen2) device:

unload rpcmem lib successfully
  OPT_STEP_ADAMW(type=f32,ne=[10,5,4,3]): not supported [hexagon-npu] 
  5297/5297 tests passed
  Backend hexagon-npu: OK

Backend 2/5: qnn-npu
  Skipping
Backend 3/5: qnn-gpu
  Skipping
Backend 4/5: qnn-cpu
  Skipping
Backend 5/5: CPU
  Skipping
5/5 backends passed
OK

Full logs:
test-backend-ops_all.debug.hexagon.android.366b8b5.log
test-backend-ops_all.debug.hexagon.android.366b8b5_logcat.log

@chraac chraac requested a review from Copilot April 20, 2025 03:25

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces performance optimizations and new backend implementations for the Hexagon NPU. Key changes include:

  • New host-side support for device initialization, memory buffering, and graph handling.
  • Addition of device-level implementations for tensor operations and operations such as matrix multiplication and element-wise arithmetic.
  • Cleanup of deprecated QNN backend definitions.

Reviewed Changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated 2 comments.

Summary per file:
  • ggml/src/ggml-qnn/npu/host/host.cpp: Implements host backend interfaces and device initialization routines.
  • ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp}: Introduces graph management for tensor operations on the host side.
  • ggml/src/ggml-qnn/npu/host/device.{hpp,cpp}: Establishes a new device interface and support functions for the Hexagon NPU.
  • ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp}: Adds buffer allocation and tensor initialization using RPC memory (see the allocation sketch below).
  • ggml/src/ggml-qnn/npu/device/*: Provides implementations for tensor operations, op intrinsics, and graph execution.
  • ggml/include/ggml-qnn.h: Removes deprecated QNN-specific definitions.
Files not reviewed (2)
  • ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
  • ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
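The buffer module called out above allocates tensor storage from RPC (FastRPC/ION) memory so that the host and the Hexagon NPU can address the same bytes. As a rough, hypothetical illustration of that allocation path using the Hexagon SDK rpcmem API (the helper function below is invented for this sketch and is not the PR's actual host_buffer code):

```cpp
// Hypothetical sketch: allocate a DSP-shareable buffer via the Hexagon SDK rpcmem API.
// The PR's host_buffer loads the rpcmem library dynamically and wraps the pointer in a
// ggml backend buffer; the helper name and error handling here are illustrative only.
#include <cstddef>
#include <cstdio>
#include "rpcmem.h"  // Hexagon SDK FastRPC shared-memory API

void * alloc_npu_shared_buffer(size_t size) {
    // RPCMEM_HEAP_ID_SYSTEM with default flags returns ION memory that FastRPC can map
    // into the DSP address space, so host writes are visible to the NPU without copies.
    void * data = rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, (int) size);
    if (data == nullptr) {
        fprintf(stderr, "failed to allocate rpc memory, size: %zu MB\n", size / (1 << 20));
    }
    return data;  // release with rpcmem_free(data) when the buffer is destroyed
}
```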

@chraac chraac requested a review from Copilot April 20, 2025 10:23

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds new Hexagon NPU support with performance optimizations for operator execution. Key changes include:

  • Implementation of host and device backend interfaces for Hexagon NPU.
  • New tensor, buffer, and graph modules to enable op offloading and improved operator implementations.
  • Introduction of several op implementations, including matrix multiplication and element‐wise operations, while removing outdated qnn backend definitions.

Reviewed Changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated 1 comment.

Summary per file:
  • ggml/src/ggml-qnn/npu/host/host.cpp: Adds host backend functions and device proxy implementation.
  • ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp}: Introduces host graph management for tensor operations.
  • ggml/src/ggml-qnn/npu/host/device.{hpp,cpp}: Implements NPU device initialization, memory, and op support.
  • ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp}: Provides buffer allocation and tensor initialization support.
  • ggml/src/ggml-qnn/npu/device/*: Defines op implementations, tensor, graph, and device APIs.
  • ggml/include/ggml-qnn.h: Removes legacy QNN backend definitions.
Files not reviewed (2)
  • ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
  • ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported

@chraac chraac changed the title from "[WIP] feat: op perf opt" to "feat: op perf opt" on Apr 20, 2025
@chraac chraac requested a review from Copilot April 20, 2025 15:09

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements performance optimizations for Hexagon NPU operator processing by adding new device, graph, tensor, buffer, and operator implementations. Key changes include:

  • Introduction of new host device and backend classes for NPU initialization and operation offloading.
  • Implementation of host graph management and enhanced memory/buffer handling.
  • New operator implementations (matrix multiplication and element-wise ops) using HVX intrinsics for improved compute performance.
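To make the HVX point concrete, here is a minimal, hypothetical sketch of an element-wise f32 add in the shape such kernels usually take; it assumes 128-byte HVX vectors, aligned inputs, and an element count that is a multiple of the vector width, and it is not the PR's actual kernel:

```cpp
// Minimal HVX sketch (illustrative, not the PR's kernel): element-wise f32 add.
// Assumes src0/src1/dst are 128-byte aligned and count is a multiple of 32 floats.
#include <cstddef>
#include <hexagon_types.h>
#include <hvx_hexagon_protos.h>

static void hvx_add_f32(const float * src0, const float * src1, float * dst, size_t count) {
    constexpr size_t kFloatsPerVec = 128 / sizeof(float);  // one HVX vector holds 32 floats
    const HVX_Vector * v0 = reinterpret_cast<const HVX_Vector *>(src0);
    const HVX_Vector * v1 = reinterpret_cast<const HVX_Vector *>(src1);
    HVX_Vector *       vd = reinterpret_cast<HVX_Vector *>(dst);
    for (size_t i = 0; i < count / kFloatsPerVec; ++i) {
        HVX_Vector sum = Q6_Vqf32_vadd_VsfVsf(v0[i], v1[i]);  // add in the qf32 intermediate format
        vd[i] = Q6_Vsf_equals_Vqf32(sum);                     // convert back to IEEE float for the store
    }
    // A production kernel would also handle unaligned pointers and a remainder tail.
}
```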

Reviewed Changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated 1 comment.

Summary per file:
  • ggml/src/ggml-qnn/npu/host/host_device.hpp: Added npu_device and npu_backend classes for NPU device handling.
  • ggml/src/ggml-qnn/npu/host/host_device.cpp: Implements device initialization, RPC memory setup, and logging.
  • ggml/src/ggml-qnn/npu/host/host.cpp: Defines host interface functions for device context management.
  • ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp}: Introduces the host_graph class for graph creation, update, and compute.
  • ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp}: Implements host_buffer and host_buffer_type for buffer allocation.
  • ggml/src/ggml-qnn/npu/device/{tensor, op_*, graph, device}.{hpp,cpp}: New operator, tensor, and graph implementations with HVX intrinsics.
  • ggml/include/ggml-qnn.h: Updates the public interface by removing deprecated backend definitions.
Files not reviewed (2)
  • ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
  • ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported

@chraac chraac requested a review from Copilot April 21, 2025 03:52

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces performance optimizations for operations on the Hexagon NPU backend. Key changes include:

  • New implementations and refinements for device, tensor, and operation (e.g. mul_mat, element-wise) handling.
  • Enhancements for RPC memory allocation, graph management, and device interfacing.
  • Updates to backend header interfaces and the removal of deprecated files.

Reviewed Changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated no comments.

Summary per file:
  • ggml/src/ggml-qnn/npu/host/host_device.hpp & .cpp: Introduce and initialize the npu_device and backend classes with RPC memory management.
  • ggml/src/ggml-qnn/npu/host/host.cpp: Implement the backend device proxy and host device interfaces.
  • ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp}: Create and update the host graph structure for tensor management.
  • ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp}: Add host buffer and tensor initialization using RPC memory with proper mapping.
  • ggml/src/ggml-qnn/npu/device/{tensor,op_mul_mat,op_impl,graph,device}.{hpp,cpp}: Provide optimized implementations for tensor operations, matrix multiplication, and graph execution (a scalar reference for mul_mat follows below).
  • ggml/include/ggml-qnn.h: Update the public interface headers to reflect backend changes and remove legacy definitions.
Files not reviewed (2)
  • ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
  • ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
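As orientation for the optimized mul_mat mentioned above, the device kernel has to reproduce ggml's mul_mat semantics; a plain scalar equivalent (illustrative only, ignoring strides, quantized types, and the HVX vectorization the PR actually uses) looks roughly like this:

```cpp
// Scalar reference for ggml-style mul_mat semantics, for orientation only.
// src0 holds m rows of length k and src1 holds n rows of length k (row-major);
// dst[j * m + i] is the dot product of src0 row i with src1 row j.
static void mul_mat_ref_f32(const float * src0, const float * src1, float * dst,
                            int m, int n, int k) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int l = 0; l < k; ++l) {
                sum += src0[i * k + l] * src1[j * k + l];
            }
            dst[j * m + i] = sum;
        }
    }
}
```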
Comments suppressed due to low confidence (1)

ggml/src/ggml-qnn/npu/device/op_impl.cpp:136

  • There appears to be a typo in the format specifier for dst.ne[i]: '%l;d' should likely be '%lld' to match the cast. Please update to ensure proper logging.
DEVICE_LOG_DEBUG("src0.ne[%zu] and dst.ne[%zu] not match: %lld vs %l;d\n", i, i, (long long) src0.ne[i], (long long) dst.ne[i]);
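With the suggested specifier applied, the call would read:

```cpp
DEVICE_LOG_DEBUG("src0.ne[%zu] and dst.ne[%zu] not match: %lld vs %lld\n", i, i,
                 (long long) src0.ne[i], (long long) dst.ne[i]);
```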

@chraac chraac requested a review from Copilot April 21, 2025 04:00

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces performance optimizations and new implementations for the Hexagon NPU backend. Key changes include:

  • New host-side device, graph, and buffer implementations to manage NPU resources.
  • Optimized operator implementations (e.g. matrix multiplication) using Hexagon HVX intrinsics.
  • Updates to backend and device interfaces to improve resource handling and logging.

Reviewed Changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated no comments.

Summary per file:
  • ggml/src/ggml-qnn/npu/host/host_device.hpp: New header defining the NPU device interface and associated functions.
  • ggml/src/ggml-qnn/npu/host/host_device.cpp: Implementation of device initialization, support checks, and logging.
  • ggml/src/ggml-qnn/npu/host/host.cpp: New host interface to create device proxies and manage backend context.
  • ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp}: Graph management implementation with caching and update logic.
  • ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp}: Buffer and tensor allocation via RPC memory handling.
  • ggml/src/ggml-qnn/npu/device/{tensor.hpp,op_mul_mat.{hpp,cpp}}: New device operator implementations using Hexagon intrinsics.
  • ggml/src/ggml-qnn/npu/device/{op_impl.hpp,op_impl.cpp}: Operator implementation and support verification logic.
  • ggml/src/ggml-qnn/npu/device/device.cpp: Device open/close and tensor/graph operations for the NPU backend.
  • ggml/include/ggml-qnn.h: Minor cleanup in the backend header (removed unused code).
Files not reviewed (2)
  • ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
  • ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
Comments suppressed due to low confidence (1)

ggml/src/ggml-qnn/npu/host/buffer.cpp:138

  • [nitpick] Consider using the '%zu' format specifier for size_t instead of casting to int for clarity and consistency in the error message.
LOG_ERROR("failed to allocate rpc memory, size: %d MB\n", (int) (size / (1 << 20)));
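With the suggested '%zu' specifier (assuming size is a size_t), the message would become:

```cpp
LOG_ERROR("failed to allocate rpc memory, size: %zu MB\n", size / (1 << 20));
```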

@chraac chraac merged commit beff5c4 into dev-refactoring Apr 21, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in qnn backend Apr 21, 2025

Labels: enhancement (New feature or request)
Projects: qnn backend (Status: Done)
1 participant