-
Notifications
You must be signed in to change notification settings - Fork 3
feat: op perf opt #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces performance optimizations and new backend implementations for the Hexagon NPU. Key changes include:
- New host-side support for device initialization, memory buffering, and graph handling.
- Addition of device-level implementations for tensor operations and operations such as matrix multiplication and element-wise arithmetic.
- Cleanup of deprecated QNN backend definitions.
Reviewed Changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
ggml/src/ggml-qnn/npu/host/host.cpp | Implements host backend interfaces and device initialization routines. |
ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp} | Introduces graph management for tensor operations on the host side. |
ggml/src/ggml-qnn/npu/host/device.{hpp,cpp} | Establishes a new device interface and support functions for the Hexagon NPU. |
ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp} | Adds buffer allocation and tensor initialization using RPC memory. |
ggml/src/ggml-qnn/npu/device/* | Provides implementations for tensor operations, op intrinsics, and graph execution. |
ggml/include/ggml-qnn.h | Removes deprecated QNN-specific definitions. |
Files not reviewed (2)
- ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
- ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds new Hexagon NPU support with performance optimizations for operator execution. Key changes include:
- Implementation of host and device backend interfaces for Hexagon NPU.
- New tensor, buffer, and graph modules to enable op offloading and improved operator implementations.
- Introduction of several op implementations, including matrix multiplication and element‐wise operations, while removing outdated qnn backend definitions.
Reviewed Changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
ggml/src/ggml-qnn/npu/host/host.cpp | Adds host backend functions and device proxy implementation. |
ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp} | Introduces host graph management for tensor operations. |
ggml/src/ggml-qnn/npu/host/device.{hpp,cpp} | Implements NPU device initialization, memory, and op support. |
ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp} | Provides buffer allocation and tensor initialization support. |
ggml/src/ggml-qnn/npu/device/* | Defines op implementations, tensor, graph, and device APIs. |
ggml/include/ggml-qnn.h | Removes legacy qnn backend definitions. |
Files not reviewed (2)
- ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
- ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements performance optimizations for Hexagon NPU operator processing by adding new device, graph, tensor, buffer, and operator implementations. Key changes include:
- Introduction of new host device and backend classes for NPU initialization and operation offloading.
- Implementation of host graph management and enhanced memory/buffer handling.
- New operator implementations (matrix multiplication and element-wise ops) using HVX intrinsics for improved compute performance.
Reviewed Changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
ggml/src/ggml-qnn/npu/host/host_device.hpp | Added npu_device and npu_backend classes for NPU device handling |
ggml/src/ggml-qnn/npu/host/host_device.cpp | Implements device initialization, RPC memory setup, and logging |
ggml/src/ggml-qnn/npu/host/host.cpp | Defines host interface functions for device context management |
ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp} | Introduces host_graph class for graph creation, update, and compute |
ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp} | Implements host_buffer and host_buffer_type for buffer allocation |
ggml/src/ggml-qnn/npu/device/{tensor, op_*, graph, device}.{hpp,cpp} | New operator, tensor, and graph implementations with HVX intrinsics |
ggml/include/ggml-qnn.h | Updates public interface by removing deprecated backend definitions |
Files not reviewed (2)
- ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
- ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces performance optimizations for operations on the Hexagon NPU backend. Key changes include:
- New implementations and refinements for device, tensor, and operation (e.g. mul_mat, element-wise) handling.
- Enhancements for RPC memory allocation, graph management, and device interfacing.
- Updates to backend header interfaces and the removal of deprecated files.
Reviewed Changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
ggml/src/ggml-qnn/npu/host/host_device.hpp & .cpp | Introduce and initialize the npu_device and backend classes with RPC memory management. |
ggml/src/ggml-qnn/npu/host/host.cpp | Implement backend device proxy and host device interfaces. |
ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp} | Create and update the host graph structure for tensor management. |
ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp} | Add host buffer and tensor initialization using RPC memory with proper mapping. |
ggml/src/ggml-qnn/npu/device/{tensor,op_mul_mat,op_impl,graph,device}.{hpp,cpp} | Provide optimized implementations for tensor operations, multiplication, and graph execution. |
ggml/include/ggml-qnn.h | Update the public interface headers to reflect backend changes and remove legacy definitions. |
Files not reviewed (2)
- ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
- ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
Comments suppressed due to low confidence (1)
ggml/src/ggml-qnn/npu/device/op_impl.cpp:136
- There appears to be a typo in the format specifier for dst.ne[i]: '%l;d' should likely be '%lld' to match the cast. Please update to ensure proper logging.
DEVICE_LOG_DEBUG("src0.ne[%zu] and dst.ne[%zu] not match: %lld vs %l;d\n", i, i, (long long) src0.ne[i], (long long) dst.ne[i]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces performance optimizations and new implementations for the Hexagon NPU backend. Key changes include:
- New host-side device, graph, and buffer implementations to manage NPU resources.
- Optimized operator implementations (e.g. matrix multiplication) using Hexagon HVX intrinsics.
- Updates to backend and device interfaces to improve resource handling and logging.
Reviewed Changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
ggml/src/ggml-qnn/npu/host/host_device.hpp | New header defining the NPU device interface and associated functions. |
ggml/src/ggml-qnn/npu/host/host_device.cpp | Implementation of device initialization, support checks, and logging. |
ggml/src/ggml-qnn/npu/host/host.cpp | New host interface to create device proxies and manage backend context. |
ggml/src/ggml-qnn/npu/host/graph.{hpp,cpp} | Graph management implementation with caching and update logic. |
ggml/src/ggml-qnn/npu/host/buffer.{hpp,cpp} | Buffer and tensor allocation via RPC memory handling. |
ggml/src/ggml-qnn/npu/device/{tensor.hpp,op_mul_mat.{hpp,cpp}} | New device operator implementations using Hexagon intrinsics. |
ggml/src/ggml-qnn/npu/device/{op_impl.hpp,op_impl.cpp} | Operator implementation and support verification logic. |
ggml/src/ggml-qnn/npu/device/device.cpp | Device open/close and tensor/graph operations for the NPU backend. |
ggml/include/ggml-qnn.h | Minor cleanup in backend header (removed unused code). |
Files not reviewed (2)
- ggml/src/ggml-qnn/CMakeLists.txt: Language not supported
- ggml/src/ggml-qnn/npu/CMakeLists.txt: Language not supported
Comments suppressed due to low confidence (1)
ggml/src/ggml-qnn/npu/host/buffer.cpp:138
- [nitpick] Consider using the '%zu' format specifier for size_t instead of casting to int for clarity and consistency in the error message.
LOG_ERROR("failed to allocate rpc memory, size: %d MB\n", (int) (size / (1 << 20)));
Related to #34
Passed
test-backend-ops
on sd 8gen2Full log
test-backend-ops_all.debug.hexagon.android.366b8b5.log
test-backend-ops_all.debug.hexagon.android.366b8b5_logcat.log