ggml: aarch64: Implement SVE F16 kernels for vector functions #15115
@ggerganov, could you please review this PR and support merging it?
@ggerganov, @compilade, please review this PR.
Same comment as in #15057 (review). We don't even have CI hardware to test these changes, so it's difficult to approve these.
Let's merge after you fix the editor config errors.
…rg#15115)
* Added sve implementation for vec_dot_fp16 Kernel
* removed white spaces
* Added comment
* removed white spaces
* changed GGML_F16x_VEC_FMA for code consistency
* Update vec.h
---------
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
…ggml-org#15115)" This reverts commit a0c2b20.
This PR adds an SVE kernel for the f16 data type (the ggml_vec_dot_f16() kernel) to reduce image-encoding time during LMM (llava-v1.6-mistral) inference on ARM architecture.
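For readers unfamiliar with the loop shape such a kernel takes, here is a minimal, portable sketch of an f16 dot product written as a predicated, chunked loop. It is not the code from this PR and uses no SVE intrinsics: `LANES`, `half_to_float`, and `dot_f16_chunked` are hypothetical names chosen for illustration, the fixed `LANES` value stands in for the hardware-reported vector length, and accumulating in f32 is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lane count; real SVE hardware reports its own vector length. */
#define LANES 8

/* Decode an IEEE 754 binary16 bit pattern into a float. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          /* signed zero */
        } else {
            /* Subnormal: shift until the implicit leading bit appears. */
            int e = -1;
            do { man <<= 1; e++; } while ((man & 0x400u) == 0);
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(112 - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dot product of two f16 arrays, processed LANES elements at a time.
 * The bounds check plays the role of a per-lane predicate, so the tail
 * needs no separate scalar loop. */
float dot_f16_chunked(size_t n, const uint16_t *x, const uint16_t *y) {
    float acc[LANES] = {0};

    for (size_t i = 0; i < n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            if (i + l < n) {                      /* lane active only in bounds */
                acc[l] += half_to_float(x[i + l]) * half_to_float(y[i + l]);
            }
        }
    }

    /* Horizontal reduction of the per-lane accumulators. */
    float sum = 0.0f;
    for (size_t l = 0; l < LANES; l++) {
        sum += acc[l];
    }
    return sum;
}
```

The per-lane accumulators followed by a single horizontal reduction mirror how a SIMD kernel typically defers the reduction to the end of the loop; an actual SVE kernel would replace the inner loop with predicated vector loads and fused multiply-adds.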
Major code changes:
In vec.cpp file:
In vec.h file:
In simd-mappings.h:
Performance: Graviton3E
Across different thread counts on Graviton3E, image-encoding time for multimodal (LMM) inference improved by 5-15%.
Model: llava-v1.6-mistral-7b.Q4_K_M
Machine: Graviton3E
Command Used:
Perplexity
I ran perplexity with both the NEON (original) and SVE (this PR) implementations.
The summary is below.
This change does not appear to have any impact on accuracy.