ggml: aarch64: Implement SVE F32 kernels for vector functions #13843
Conversation
#if defined(__ARM_FEATURE_SVE)
    // Route to the scalar implementation. TODO: write SVE code.
    for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
        for (int i = 0; i < n; ++i) {
            y[i] += x[k][i]*v[k][0];
        }
    }
#else
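For reference, an SVE version of this loop could look roughly like the sketch below. This is not code from the PR (which deliberately keeps the scalar route for now); it is only an illustration using standard ACLE intrinsics, with a hypothetical function name and a placeholder unroll value.

    #include <arm_sve.h>

    // Placeholder: the real value is defined in ggml.c.
    #ifndef GGML_VEC_MAD_UNROLL
    #define GGML_VEC_MAD_UNROLL 32
    #endif

    // Hypothetical SVE variant of the scalar loop above.
    static void vec_mad_f32_sve(const int n, float * y,
                                const float * const x[GGML_VEC_MAD_UNROLL],
                                const float * const v[GGML_VEC_MAD_UNROLL]) {
        for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
            // Broadcast the scalar multiplier v[k][0] to all vector lanes.
            const svfloat32_t vk = svdup_n_f32(v[k][0]);
            // Process svcntw() floats per iteration; the predicate masks the tail.
            for (int i = 0; i < n; i += (int) svcntw()) {
                const svbool_t pg = svwhilelt_b32_s32(i, n);
                svfloat32_t       yv = svld1_f32(pg, y + i);
                const svfloat32_t xv = svld1_f32(pg, x[k] + i);
                // y[i] += x[k][i]*v[k][0] for the active lanes.
                yv = svmla_f32_m(pg, yv, xv, vk);
                svst1_f32(pg, y + i, yv);
            }
        }
    }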
ggerganov: Is the scalar route faster here compared to the GGML_SIMD branch below? If not, better remove it until there is an actual SVE implementation.
Author: @ggerganov The scalar route might not be faster than GGML_SIMD. We went with this approach because:
1. If we route this function to the existing GGML_SIMD NEON path, we get SIMD-mapping errors: once SVE is enabled, all SIMD mappings belong to SVE, so the function has to use either all SVE code or all scalar code. We followed the existing nomenclature for SIMD mappings, where the preprocessor directive name is the same irrespective of the ISA selected, e.g. "#define GGML_F32_VEC_FMA" is the same for all ISAs (NEON, AVX2, AVX); only the function it maps to changes.
2. Writing SVE code for this function is out of scope for this PR.
A quick workaround for point (1), sketched after this comment:
- Use a different preprocessor directive name for SVE than for NEON, for example GGML_F32_VEC_FMA_SVE. This removes the confusion in SIMD mappings between NEON and SVE on ARM.
Please let me know whether to proceed with the above workaround or leave the scalar code as it is for now.
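For context, here is a minimal sketch of the mapping pattern described above, plus the proposed suffixed variant. The macro bodies are simplified illustrations, not the exact definitions in ggml:

    // Existing pattern: one macro name, remapped per ISA at compile time.
    // Every variant computes a + b*c.
    #if defined(__ARM_FEATURE_SVE)
        #define GGML_F32_VEC_FMA(a, b, c) svmad_f32_x(svptrue_b32(), (b), (c), (a))
    #elif defined(__ARM_NEON)
        #define GGML_F32_VEC_FMA(a, b, c) vfmaq_f32((a), (b), (c))
    #elif defined(__AVX__)
        #define GGML_F32_VEC_FMA(a, b, c) _mm256_fmadd_ps((b), (c), (a))
    #endif

    // Proposed workaround: an ISA-suffixed name, so the NEON and SVE
    // mappings can coexist instead of SVE taking over every mapping.
    #if defined(__ARM_FEATURE_SVE)
        #define GGML_F32_VEC_FMA_SVE(a, b, c) svmad_f32_x(svptrue_b32(), (b), (c), (a))
    #endif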
ggerganov: I see. Let's merge it like this for now.
In the future, the GGML_SIMD routes should be reimplemented into something better, because the logic is becoming quite overloaded and hard to follow.
Author: @ggerganov Sure, and thanks for the merge.
This PR adds SVE kernel support for the F32 data type, specific to vector functions on the ARM architecture.
It is split out from #13602 as a separate contribution covering only the vector functions, as suggested by @ggerganov.
Major code changes:
Performance
This PR improves performance by ~1.3x compared to the previous NEON-based implementation.
Model: falcon-mamba-7B-F32.gguf
Command: ./build/bin/llama-bench -m falcon-mamba-7B-F32.gguf -t 8,16,32,64 -p 128,1024 -n 0
Perplexity
There is no change in model accuracy as a result of this PR.
Command: ./build/bin/llama-perplexity -s 0 -np 128 -t 64 -m falcon-mamba-7B-F32.gguf -c 128 -b 128 --chunks 16 -f scripts/wikitext-2-raw/wiki.test.raw
Contributor: Vineel Abhinav Gottala
cc: @Vithulep