
ggml: aarch64: Implement SVE F32 kernels for vector functions #13843


Merged: 7 commits into ggml-org:master on May 29, 2025

Conversation

@vineelabhinav (Contributor) commented May 28, 2025

This PR adds SVE kernel support for the F32 data type to the vector functions on the Arm architecture.
It was split out of #13602 as a separate contribution covering only the vector functions, as suggested by @ggerganov.
Major code changes:

  1. Add SVE support for the ggml_vec_dot_f32() function (a sketch of the general pattern follows this list).
  2. Add SVE support for the ggml_vec_mad_f32() function.
  3. Add SVE support for the ggml_vec_scale_f32() function.
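
For readers unfamiliar with SVE, here is a minimal, illustrative sketch of the predicated F32 dot-product pattern these kernels build on. It is not the PR's actual code (the real kernels differ in unrolling, accumulator count, and reduction details), and the function name is hypothetical:

```c
#include <arm_sve.h>

// Illustrative sketch only: a vector-length-agnostic F32 dot product.
static float sve_dot_f32(int n, const float * x, const float * y) {
    svfloat32_t acc = svdup_f32(0.0f);
    // svcntw() = number of 32-bit lanes per SVE vector (hardware-dependent).
    for (int i = 0; i < n; i += (int) svcntw()) {
        svbool_t    pg = svwhilelt_b32(i, n);   // predicate masks the tail
        svfloat32_t vx = svld1_f32(pg, x + i);  // predicated loads
        svfloat32_t vy = svld1_f32(pg, y + i);
        acc = svmla_f32_m(pg, acc, vx, vy);     // acc += vx * vy (active lanes)
    }
    return svaddv_f32(svptrue_b32(), acc);      // horizontal reduction
}
```

The predicate-driven loop is what distinguishes SVE from NEON: the same binary adapts to whatever vector length the hardware provides, and the tail iteration needs no scalar epilogue.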

Performance

This PR improves performance by up to ~1.3x compared to the previous NEON-based implementation.
Model: falcon-mamba-7B-F32.gguf
Command: ./build/bin/llama-bench -m falcon-mamba-7B-F32.gguf -t 8,16,32,64 -p 128,1024 -n 0

  • Task 1: prompt length 128 tokens, 1 generated token

| Threads | NEON (tokens/sec) | SVE (tokens/sec) | Ratio |
|--------:|------------------:|-----------------:|------:|
|       8 |              9.24 |            11.81 |  1.28 |
|      16 |             17.88 |            22.36 |  1.25 |
|      32 |             32.54 |            39.34 |  1.21 |
|      64 |             53.28 |            60.52 |  1.14 |
  • Task 2: prompt length 1024 tokens, 1 generated token

| Threads | NEON (tokens/sec) | SVE (tokens/sec) | Ratio |
|--------:|------------------:|-----------------:|------:|
|       8 |              8.95 |            11.20 |  1.25 |
|      16 |             17.22 |            21.13 |  1.23 |
|      32 |             30.93 |            37.02 |  1.20 |
|      64 |             50.17 |            56.94 |  1.13 |

Perplexity

There is no change in model accuracy as a result of this PR.
Command: ./build/bin/llama-perplexity -s 0 -np 128 -t 64 -m falcon-mamba-7B-F32.gguf -c 128 -b 128 --chunks 16 -f scripts/wikitext-2-raw/wiki.test.raw

| NEON               | SVE                |
|--------------------|--------------------|
| 7.6153 +/- 0.66890 | 7.6153 +/- 0.66890 |

Contributor: Vineel Abhinav Gottala

cc: @Vithulep

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on May 28, 2025.
Comment on lines +305 to +312
#if defined(__ARM_FEATURE_SVE)
// route to the scalar implementation; TODO: write SVE code
for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
for (int i = 0; i < n; ++i) {
y[i] += x[k][i]*v[k][0];
}
}
#else
ggerganov (Member):

Is the scalar route faster here compared to the GGML_SIMD branch below? If not, better remove it until there is an actual SVE implementation.

vineelabhinav (Contributor, Author):

@ggerganov
The scalar route might not be faster than GGML_SIMD. We went with this approach because:

  1. If we route this function to the existing GGML_SIMD NEON path, we get SIMD-mapping errors: once SVE is enabled, all SIMD mappings belong to SVE, so a function must use either all-SVE code or all-scalar code. We followed the existing nomenclature for SIMD mappings, where the preprocessor macro name is the same regardless of the selected ISA, i.e. "#define GGML_F32_VEC_FMA" is the same for all ISAs (NEON, AVX2, AVX); only the function it maps to changes with the build target.
  2. Writing SVE code for this function is out of scope for this PR.

A quick workaround for point (1):

  1. Use a different preprocessor macro name for SVE than for NEON, for example GGML_F32_VEC_FMA_SVE for SVE. This removes the ambiguity in SIMD mappings between NEON and SVE on Arm.

Please let me know whether to proceed with the workaround above or leave the scalar code as it is for now.
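
For context, this is roughly what the ISA-neutral SIMD mapping described above looks like. The macro bodies below are simplified assumptions for illustration, not ggml's exact definitions; the invariant is that GGML_F32_VEC_FMA(a, b, c) computes a + b*c on every path:

```c
// Simplified illustration (assumed bodies, not ggml's real ones):
// one macro name per operation, rebound per ISA at compile time.
#if defined(__ARM_FEATURE_SVE)
    #define GGML_F32_VEC              svfloat32_t
    #define GGML_F32_VEC_FMA(a, b, c) svmla_f32_x(svptrue_b32(), (a), (b), (c))
#elif defined(__ARM_NEON)
    #define GGML_F32_VEC              float32x4_t
    #define GGML_F32_VEC_FMA(a, b, c) vfmaq_f32((a), (b), (c))
#elif defined(__AVX2__)
    #define GGML_F32_VEC              __m256
    #define GGML_F32_VEC_FMA(a, b, c) _mm256_fmadd_ps((b), (c), (a))
#endif
```

Because the macro name is identical on every path, kernels written against it compile unchanged on each ISA; the flip side, as noted above, is that enabling SVE rebinds every mapping at once, forcing an all-SVE-or-all-scalar choice per function.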

ggerganov (Member):

I see. Let's merge it like this for now.

In the future, the GGML_SIMD routes should be reimplemented into something better because the logic is becoming quite overloaded and hard to follow.

vineelabhinav (Contributor, Author):

@ggerganov Sure, and thanks for the merge.
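
For reference, here is a hedged sketch of what the deferred SVE route for the loop quoted earlier might look like. The helper name, signature, and VEC_MAD_UNROLL constant are hypothetical stand-ins, not the PR's code:

```c
#include <arm_sve.h>

#define VEC_MAD_UNROLL 2  // hypothetical stand-in for GGML_VEC_MAD_UNROLL

// Hypothetical helper: y += x[k] * v[k] for each unrolled k, using
// predicated loads/stores so no scalar tail loop is needed.
static void vec_mad_f32_unroll_sve(int n, float * y,
                                   const float * x[VEC_MAD_UNROLL],
                                   const float   v[VEC_MAD_UNROLL]) {
    for (int k = 0; k < VEC_MAD_UNROLL; ++k) {
        for (int i = 0; i < n; i += (int) svcntw()) {
            svbool_t    pg = svwhilelt_b32(i, n);
            svfloat32_t vy = svld1_f32(pg, y + i);
            svfloat32_t vx = svld1_f32(pg, x[k] + i);
            vy = svmla_n_f32_x(pg, vy, vx, v[k]);  // vy += vx * v[k]
            svst1_f32(pg, y + i, vy);              // predicated store
        }
    }
}
```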

@ggerganov ggerganov merged commit 1b8fb81 into ggml-org:master May 29, 2025
46 checks passed