ggml: aarch64: Implement SVE F32 kernels for vector functions #13843
Conversation
#if defined(__ARM_FEATURE_SVE)
    // Route to the scalar implementation. TODO: write SVE code.
    for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
        for (int i = 0; i < n; ++i) {
            y[i] += x[k][i]*v[k][0];
        }
    }
#else
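For reference, an SVE version of this loop could look roughly like the sketch below. This is not code from the PR (which deliberately keeps the scalar route for now); it is only an illustration using standard ACLE intrinsics, with a hypothetical function name and a placeholder unroll value.

    #include <arm_sve.h>

    // Placeholder: the real value is defined in ggml.c.
    #ifndef GGML_VEC_MAD_UNROLL
    #define GGML_VEC_MAD_UNROLL 32
    #endif

    // Hypothetical SVE variant of the scalar loop above.
    static void vec_mad_f32_sve(const int n, float * y,
                                const float * const x[GGML_VEC_MAD_UNROLL],
                                const float * const v[GGML_VEC_MAD_UNROLL]) {
        for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
            // Broadcast the scalar multiplier v[k][0] to all vector lanes.
            const svfloat32_t vk = svdup_n_f32(v[k][0]);
            // Process svcntw() floats per iteration; the predicate masks the tail.
            for (int i = 0; i < n; i += (int) svcntw()) {
                const svbool_t pg = svwhilelt_b32_s32(i, n);
                svfloat32_t       yv = svld1_f32(pg, y + i);
                const svfloat32_t xv = svld1_f32(pg, x[k] + i);
                // y[i] += x[k][i]*v[k][0] for the active lanes.
                yv = svmla_f32_m(pg, yv, xv, vk);
                svst1_f32(pg, y + i, yv);
            }
        }
    }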
ggerganov: Is the scalar route faster here compared to the GGML_SIMD branch below? If not, better remove it until there is an actual SVE implementation.
Author: @ggerganov The scalar route might not be faster than GGML_SIMD. We went with this approach because:
1. If we route this function to the existing GGML_SIMD NEON path, we get SIMD-mapping errors: once SVE is enabled, all SIMD mappings belong to SVE, so the function has to use either all SVE code or all scalar code. We followed the existing nomenclature for SIMD mappings, where the preprocessor directive name is the same irrespective of the ISA selected, e.g. "#define GGML_F32_VEC_FMA" is the same for all ISAs (NEON, AVX2, AVX); only the function it maps to changes.
2. Writing SVE code for this function is out of scope for this PR.
A quick workaround for point (1), sketched after this comment:
- Use a different preprocessor directive name for SVE than for NEON, for example GGML_F32_VEC_FMA_SVE. This removes the confusion in SIMD mappings between NEON and SVE on ARM.
Please let me know whether to proceed with the above workaround or leave the scalar code as it is for now.
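For context, here is a minimal sketch of the mapping pattern described above, plus the proposed suffixed variant. The macro bodies are simplified illustrations, not the exact definitions in ggml:

    // Existing pattern: one macro name, remapped per ISA at compile time.
    // Every variant computes a + b*c.
    #if defined(__ARM_FEATURE_SVE)
        #define GGML_F32_VEC_FMA(a, b, c) svmad_f32_x(svptrue_b32(), (b), (c), (a))
    #elif defined(__ARM_NEON)
        #define GGML_F32_VEC_FMA(a, b, c) vfmaq_f32((a), (b), (c))
    #elif defined(__AVX__)
        #define GGML_F32_VEC_FMA(a, b, c) _mm256_fmadd_ps((b), (c), (a))
    #endif

    // Proposed workaround: an ISA-suffixed name, so the NEON and SVE
    // mappings can coexist instead of SVE taking over every mapping.
    #if defined(__ARM_FEATURE_SVE)
        #define GGML_F32_VEC_FMA_SVE(a, b, c) svmad_f32_x(svptrue_b32(), (b), (c), (a))
    #endif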
ggerganov: I see. Let's merge it like this for now.
In the future, the GGML_SIMD routes should be reimplemented into something better, because the logic is becoming quite overloaded and hard to follow.
Author: @ggerganov Sure, and thanks for the merge.
This PR adds SVE kernel support for the F32 data type, specific to vector functions on the ARM architecture.
It is split out from #13602 as a separate contribution covering only the vector functions, as suggested by @ggerganov.
Major code changes:
Performance
This PR improves performance by ~1.3x compared to the previous NEON-based implementation.
Model: falcon-mamba-7B-F32.gguf
Command: ./build/bin/llama-bench -m falcon-mamba-7B-F32.gguf -t 8,16,32,64 -p 128,1024 -n 0
Perplexity
There is no change in model accuracy as a result of this PR.
Command: ./build/bin/llama-perplexity -s 0 -np 128 -t 64 -m falcon-mamba-7B-F32.gguf -c 128 -b 128 --chunks 16 -f scripts/wikitext-2-raw/wiki.test.raw
Contributor: Vineel Abhinav Gottala
cc: @Vithulep