Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed-up ggml_vec_dot_q4_1() ARM_NEON #900

Merged
merged 6 commits into from
Apr 13, 2023
Merged

Speed-up ggml_vec_dot_q4_1() ARM_NEON #900

merged 6 commits into from
Apr 13, 2023

Conversation

ggerganov
Copy link
Member

@ggerganov ggerganov commented Apr 11, 2023

Time per token for Q4_1 drops by ~5%

Also, make it build on ARMv7 Raspberry Pi 4

🤖 Generated by Copilot at 63f7ecf

Summary

🚀🔧🔄

This pull request enhances the ggml.c file to run faster and more reliably on ARM devices by using inline functions, simpler macros, and NEON optimizations.

We're sailing on the ARM, me hearties, on the ARM
We're coding for the NEON, me hearties, for the NEON
We'll inline and simplify, and loop by two and dot
And we'll make our ggml.c run faster than a shot

Walkthrough

  • Add inline functions to emulate missing NEON intrinsics and fix a wrong feature check (link)
  • Simplify the macros and scalar computations for reducing vectors by removing conditional compilation and using the emulated intrinsics (link, link, link, link, link)
  • Fix a typo in a comment (link)
  • Remove empty lines for better readability and style (link, link)
  • Modify the loop iteration and body in ggml.c to process two blocks of x and y at a time using more NEON registers and operations (link, link)

@unbounded
Copy link
Contributor

unbounded commented Apr 11, 2023

What is the reason this is gated on __ARM_FEATURE_QRDMX?

https://developer.arm.com/documentation/101028/0012/13--Advanced-SIMD--Neon--intrinsics says this define indicates the availability of SQRDMLAH and SQRDMLSH.

https://arm-software.github.io/acle/neon_intrinsics/advsimd.html doesn't list any requirements for the vaddv instructions other than A64, my impression is that they should be available if __aarch64__ is defined.

@ggerganov
Copy link
Member Author

@unbounded
It was a mistake - thanks for the suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants