-
Notifications
You must be signed in to change notification settings - Fork 12.2k
ggml-cpu: enable IBM NNPA Vector Intrinsics #14303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
This reverts commit 23f3f5e. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
This reverts commit a88843a. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
There are also the functions |
Good catch! Let me look into it and merge it with this PR where possible :) |
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
This reverts commit 1be4514. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Hey @slaren, I've successfully implemented our SIMD into the suggested functions but it resides within By any chance do you have an idea if those 2 functions are supposed to be moved into Edit: Ignore what I wrote above, I was looking at the wrong functions all these while. My bad. |
Closing, superseded by #14317 |
This pull request aims to enable the IBM NNPA instruction set for IBM z16 mainframes and later on the s390x platform. This code change is mainly targeted at FP16 -> FP32 or FP32 -> FP16 data conversions.
Verification
To ensure that this implementation did not break anything, the NNPA instruction set has been tested on the following models:
Performance Results
I will be using IBM Granite 3.3 for the performance tests. We notice a performance improvement of roughly 31.21% for F16 Token Generation, which is the expected outcome.
Before NNPA Instruction Set
After NNPA Instruction Set
Note
Tests were conducted on an IBM z16 Mainframe with 2 IFLs (4 vCores) and 64 GB Memory on z/VM (Type-2)
Please review this pull request and consider merging into the main repository. Thank you!