llamafile : ppc64le MMA implementation for Q4_0. #12489

amritahs-ibm · 2025-03-21T06:39:30Z

This change upstreams llamafile's CPU matrix multiplication kernels
for the ppc64le ISA using MMA builtins. This patch handles matrix
multiplication between the quantised datatypes block_q4_0 and
block_q8_0.
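
For readers unfamiliar with these block formats, the following is a minimal scalar sketch of the arithmetic behind a Q4_0 by Q8_0 dot product. It is not the MMA kernel from this patch. The struct names `BlockQ4_0` and `BlockQ8_0` and the function `dot_q4_0_q8_0` are hypothetical stand-ins, and the scale is stored as `float` for self-containment (ggml stores it as fp16). The nibble layout (low nibbles hold the first 16 elements, high nibbles the last 16, each offset by 8) follows the ggml reference layout as I understand it.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int QK = 32;  // elements per quantization block

// Hypothetical, simplified stand-in for ggml's block_q4_0.
// Assumption: the scale is a float here; ggml uses a half-precision value.
struct BlockQ4_0 {
    float   d;           // per-block scale
    uint8_t qs[QK / 2];  // 32 4-bit values, two per byte
};

// Hypothetical, simplified stand-in for ggml's block_q8_0.
struct BlockQ8_0 {
    float  d;       // per-block scale
    int8_t qs[QK];  // 32 signed 8-bit values
};

// Scalar dot product of nb Q4_0 blocks with nb Q8_0 blocks.
float dot_q4_0_q8_0(const BlockQ4_0* a, const BlockQ8_0* b, std::size_t nb) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < nb; ++i) {
        int32_t isum = 0;  // integer accumulator for one block
        for (int j = 0; j < QK / 2; ++j) {
            // Unpack two 4-bit values and recentre them from [0, 15] to [-8, 7].
            const int lo = (a[i].qs[j] & 0x0F) - 8;  // element j
            const int hi = (a[i].qs[j] >> 4) - 8;    // element j + 16
            isum += lo * b[i].qs[j] + hi * b[i].qs[j + QK / 2];
        }
        // Apply both block scales once per block.
        sum += a[i].d * b[i].d * static_cast<float>(isum);
    }
    return sum;
}
```

An optimised kernel computes many such block products at once over tiles of the output matrix; the per-block integer accumulation followed by a single scale multiplication is the part this sketch is meant to illustrate.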

This change yields a 5% to 50% improvement in total speed
(i.e., all tokens / total time) across various batch sizes.

The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and
Llama-2-7B-chat-hf models on an IBM POWER10 machine.


Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

amritahs-ibm · 2025-03-24T05:06:35Z

@ggerganov Can you please review these changes?

amritahs-ibm · 2025-03-27T05:47:12Z

@ggerganov Can you please review this patch?

ggerganov

It would be nice to add some sort of CI for this arch in the future. If you have any ideas, let me know.

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Mar 21, 2025

amritahs-ibm force-pushed the SGEMM_Q4 branch from 9676a80 to 894737a March 21, 2025 06:41

ggerganov approved these changes Mar 27, 2025


ggerganov merged commit c7b43ab into ggml-org:master Mar 27, 2025
48 checks passed
