metal : add BF16 support #8439

ggerganov · 2024-07-11T12:41:44Z

Extend Metal backend to work with BF16 data type.

TODO:

GGML_OP_MUL_MAT_ID
~~Flash Attn~~ (in next PR)

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

robbiemu · 2024-10-13T23:52:56Z

pardon the ignorance,

what does this add bf16 support to using metal?

I see my gpu spinning making imatrices, and with llama-cli on bf16 models.

ggml-ci

* metal : fix minor string leaks (ggml/1004) * cmake : make it possible linking ggml as external lib (ggml/1003) * sync : ggml * CANN: adjust backend registry refactor. (ggerganov#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. * metal : move dequantize templates to beginning of MSL source (#0) * metal : simplify f16 and f32 dequant kernels (#0) * cuda : clear error after changing peer access (ggerganov#10153) * fix build break on arm64 linux (ggerganov#10166) This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144 * server : clarify /slots endpoint, add is_processing (ggerganov#10162) * server : clarify /slots endpoint, add is_processing * fix tests * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167) * ggml : fix gelu tables initialization (ggerganov#10172) * Q6_K AVX improvements (ggerganov#10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164) * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * metal : add quantized FA support (ggerganov#10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci] * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci * metal : fix from ptr buffer name (ggerganov#10189) * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci * metal : add BF16 support (ggerganov#8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

* Merge PR (#10) (#11) (#13) Merge --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests). Updates `requests` from 2.31.0 to 2.32.2 - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.31.0...v2.32.2) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> * Temp (#15) * metal : fix minor string leaks (ggml/1004) * cmake : make it possible linking ggml as external lib (ggml/1003) * sync : ggml * CANN: adjust backend registry refactor. (ggerganov#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. * metal : move dequantize templates to beginning of MSL source (#0) * metal : simplify f16 and f32 dequant kernels (#0) * cuda : clear error after changing peer access (ggerganov#10153) * fix build break on arm64 linux (ggerganov#10166) This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144 * server : clarify /slots endpoint, add is_processing (ggerganov#10162) * server : clarify /slots endpoint, add is_processing * fix tests * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167) * ggml : fix gelu tables initialization (ggerganov#10172) * Q6_K AVX improvements (ggerganov#10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164) * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * metal : add quantized FA support (ggerganov#10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci] * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci * metal : fix from ptr buffer name (ggerganov#10189) * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci * metal : add BF16 support (ggerganov#8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

* Master1 (#17) * Merge PR (#10) (#11) (#13) Merge --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests). Updates `requests` from 2.31.0 to 2.32.2 - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.31.0...v2.32.2) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> * Temp (#15) * metal : fix minor string leaks (ggml/1004) * cmake : make it possible linking ggml as external lib (ggml/1003) * sync : ggml * CANN: adjust backend registry refactor. (ggerganov#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. * metal : move dequantize templates to beginning of MSL source (#0) * metal : simplify f16 and f32 dequant kernels (#0) * cuda : clear error after changing peer access (ggerganov#10153) * fix build break on arm64 linux (ggerganov#10166) This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144 * server : clarify /slots endpoint, add is_processing (ggerganov#10162) * server : clarify /slots endpoint, add is_processing * fix tests * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167) * ggml : fix gelu tables initialization (ggerganov#10172) * Q6_K AVX improvements (ggerganov#10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164) * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * metal : add quantized FA support (ggerganov#10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci] * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci * metal : fix from ptr buffer name (ggerganov#10189) * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci * metal : add BF16 support (ggerganov#8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Rename build.yml to build-ci.yml * build.yml * Update build-ci.yml * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Delete ggml/src/vulkan-shaders/CMakeLists.txt * Update build.yml * Update build-ci.yml * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

* ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support

* merge (#20) * Master1 (#17) * Merge PR (#10) (#11) (#13) Merge --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests). Updates `requests` from 2.31.0 to 2.32.2 - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.31.0...v2.32.2) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> * Temp (#15) * metal : fix minor string leaks (ggml/1004) * cmake : make it possible linking ggml as external lib (ggml/1003) * sync : ggml * CANN: adjust backend registry refactor. (ggerganov#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. * metal : move dequantize templates to beginning of MSL source (#0) * metal : simplify f16 and f32 dequant kernels (#0) * cuda : clear error after changing peer access (ggerganov#10153) * fix build break on arm64 linux (ggerganov#10166) This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144 * server : clarify /slots endpoint, add is_processing (ggerganov#10162) * server : clarify /slots endpoint, add is_processing * fix tests * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167) * ggml : fix gelu tables initialization (ggerganov#10172) * Q6_K AVX improvements (ggerganov#10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164) * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * metal : add quantized FA support (ggerganov#10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci] * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci * metal : fix from ptr buffer name (ggerganov#10189) * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci * metal : add BF16 support (ggerganov#8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Rename build.yml to build-ci.yml * build.yml * Update build-ci.yml * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Delete ggml/src/vulkan-shaders/CMakeLists.txt * Update build.yml * Update build-ci.yml * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

* ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support

* Merge (#21) * merge (#20) * Master1 (#17) * Merge PR (#10) (#11) (#13) Merge --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests). Updates `requests` from 2.31.0 to 2.32.2 - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.31.0...v2.32.2) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> * Temp (#15) * metal : fix minor string leaks (ggml/1004) * cmake : make it possible linking ggml as external lib (ggml/1003) * sync : ggml * CANN: adjust backend registry refactor. (ggerganov#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. * metal : move dequantize templates to beginning of MSL source (#0) * metal : simplify f16 and f32 dequant kernels (#0) * cuda : clear error after changing peer access (ggerganov#10153) * fix build break on arm64 linux (ggerganov#10166) This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144 * server : clarify /slots endpoint, add is_processing (ggerganov#10162) * server : clarify /slots endpoint, add is_processing * fix tests * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167) * ggml : fix gelu tables initialization (ggerganov#10172) * Q6_K AVX improvements (ggerganov#10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164) * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * metal : add quantized FA support (ggerganov#10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci] * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci * metal : fix from ptr buffer name (ggerganov#10189) * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci * metal : add BF16 support (ggerganov#8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Rename build.yml to build-ci.yml * build.yml * Update build-ci.yml * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Delete ggml/src/vulkan-shaders/CMakeLists.txt * Update build.yml * Update build-ci.yml * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Update build-ci.yml * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

* Temp (#23) * Merge (#21) * merge (#20) * Master1 (#17) * Merge PR (#10) (#11) (#13) Merge --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests). Updates `requests` from 2.31.0 to 2.32.2 - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.31.0...v2.32.2) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> * Temp (#15) * metal : fix minor string leaks (ggml/1004) * cmake : make it possible linking ggml as external lib (ggml/1003) * sync : ggml * CANN: adjust backend registry refactor. (ggerganov#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. * metal : move dequantize templates to beginning of MSL source (#0) * metal : simplify f16 and f32 dequant kernels (#0) * cuda : clear error after changing peer access (ggerganov#10153) * fix build break on arm64 linux (ggerganov#10166) This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144 * server : clarify /slots endpoint, add is_processing (ggerganov#10162) * server : clarify /slots endpoint, add is_processing * fix tests * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167) * ggml : fix gelu tables initialization (ggerganov#10172) * Q6_K AVX improvements (ggerganov#10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164) * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * metal : add quantized FA support (ggerganov#10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci] * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci * metal : fix from ptr buffer name (ggerganov#10189) * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci * metal : add BF16 support (ggerganov#8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Rename build.yml to build-ci.yml * build.yml * Update build-ci.yml * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Delete ggml/src/vulkan-shaders/CMakeLists.txt * Update build.yml * Update build-ci.yml * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Update build-ci.yml * Update build-ci.yml --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com> * Bump the pip group across 2 directories with 2 updates (#24) Updates the requirements on [pillow](https://github.com/python-pillow/Pillow) and [aiohttp](https://github.com/aio-libs/aiohttp) to permit the latest version. Updates `pillow` to 11.0.0 - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](python-pillow/Pillow@10.2.0...11.0.0) Updates `aiohttp` to 3.11.7 - [Release notes](https://github.com/aio-libs/aiohttp/releases) - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst) - [Commits](aio-libs/aiohttp@v3.9.3...v3.11.7) --- updated-dependencies: - dependency-name: pillow dependency-type: direct:production dependency-group: pip - dependency-name: aiohttp dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: apicalshark <58538165+apicalshark@users.noreply.github.com> * Update build-ci.yml * Update build-ci.yml * Update build-ci.yml * Update build-ci.yml * Update build-ci.yml * Update build-ci.yml * Update build-ci.yml * Update build-ci.yml * Create docker.yml * Create python-lint.yml * Create server.yml * Update requirements.txt --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: dennyxbox890 <58538165+dennyxbox890@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Eve <139727413+netrunnereve@users.noreply.github.com> Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

ggerganov force-pushed the gg/metal-bf16 branch from ce21cc8 to 2aca282 Compare July 11, 2024 13:03

ggerganov mentioned this pull request Jul 12, 2024

metal : template-ify some of the kernels #8447

Merged

4 tasks

mofosyne added the Review Complexity : High Generally require indepth knowledge of LLMs or GPUs label Jul 13, 2024

ggerganov force-pushed the gg/metal-bf16 branch from 2aca282 to c40e9df Compare July 13, 2024 15:38

ggerganov mentioned this pull request Nov 4, 2024

metal : optimize FA kernels #10171

Merged

2 tasks

ggml : add initial BF16 support

6109cf1

ggml-ci

ggerganov force-pushed the gg/metal-bf16 branch from c40e9df to 6109cf1 Compare November 6, 2024 12:15

metal : add mul_mat_id BF16 support

c915d0a

ggml-ci

github-actions bot added the testing Everything test related label Nov 6, 2024

ggerganov marked this pull request as ready for review November 6, 2024 16:31

ggerganov added 5 commits November 6, 2024 18:48

metal : check for bfloat support on the Metal device

3ee077a

ggml-ci

metal : better var names [no ci]

a408f51

metal : do not build bfloat kernels when not supported

ad12269

ggml-ci

metal : try to fix BF16 support check

6969829

ggml-ci

metal : this should correctly check bfloat support

670b8db

ggerganov merged commit 5c333e0 into master Nov 6, 2024
53 checks passed

ggerganov deleted the gg/metal-bf16 branch November 6, 2024 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

metal : add BF16 support #8439

metal : add BF16 support #8439

ggerganov commented Jul 11, 2024 •

edited

Loading

robbiemu commented Oct 13, 2024

metal : add BF16 support #8439

metal : add BF16 support #8439

Conversation

ggerganov commented Jul 11, 2024 • edited Loading

robbiemu commented Oct 13, 2024

ggerganov commented Jul 11, 2024 •

edited

Loading