
[pull] master from ggerganov:master #165

Closed

wants to merge 72 commits into master from ggerganov:master

Conversation


pull[bot] commented Jan 7, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

qnixsynapse and others added 2 commits January 7, 2025 14:26
…1087)

* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6

* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"

This reverts commit f62dc45.

* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6
Remove duplicated macros, use GGML_LOG_ERROR for errors
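
To make the get_multi_ptr change concrete: in SYCL 2020 the accessor's get_pointer() is deprecated in favour of get_multi_ptr(). The sketch below shows the pattern on a throwaway kernel, not the actual wkv6 code; the kernel body, names and work-group size are assumptions.

```cpp
// Minimal SYCL 2020 sketch of replacing deprecated get_pointer() with
// get_multi_ptr() on a local accessor. The kernel is hypothetical; only the
// accessor call mirrors the change described in the commit.
#include <sycl/sycl.hpp>

void scale_via_local(sycl::queue & q, float * data, size_t n) {
    const size_t wg = 256; // assumes n is a multiple of wg and data is a USM allocation
    q.submit([&](sycl::handler & cgh) {
        sycl::local_accessor<float, 1> tmp(sycl::range<1>(wg), cgh);
        cgh.parallel_for(
            sycl::nd_range<1>(sycl::range<1>(n), sycl::range<1>(wg)),
            [=](sycl::nd_item<1> it) {
                // deprecated: float * p = tmp.get_pointer();
                float * p = tmp.get_multi_ptr<sycl::access::decorated::no>().get();

                const size_t lid = it.get_local_id(0);
                const size_t gid = it.get_global_id(0);
                p[lid] = data[gid];
                sycl::group_barrier(it.get_group());
                data[gid] = p[lid] * 2.0f; // toy computation via local memory
            });
    });
    q.wait();
}
```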
slaren and others added 3 commits January 7, 2025 12:38
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
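
For orientation, the read side of the refactored GGUF API still revolves around opening a file and iterating tensors through getters rather than touching raw data (gguf_get_data is gone). A minimal sketch, assuming the gguf_init_from_file / gguf_get_n_tensors / gguf_get_tensor_name entry points and a gguf.h header; exact types and include paths may differ after the refactor.

```cpp
// Sketch of read-side GGUF usage; entry points and the gguf.h include are
// assumptions based on the commit description, not a verified API listing.
#include <cstdint>
#include <cstdio>
#include "gguf.h"

bool list_tensors(const char * fname) {
    gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    gguf_context * ctx = gguf_init_from_file(fname, params);
    if (ctx == nullptr) {
        fprintf(stderr, "failed to open %s\n", fname);
        return false;
    }
    const int64_t n = gguf_get_n_tensors(ctx);
    for (int64_t i = 0; i < n; ++i) {
        printf("tensor %lld: %s\n", (long long) i, gguf_get_tensor_name(ctx, i));
    }
    gguf_free(ctx);
    return true;
}
```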
* fix: Vulkan shader gen binary path when cross compiling
github-actions bot added the Vulkan label Jan 8, 2025
mbaudier and others added 2 commits January 8, 2025 09:18
…11117)

* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.

* Perform Vulkan extensions checks in a more sensible order

* Remove unnecessary #ifdef directive
github-actions bot added the devops label Jan 8, 2025
amritahs-ibm and others added 2 commits January 8, 2025 12:54
This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le using MMA
builtins for the quantised int8 datatype.

This change results in a 10% - 70% improvement
in total speed (i.e. all tokens / total time) across
various batch sizes.

The patch is tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
* arg : option to exclude arguments from specific examples

ggml-ci

* readme : remove old args [no ci]
github-actions bot added the server label Jan 8, 2025
github-actions bot added the script label Jan 8, 2025
ggerganov and others added 2 commits January 8, 2025 16:19
* (wip) support mergekit-extracted lora

* support mergekit-extract-lora

* use lora->get_scale

* correct comment

* correct norm name & condition

* add some hints
github-actions bot added the python label Jan 8, 2025
ngxson and others added 4 commits January 8, 2025 16:09
The main motivation for this change is that it was not handling
ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle the EOF,
"/bye" command, and empty input cases. Introduce a `get_user_input`
function to manage the user input loop and handle the different return
cases.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
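
A minimal sketch of the loop structure described above, assuming plain std::getline input; the function bodies are illustrative rather than the actual llama-run code, and ctrl-c handling (signals) is left out.

```cpp
// Illustrative input loop: EOF (ctrl-d) and "/bye" end the loop, empty lines
// are ignored. Not the actual llama-run implementation.
#include <iostream>
#include <string>

// Returns false when the conversation should end (EOF or "/bye").
static bool read_user_input(std::string & line) {
    if (!std::getline(std::cin, line)) {
        return false; // EOF, e.g. ctrl-d
    }
    return line != "/bye";
}

static void get_user_input() {
    std::string line;
    while (true) {
        std::cout << "> " << std::flush;
        if (!read_user_input(line)) {
            break;    // EOF or "/bye": leave cleanly
        }
        if (line.empty()) {
            continue; // ignore empty input and prompt again
        }
        std::cout << "you said: " << line << "\n";
    }
}

int main() {
    get_user_input();
    return 0;
}
```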
* Moved scripts dir and fixed pyproject.toml

* updated readme

* fixed README urls

* bump pypi gguf to v0.14.0

* retrigger ci

* empty commit - trigger ci
Signed-off-by: hydai <z54981220@gmail.com>
ngxson and others added 29 commits January 13, 2025 20:18
…11214)

* cli : auto activate conversation mode if chat template is detected

* add warn on bad template

* update readme (writing with the help of chatgpt)

* update readme (2)

* do not activate -cnv for non-instruct models
I had simply overlooked the message bubble's tail placement for RTL
text, as I use dark mode and the tail isn't visible there; this
fixes it.
* Refactor test-chat-template

* Update test-chat-template.cpp
* vocab : add dummy tokens for "no_vocab" type

ggml-ci

* vocab : minor [no ci]
* SYCL: Add Gated Linear attention kernel

* glahpp: add a space at the end of file

* gla: Put the barrier inside the main logic loop
This commit contains a suggestion for adding the missing embd_to_audio
function from tts.cpp to tts-outetts.py. This introduces a dependency on
numpy, which I was not sure is acceptable or not (only PyTorch
was mentioned in the referenced PR).

Also, the README has been updated with instructions to run the example
with llama-server and the python script.

Refs: #10784 (comment)
* RoPE: fix back, CUDA support for back + noncont.

* fix comments reg. non-cont. RoPE support [no-ci]
* fix: ggml: fix vulkan-shaders-gen build

The vulkan-shaders-gen target was not being built correctly
in the case of cross-compilation.
Other outputs need to be built for the cross-compile target,
but vulkan-shaders-gen needs to be built for the host.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

- Add GGML_SHADERS_GEN_TOOLCHAIN CMake option.
- Auto-detect host toolchain if not set.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

Use configure_file to generate host_toolchain.cmake from template

* fix: ggml: Fix compile error

Fix compile error not finding vulkan-shaders-gen

* fix: vulkan-shaders-gen build and path handling

Fix build issues with vulkan-shaders-gen:
- Add target dependency for correct build order
- Use CMAKE_HOST_SYSTEM_NAME for executable suffix
- Fix MSVC output directory in host toolchain
- Normalize path handling for cross-compilation

* fix: improve host compiler detection in vulkan shader build

Improve host compiler detection for vulkan shader generation:
- Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches
- Consolidate compiler detection logic
- Fix Windows-specific MSVC detection
- Ensure correct compiler search in cross-compilation

* refactor: Simplify CMake function for detecting host compiler

Simplified the CMake function to improve the process of detecting the host compiler.

* fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt

Since `vulkan-shader-gen.cpp` only requires the `glslc` executable
and not the Vulkan headers or libraries, CMakeLists.txt needs to
be corrected.
(See: ecc93d0)

* refactor: Rename host_toolchain.cmake.in

- Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in

* refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN

Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
* ci : use -no-cnv in gguf-split tests

ggml-ci

* ci : use -no-cnv in requantize tests

ggml-ci

* scripts : fix [no ci]
* q6_k scale caching

* 16 bit unpack

* q4_k test (slow)

* revert it

* q3_k

* q2_k

* little stuff

* try precalculating products of a and q2_k scales

* Revert "try precalculating products of a and q2_k scales"

This reverts commit 65110b8.

* unpack should be u16, add vim swap to gitignore (about time)

* better q4_k scales

* q5_k

* better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations

* q2_k better dequant

* q3_k optimizations

* q3_k use hmask simd from cpu avx version

* make the caches happy

* q3_k separate out calculation

* q2_k separate out

* little stuff

* use calc_superblock everywhere

* q2_k optimize scale calculation

* more barriers
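
For readers skimming the list above: "scale caching" here means computing each superblock's per-sub-block scales once and reusing them for all of the block's dot products, instead of re-deriving them per value. Below is a CPU-side toy sketch of the idea, with a made-up block layout (not the real q*_K packing and not the Vulkan shader code).

```cpp
// Toy illustration of scale caching. The block layout below is invented for
// clarity; real q*_K superblocks pack scales and quants much more tightly.
#include <cstdint>

struct block_toy {
    float  d;           // superblock scale
    int8_t scales[16];  // per-sub-block scales, 16 sub-blocks
    int8_t q[256];      // 16 sub-blocks of 16 quantized values
};

float dot_with_scale_cache(const block_toy & b, const float * x) {
    // Cache the effective sub-block scales once per superblock ...
    float sc[16];
    for (int i = 0; i < 16; ++i) {
        sc[i] = b.d * (float) b.scales[i];
    }
    // ... then reuse them for every value instead of recomputing per element.
    float sum = 0.0f;
    for (int i = 0; i < 16; ++i) {
        for (int j = 0; j < 16; ++j) {
            sum += sc[i] * (float) b.q[i*16 + j] * x[i*16 + j];
        }
    }
    return sum;
}
```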
* Add SVE support for q4_K_q8_K

* Update ggml/src/ggml-cpu/ggml-cpu-quants.c

change to use K_SCALE_SIZE

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama : add `llama_model_load_from_splits`

* update
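
A minimal usage sketch for the new entry point; the signature (an array of split paths, a count, and llama_model_params) is inferred from the commit title and may not match the header exactly, and the file names are hypothetical.

```cpp
// Hedged usage sketch for llama_model_load_from_splits; the signature and the
// llama_model_free cleanup call are assumptions based on the surrounding API.
#include <cstdio>
#include "llama.h"

int main() {
    const char * paths[] = {
        "model-00001-of-00002.gguf", // hypothetical split file names
        "model-00002-of-00002.gguf",
    };

    llama_model_params params = llama_model_default_params();
    llama_model * model = llama_model_load_from_splits(paths, 2, params);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model from splits\n");
        return 1;
    }

    // ... use the model ...

    llama_model_free(model);
    return 0;
}
```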
* CUDA: backwards pass for misc. ops, add tests

* remove restrict from pointers
* support internlm3

* fix lint
Do masking on whole dwords, fetch all scales at once.
)

* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl

Shaders are based on cpy.cu.

* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32

* ggml: copy q->f32 assumes some contiguity in the destination
Register RPC devices early and do not propagate RPC specifics into the
llama model structures.

ref: #10609
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.

Add noncontiguous FA tests in test-backend-ops.

Fixes #11268.