forked from ggml-org/llama.cpp
merge from zhouwg ggml-hexagon #68
Merged
Conversation
* common: update requirements.txt to include pytorch nightly for s390x
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* common: fix torch installation via pip for s390x
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* convert ok, load ok
* warmup ok
* test
* still does not work?
* fix padding
* temporary give up
* fix merge conflict
* build_ultravox()
* rm test
* fix merge conflict
* add necessary mtmd APIs
* first working version (only 4s of audio)
* will this monster compile?
* fix compile
* please compile
* fPIC
* fix windows
* various fixes
* clean up audio_helpers
* fix conversion
* add some debug stuff
* long audio input ok
* adapt the api
* add --audio arg
* final touch UX
* add miniaudio to readme
* fix typo
* refactor kv metadata
* mtmd_default_marker()
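Of the mtmd APIs mentioned, `mtmd_default_marker()` supplies the placeholder string that marks where a media clip is embedded in the prompt. A minimal sketch of how a caller might use it, assuming the `const char *` return type from `mtmd.h`; `build_audio_prompt` is a hypothetical helper, not part of the library:

```cpp
#include "mtmd.h"
#include <string>

// Sketch: place the default media marker where the audio clip should be
// consumed; the mtmd tokenizer later replaces the marker with the media
// embeddings. (Hypothetical helper; the surrounding mtmd calls are omitted.)
std::string build_audio_prompt(const std::string & question) {
    std::string prompt = question;
    prompt += "\n";
    prompt += mtmd_default_marker(); // library-provided placeholder string
    return prompt;
}
```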
* release : fix windows hip release
* make single hip release with multiple targets
Reuse the f16/f32 copy shaders, and just scale the number of elements according to the type size.
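A minimal sketch of the element-count scaling idea, with a `memcpy` stand-in for the shader dispatch; the helper names are assumptions, not the actual Vulkan backend code:

```cpp
#include <cstring>
#include <cstdint>
#include <cstddef>

// Stand-in for the f16 copy shader dispatch: moves n 16-bit elements.
// In the real backend this would record a Vulkan compute dispatch.
static void dispatch_copy_f16(void * dst, const void * src, size_t n_f16) {
    memcpy(dst, src, n_f16 * sizeof(uint16_t));
}

// Reuse the f16 copy path for an arbitrary same-size copy by scaling the
// element count so the shader moves the same number of bytes.
// (Assumes nbytes is a multiple of 2, as tensor buffers are in practice.)
void copy_raw(void * dst, const void * src, size_t nbytes) {
    dispatch_copy_f16(dst, src, nbytes / sizeof(uint16_t));
}
```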
* [CANN] Support MUL_MAT_ID Q8 && Q4
  Signed-off-by: noemotiovon <757486878@qq.com>
* codestyle adjustment
  Signed-off-by: noemotiovon <757486878@qq.com>
---------
Signed-off-by: noemotiovon <757486878@qq.com>
* server : support audio input
* add audio support on webui
* ggml : add ggml_gelu_erf() CUDA kernel
* missing semicolon
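The erf-based ("exact") GELU that such a kernel computes follows the standard definition, 0.5 · x · (1 + erf(x / √2)). A scalar reference sketch of the math only, not the actual CUDA kernel:

```cpp
#include <cmath>

// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
// A CUDA kernel would apply this element-wise; this is the scalar reference.
float gelu_erf(float x) {
    return 0.5f * x * (1.0f + erff(x * 0.70710678f)); // 0.7071... = 1/sqrt(2)
}
```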
…gml-org#12379)
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
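"Truncated JSON healing" makes a partial tool-call stream parseable by closing whatever is still open. A toy illustration of the idea, not the actual `common_json` implementation:

```cpp
#include <string>
#include <vector>

// Toy sketch of truncated-JSON healing: close an unterminated string and any
// open objects/arrays so a partial stream parses. Not the real common_json
// code; a trailing backslash escape or a truncated literal (e.g. "tru") is
// deliberately not handled here.
std::string heal_truncated_json(const std::string & s) {
    std::vector<char> closers;
    bool in_str = false, esc = false;
    for (char c : s) {
        if (esc)    { esc = false; continue; }
        if (in_str) {
            if (c == '\\')     esc = true;
            else if (c == '"') in_str = false;
            continue;
        }
        if (c == '"')       in_str = true;
        else if (c == '{')  closers.push_back('}');
        else if (c == '[')  closers.push_back(']');
        else if (c == '}' || c == ']') { if (!closers.empty()) closers.pop_back(); }
    }
    std::string out = s;
    if (in_str) out += '"';
    while (!closers.empty()) { out += closers.back(); closers.pop_back(); }
    return out;
}

// Example: heal_truncated_json(R"({"name": "get_weather", "args": {"city)")
// yields {"name": "get_weather", "args": {"city"}} -- parseable, if lossy.
```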
…-org#13752)

Temporarily reverted due to the failing fp16 DIV operation.
This reverts commit 02cdd2d.
ggml-ci
Co-authored-by: ochafik <ochafik@google.com>
* Multimodal: Added Moondream2 model and fixed ggml.org link
* Apply suggestions from code review
---------
Co-authored-by: name <none@none.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
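The `ggml_cont` item refers to materializing a strided view into contiguous memory, the usual fix when a permuted tensor feeds an op that requires contiguous data. A hedged sketch of the pattern (graph construction only; the function and its shapes are illustrative, not the actual fix in this commit):

```cpp
#include "ggml.h"

// Illustrative graph snippet: ggml_permute returns a strided view, while
// ops such as ggml_reshape_2d require contiguous input, so ggml_cont is
// inserted between them to copy the view into contiguous memory.
struct ggml_tensor * permute_then_reshape(
        struct ggml_context * ctx, struct ggml_tensor * x,
        int64_t ne0, int64_t ne1) {
    struct ggml_tensor * p = ggml_permute(ctx, x, 0, 2, 1, 3); // strided view
    struct ggml_tensor * c = ggml_cont(ctx, p);                // make contiguous
    return ggml_reshape_2d(ctx, c, ne0, ne1);
}
```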
* kv-cache : rework kv_cell
  ggml-ci
* kv-cells : use "shift" instead of "delta" consistently
  ggml-ci
* llama : add llama_max_parallel_sequences()
  ggml-ci
* kv-cells : update comments
  [no ci]
* context : fail upon construction if sequences exceed max value
  ggml-ci
* kv-cells : get_pos() -> pos_get() + comments
  ggml-ci
* kv-cells : fix tracking of "used" cells
  ggml-ci
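Per the commit, context construction now fails when the requested number of sequences exceeds the limit reported by the new `llama_max_parallel_sequences()`. A hedged sketch of performing the same check caller-side, assuming the function is exposed in `llama.h` as the commit message suggests:

```cpp
#include "llama.h"
#include <cstdint>
#include <cstdio>

// Sketch: validate the requested parallel-sequence count up front, mirroring
// the construction-time check added by this commit (exact API usage assumed).
bool check_n_seq(uint32_t n_seq_max) {
    const size_t limit = llama_max_parallel_sequences();
    if (n_seq_max > limit) {
        fprintf(stderr, "n_seq_max (%u) exceeds the library limit (%zu)\n",
                n_seq_max, limit);
        return false;
    }
    return true;
}
```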
…orks in a standard Android APP)
Labels
Apple Metal
build
devops
documentation
examples
ggml
Nvidia GPU
python
script
server
SYCL
testing
Vulkan