[pull] master from ggerganov:master #101

pull · 2024-05-23T16:25:04Z

See Commits and Changes for more details.

…NeoX base models) (#7461)

* convert-hf : add conversion of bloom-style qkv tensor to gpt-style qkv (code borrowed from BloomModel)
* llama : add inference support for LLM_ARCH_GPTNEOX
* llama : add model types for every Pythia variant and GPT-NeoX

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

* llama : add getters for n_threads/n_threads_batch

  This commit adds two new functions to the llama API. They return the number of threads used for generating a single token and the number of threads used for prompt and batch processing (multiple tokens).

  The motivation is to be able to query the number of threads that a context is using. The main use case is testing/verification that the number of threads is set correctly.

  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! llama : add getters for n_threads/n_threads_batch

  Rename the getters to llama_n_threads and llama_n_threads_batch.

  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
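The testing/verification use case described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `mock_context`, `mock_set_n_threads`, `mock_n_threads`, `mock_n_threads_batch`, and `threads_match` are hypothetical stand-ins written for this example, not the llama API. The real getters are named `llama_n_threads` and `llama_n_threads_batch` per the commit message, and their exact signatures are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a context object (assumption for this sketch;
 * it is not the real llama context type). */
struct mock_context {
    uint32_t n_threads;       /* threads used to generate a single token */
    uint32_t n_threads_batch; /* threads used for prompt/batch processing */
};

/* Hypothetical setter that stores both thread counts on the context. */
static void mock_set_n_threads(struct mock_context * ctx,
                               uint32_t n_threads,
                               uint32_t n_threads_batch) {
    ctx->n_threads       = n_threads;
    ctx->n_threads_batch = n_threads_batch;
}

/* Hypothetical getters mirroring the idea of the two new API functions. */
static uint32_t mock_n_threads(const struct mock_context * ctx) {
    return ctx->n_threads;
}

static uint32_t mock_n_threads_batch(const struct mock_context * ctx) {
    return ctx->n_threads_batch;
}

/* Returns 1 when the context reports the expected thread counts, 0 otherwise. */
static int threads_match(const struct mock_context * ctx,
                         uint32_t expect_gen,
                         uint32_t expect_batch) {
    return mock_n_threads(ctx) == expect_gen &&
           mock_n_threads_batch(ctx) == expect_batch;
}
```

A test would set the thread counts on a context, then call `threads_match` (or the two getters directly) to confirm the values that were set are the values the context reports.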

* Fix phi3 template matching vs zephyr
* Add regression test for new phi3 chat template
* Implement review suggestions
* Fix phi3 jinja test templates & match by <|end|>
* Apply suggestion

  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Add all phi3 template variants in tests
* Remove unneeded message trimming

  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix tests to not expect trimmed messages

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

ggerganov and others added 14 commits May 23, 2024 09:43

main : minor (#7462)

fbf777d

Update vulkan rope implementation to support frequency factors (#7475)

1b1e27c

ggml : drop support for QK_K=64 (#7473)

e84b71c

* ggml : drop support for QK_K=64

  ggml-ci

* opencl : restore QK_K=256 define

ggml : remove ggml_flash_attn and ggml_flash_ff (#7463)

d48c88c

ggml-ci

labeler.yml: add embedding label detector [no ci] (#7482)

152da28

llama : rename n_ctx -> cache.size, less confusing (#0)

a61a94e

readme : add GPT-NeoX + Pythia to the list of supported models (#7491)

dacfceb

ci : use Pythia models instead of OpenLlama (#7470)

55ac3b7

* ci : start using Pythia models over OpenLlama

  ggml-ci

* ci : disable q2_k ppl tests
* ci : use convert-hf-to-gguf.py
* ci : update gg_get_model
* ci : fix convert outfile name

  ggml-ci

* llama : gptneox arch use F32 attn prec

  ggml-ci

readme : add Bunny in supported models [no ci] (#7469)

8b94e79

ggml : silence UB sanitizer error during iq2_xxs quantization (#0)

1debe72

readme : remove trailing space (#7469)

74f33ad

github-actions bot added examples devops python ggml Vulkan SYCL Nvidia GPU testing build labels May 23, 2024

teleprint-me closed this May 23, 2024

pull bot removed examples devops python ggml Vulkan SYCL labels May 23, 2024

pull bot added ⤵️ pull and removed Nvidia GPU testing build labels May 23, 2024

pull bot commented May 23, 2024 (edited)
