Releases: ggerganov/llama.cpp

b4474

13 Jan 16:43
39509fb
cuda : CUDA Graph Compute Function Refactor (precursor for performanc…

b4468

13 Jan 14:00
8f70fc3
llama : remove 'd' from bad special token log (#11212)

This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.

The motivation for this is that the output currently looks something like
the following:
```console
load: bad special token:
    'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
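
A minimal reproduction of the bug (hypothetical; the actual call in
llama-vocab.cpp goes through the project's logging macros and may differ): a
stray literal `d` sits immediately after the integer conversion specifier in a
printf-style format string, so the token id prints as `128256d`.

```cpp
#include <cstdio>
#include <cstdint>

int main() {
    const int32_t id = 128256;
    // Before the fix: the stray 'd' after %d is printed literally,
    // producing "128256d".
    std::printf("load: bad special token:\n"
                "    'tokenizer.ggml.image_token_id' = %dd, using default id -1\n", id);
    // After the fix: the extra 'd' is removed.
    std::printf("load: bad special token:\n"
                "    'tokenizer.ggml.image_token_id' = %d, using default id -1\n", id);
    return 0;
}
```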

b4467

13 Jan 12:54
1244cdc
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL…

b4466

12 Jan 19:00
924518e
Reset color before we exit (#11205)

We don't want colors to leak after llama-run terminates.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
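
A minimal sketch of the idea (illustrative only; llama-run's actual exit paths
and reset mechanism may differ): emit the ANSI SGR reset sequence before the
process ends, so the user's terminal is not left in a colored state.

```cpp
#include <cstdio>
#include <cstdlib>

// Write the ANSI SGR reset sequence so the terminal's colors and
// attributes return to their defaults.
static void reset_color() {
    std::fputs("\033[0m", stdout);
    std::fflush(stdout);
}

int main() {
    std::atexit(reset_color);  // runs on any normal exit path
    std::printf("\033[31mthis line is red\033[0m\n");
    return 0;                  // reset_color fires before the process exits
}
```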

b4465

12 Jan 13:52
9a48399
llama : fix chat template gguf key (#11201)

b4464

12 Jan 10:52
08f10f6
llama : remove notion of CLS token (#11064)

ggml-ci

b4458

10 Jan 06:22
c3f9d25
Vulkan: Fix float16 use on devices without float16 support + fix subg…

b4457

10 Jan 02:48
ee7136c
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
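
For context on the "cpu and cuda GLA" bullet above: a hedged sketch of the
standard gated linear attention recurrence from the GLA literature (the exact
formulation inside the ggml kernels may differ).

```latex
% Per-timestep GLA recurrence: \alpha_t is a per-channel decay gate,
% S_t the running key-value state, and q_t, k_t, v_t the usual projections.
S_t = \operatorname{diag}(\alpha_t)\, S_{t-1} + k_t^{\top} v_t,
\qquad
o_t = q_t\, S_t
```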

b4456

10 Jan 00:52
c6860cc
SYCL: Refactor ggml_sycl_compute_forward (#11121)

* SYCL: refactor ggml_sycl_compute_forward

* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy

* SYCL: add function name to noop debug

* SYCL: Some device info print refactoring and add details of XMX availability

b4453

09 Jan 10:57
f8feb4b
model: Add support for PhiMoE arch (#11003)

* model: support phimoe

* python linter

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: add phimoe as supported model

ggml-ci

---------

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>