Releases · arthw/llama.cpp
b3555
b3554
ggml-backend : fix async copy from CPU (#8897)

* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
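For context, here is a minimal sketch, assuming a CUDA build of ggml from around this tag, of the code path the fix hardens: an asynchronous upload from host (CPU) memory into a tensor living on a GPU backend, followed by a synchronize. The tensor size, device index, and fill value are illustrative, and this is not the patch itself.

```cpp
// Sketch only: exercises the CPU -> GPU async upload path touched by #8897.
#include <vector>
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-cuda.h"

int main() {
    ggml_backend_t gpu = ggml_backend_cuda_init(0);   // device 0 (illustrative)

    // small context that holds only tensor metadata, no data (no_alloc = true)
    struct ggml_init_params ip = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(ip);
    struct ggml_tensor  * t   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);

    // place the tensor in a buffer owned by the GPU backend
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, gpu);

    // async copy from host memory to the device; the host pointer must stay
    // valid until the backend is synchronized
    std::vector<float> host(1024, 1.0f);
    ggml_backend_tensor_set_async(gpu, t, host.data(), 0, ggml_nbytes(t));
    ggml_backend_synchronize(gpu);

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    ggml_backend_free(gpu);
    return 0;
}
```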
b3517
[SYCL] Fixing wrong VDR iq4nl value (#8812)
b3482
Merge pull request #2 from arthw/refactor_dev

Refactor device management and usage API
b3475
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

  This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.

* Update convert_hf_to_gguf.py (Co-authored-by: compilade <git@compilade.net>)
* address comments
* Update src/llama.cpp (Co-authored-by: compilade <git@compilade.net>)

Co-authored-by: compilade <git@compilade.net>
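To make the inference side concrete: a minimal sketch, assuming the ggml API of this era, of how a per-dimension factors tensor loaded from the model is supplied to rope. It goes in as the optional third tensor argument of `ggml_rope_ext`. The helper name `rope_with_factors`, the shapes, and all hyperparameter values below are illustrative assumptions, not Llama 3.1's actual configuration.

```cpp
#include "ggml.h"

// build a rope graph node over query states q, applying the per-dimension
// scaling factors stored in the converted model (hypothetical helper)
static struct ggml_tensor * rope_with_factors(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,        // F32 [n_embd_head, n_head, n_tokens]
        struct ggml_tensor  * pos,      // I32 positions, [n_tokens]
        struct ggml_tensor  * factors)  // F32 factors,   [n_rot/2]
{
    const int n_rot      = 128;   // rotated dimensions (illustrative)
    const int n_ctx_orig = 8192;  // original training context (illustrative)
    return ggml_rope_ext(ctx, q, pos, factors,
                         n_rot,
                         /* mode */ 0, // standard (non-NeoX) rope, as used by Llama-family models
                         n_ctx_orig,
                         500000.0f,    // freq_base
                         1.0f,         // freq_scale
                         0.0f,         // ext_factor
                         1.0f,         // attn_factor
                         32.0f, 1.0f); // beta_fast, beta_slow
}
```

Passing `NULL` in place of `factors` recovers plain rope, which is why the factors tensor can be made optional per model.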
b3388
fix the concat unit test
b3387
move softmax to a separate file
b3313
fix support for multiple cards (multi-GPU)
b3312
Merge pull request #1 from arthw/update_warp

[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)

cherry-pick b549a1bbefb2f1fbb8b558bac1f2ae7967e60964
b3309
py : switch to snake_case (#8305)

* py : switch to snake_case
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py (needed for scripts/check-requirements.sh)

Co-authored-by: Francis Couture-Harpin <git@compilade.net>