forked from ggerganov/llama.cpp
Temp #26 · Merged
Conversation
apicalshark (Owner) commented on Dec 1, 2024
- I have read the contributing guidelines
- Self-reported review complexity:
  - Low
  - Medium
  - High
* ggml : add support for dynamic loading of backends. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
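For the dynamic backend loading, here is a minimal sketch of the consumer side, assuming the ggml_backend_load() and device-enumeration entry points this series introduces (the exact header and signatures may differ, and the library path is hypothetical):

```cpp
// Minimal sketch: load a ggml backend from a shared library at runtime,
// then enumerate the devices it registers.
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Load one backend from an explicit path (hypothetical path).
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (!reg) {
        fprintf(stderr, "failed to load backend\n");
        return 1;
    }
    // List every device known to ggml after the load.
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s\n", i, ggml_backend_dev_name(dev));
    }
    return 0;
}
```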
* server : add speculative decoding support; add a helper function slot.can_speculate() (ggml-ci)
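The helper itself is not shown in this log; as a rough, hypothetical illustration (the field names here are invented), such a gate usually just checks that a draft model is available and the slot is actually generating:

```cpp
// Hypothetical sketch of a speculation gate for a server slot; the real
// server_slot struct carries far more state, and these fields are invented.
struct server_slot {
    void * ctx_dft    = nullptr; // draft-model context, null if none loaded
    int    n_draft    = 16;      // max draft tokens per step
    bool   generating = false;   // slot is currently producing tokens

    bool can_speculate() const {
        // Speculate only when a draft model is loaded, drafting is enabled,
        // and the slot is in its generation phase.
        return ctx_dft != nullptr && n_draft > 0 && generating;
    }
};
```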
* Add a download-chat feature to the server's Vue chat interface, next to the existing delete-chat feature; plus code-style cleanups. Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* (…#10497) llama : accept a list of devices to use when offloading a model; accept `--dev none` to completely disable offloading; fix the device list with dynamically loaded backends; rename the env parameter to LLAMA_ARG_DEVICE for consistency.
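A hedged sketch of what parsing such a `--dev` list might look like; the function name is invented here, and the real implementation lives in llama.cpp's argument handling:

```cpp
// Hypothetical sketch: parse a --dev argument, i.e. a comma-separated
// device list, where "none" disables offloading entirely.
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> parse_device_list(const std::string & arg) {
    std::vector<std::string> devices;
    if (arg == "none") {
        return devices; // empty list => no offloading at all
    }
    std::stringstream ss(arg);
    std::string dev;
    while (std::getline(ss, dev, ',')) {
        if (!dev.empty()) {
            devices.push_back(dev); // e.g. "CUDA0", "CUDA1"
        }
    }
    return devices;
}
```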
* Like simple-chat, but it uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code now. Avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <ecurtin@redhat.com>
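A minimal sketch of that smart-pointer pattern: wrap the C handles in std::unique_ptr with custom deleters so they are released on every exit path. This assumes the llama.h free functions of this era (llama_free_model / llama_free); check your header for the exact names:

```cpp
// Sketch: RAII wrappers around llama.cpp's C handles, assuming the
// llama_free_model()/llama_free() cleanup functions from llama.h.
#include <memory>
#include "llama.h"

struct llama_model_deleter {
    void operator()(llama_model * m) const { llama_free_model(m); }
};
struct llama_context_deleter {
    void operator()(llama_context * c) const { llama_free(c); }
};

// The pointers now free themselves when they go out of scope,
// on early returns and error paths included.
using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;
```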
* vulkan-shaders-gen was not parsing the --no-clean argument correctly: the previous code only parsed arguments that take a value, and --no-clean takes none, so it was skipped. This commit adds correct parsing of value-less arguments.
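A sketch of the parsing technique the fix describes, handling both `--flag value` pairs and value-less flags such as `--no-clean`; this illustrates the approach rather than reproducing the actual vulkan-shaders-gen code:

```cpp
// Sketch: collect "--flag value" pairs, but also accept flags without a
// value (e.g. --no-clean) by checking whether the next token is a flag.
#include <cstring>
#include <map>
#include <string>

std::map<std::string, std::string> parse_args(int argc, char ** argv) {
    std::map<std::string, std::string> args;
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg.rfind("--", 0) == 0) {
            // If the next token exists and is not itself a flag, treat it
            // as this flag's value; otherwise this is a boolean flag.
            if (i + 1 < argc && strncmp(argv[i + 1], "--", 2) != 0) {
                args[arg] = argv[++i];
            } else {
                args[arg] = ""; // value-less flag, e.g. --no-clean
            }
        }
    }
    return args;
}
```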
* Co-authored-by: noemotiovon <noemotiovon@gmail.com>
* (…ganov#10454) Improve inference performance for the Ascend NPU, with several rounds of modifications and restorations after review. Co-authored-by: shanshan shen <shanshanshen333@gmail.com> and Frank Mai <thxCode@thxcode0824@gmail.com>
* ggml-cpu : add an arm64 CPU feature check to CMake for macOS; use vmmlaq_s32 to probe for the i8mm compile option.
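This is the kind of probe source a CMake check_cxx_source_compiles() test might build for that check; a sketch, not the project's actual test source, and it only compiles when targeting arm64 with the i8mm extension (e.g. -march=armv8.6-a+i8mm):

```cpp
// Sketch of an i8mm feature probe: vmmlaq_s32 is only available with the
// i8mm extension, so successful compilation proves the compiler supports it.
#include <arm_neon.h>

int main() {
    int8x16_t a   = vdupq_n_s8(1);
    int8x16_t b   = vdupq_n_s8(2);
    int32x4_t acc = vdupq_n_s32(0);
    acc = vmmlaq_s32(acc, a, b); // requires the i8mm extension
    return vgetq_lane_s32(acc, 0) == 0;
}
```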
* cmake : enable warnings in llama; add llama_get_flags and respect LLAMA_FATAL_WARNINGS; rename get_flags -> ggml_get_flags and reuse it; fix compile warnings in speculative-simple. (ggml-ci)
* (…v#10507) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
* server : replace behave with pytest; fix a test on Windows; add many more tests (all sequential tests, a parallel completion test, a save-slot test, test_cache_vs_nocache_prompt, an embeddings test); log less; fix coding style; disable cache_prompt for some tests; remove the old feature files; update the test docs.
* Fix a bad calculation of the end of the range, and add a backend test that covers the bad case (taken from stable diffusion). Fixes leejet/stable-diffusion.cpp#439.
* (…gerganov#10516) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix inconsistent HIP flags between CMake and Make; fix the docs regarding GGML_HIP.
* Add a link to the OLMo 2 model in the docs; change the link to the landing page.
* Clean up the UI link list: sort it alphabetically and add missing licenses.
* imatrix-combine-only idea; ensured that the behavior is consistent with the log.
* server : add a split-model test, a speculative test, and invalid cases.
* ggml : move AMX to the CPU backend. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Subgroup-64 version with subgroup add: a 15% faster scalable version, tested for subgroup sizes 16-128. Check that the subgroup size is a multiple of 16 and greater than 16; subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45). Force 16 sequential threads per block and make the 16 subgroup size a constant.
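A hedged one-liner for the dispatch guard implied by that check; the name is invented and the exact threshold in the shader-selection logic may differ:

```cpp
// Sketch: choose the subgroup-add shader variant only when the device's
// reported subgroup size is a multiple of 16 (and large enough), per the
// check described in the commit above.
#include <cstdint>

bool use_subgroup_variant(uint32_t subgroup_size) {
    return subgroup_size >= 16 && subgroup_size % 16 == 0;
}
```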
* readme : refresh; move, clarify, and simplify sections; clarify GGUF; assorted fixes. [no ci]
* (…q4_0_4x4_q8_0(), ggerganov#10567) Signed-off-by: Adrien Gallouët <angt@huggingface.co>
github-actions bot added labels on Dec 1, 2024: documentation (Improvements or additions to documentation), examples, server, build, devops, nix, testing, python, script, ggml, SYCL, Nvidia GPU, Vulkan, Kompute, Apple Metal.