Temp #26

Merged
merged 72 commits into from
Dec 1, 2024

Conversation

apicalshark
Owner

ggerganov and others added 30 commits November 25, 2024 15:08
* ggml : add support for dynamic loading of backends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : add speculative decoding support

ggml-ci

* server : add helper function slot.can_speculate()

ggml-ci
* Add download chat feature to server chat

Add a download feature next to the delete chat feature in the server vue chat interface.

* code style

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
…#10497)

* llama : accept a list of devices to use to offload a model

* accept `--dev none` to completely disable offloading

* fix dev list with dl backends

* rename env parameter to LLAMA_ARG_DEVICE for consistency
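
A minimal sketch of how a device list argument of this kind could be interpreted, assuming the value is a comma-separated list of device names and that `none` disables offloading. The `device_selection` struct, both helper functions, and the device names in the usage note are hypothetical and are not taken from llama.cpp's actual parser; only the `LLAMA_ARG_DEVICE` variable name comes from the commit message.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: which devices to offload to, if any.
struct device_selection {
    bool offload_disabled = false;   // true when the value was "none"
    std::vector<std::string> names;  // requested device names, in order
};

// Hypothetical helper: split a comma-separated device list.
// Assumption: "none" means "do not offload at all".
device_selection parse_device_list(const std::string & value) {
    device_selection sel;
    if (value == "none") {
        sel.offload_disabled = true;
        return sel;
    }
    std::stringstream ss(value);
    std::string name;
    while (std::getline(ss, name, ',')) {
        if (!name.empty()) {
            sel.names.push_back(name);
        }
    }
    return sel;
}

// Hypothetical helper: prefer the command-line value, otherwise fall back
// to the LLAMA_ARG_DEVICE environment variable named in the commit above.
std::string device_arg_or_env(const char * cli_value) {
    if (cli_value != nullptr) {
        return cli_value;
    }
    const char * env = std::getenv("LLAMA_ARG_DEVICE");
    return env != nullptr ? env : "";
}
```

With these assumptions, `parse_device_list("CUDA0,CUDA1")` yields two names, while `parse_device_list("none")` yields no names and sets `offload_disabled`.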
It is like simple-chat, but it uses smart pointers to avoid manual
memory cleanup, so the code has fewer memory leaks. It avoids printing
multiple dots, splits the code into smaller functions, and uses no
exception handling.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
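
The smart-pointer approach described in this commit message can be sketched as follows. This is an illustration only: `resource`, `resource_init`, `resource_free`, and `use_resource` are hypothetical stand-ins for a C-style init/free pair, not functions from llama.cpp or from the example the commit adds.

```cpp
#include <memory>

// Hypothetical C-style API standing in for a library's init/free pair.
struct resource { int id; };
static int g_live = 0;  // counts live resources, only to make leaks visible

resource * resource_init(int id) { ++g_live; return new resource{id}; }
void resource_free(resource * r) { --g_live; delete r; }

// Deleter that calls the matching free function.
struct resource_deleter {
    void operator()(resource * r) const { resource_free(r); }
};
using resource_ptr = std::unique_ptr<resource, resource_deleter>;

// Errors are reported through the return value, not exceptions.
int use_resource(int id) {
    resource_ptr r(resource_init(id));
    if (!r) {
        return -1;  // nothing to free manually on the error path
    }
    return r->id * 2;
}  // resource_free runs here automatically when r goes out of scope
```

Because the deleter is part of the pointer type, every return path releases the resource without an explicit free call.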
)

vulkan-shaders-gen was not parsing the --no-clean argument correctly.
The previous code only parsed arguments that take a value, and --no-clean
does not take one, so it was not recognized. This commit makes the parser
handle arguments that don't have values.
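
As a rough illustration of the problem, the sketch below parses a mix of options that take a value and flags that do not. It is not the actual vulkan-shaders-gen code: `parse_args` and its `flags` parameter are hypothetical, and `--output-dir` in the usage note is used only as an example of a value-taking option.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parser: `flags` lists the options that take no value.
// Value-less flags are stored with an empty string; every other
// "--option" consumes the following token as its value.
std::map<std::string, std::string> parse_args(const std::vector<std::string> & argv,
                                              const std::set<std::string> & flags) {
    std::map<std::string, std::string> args;
    for (size_t i = 0; i < argv.size(); ++i) {
        const std::string & arg = argv[i];
        if (arg.rfind("--", 0) != 0) {
            continue;  // not an option; ignored in this sketch
        }
        if (flags.count(arg) > 0) {
            args[arg] = "";           // flag present, next token not consumed
        } else if (i + 1 < argv.size()) {
            args[arg] = argv[i + 1];  // option with a value
            ++i;                      // skip the value token
        }
    }
    return args;
}
```

Given `{"--no-clean", "--output-dir", "out"}` with `--no-clean` registered as a flag, the result contains `--no-clean` and maps `--output-dir` to `out`. A parser that always consumed the next token would instead have treated `--output-dir` as the value of `--no-clean`.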
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
…ganov#10454)

* improve inferencing performance for ascend npu.

Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>

* some modification after review

* some modifications after review

* restore some modifications

* restore some modifications

---------

Co-authored-by: shanshan shen <shanshanshen333@gmail.com>
Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>
* ggml-cpu: cmake add arm64 cpu feature check for macos

* use vmmlaq_s32 for compile option i8mm check
* cmake : enable warnings in llama

ggml-ci

* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS

* cmake : get_flags -> ggml_get_flags

* speculative-simple : fix warnings

* cmake : reuse ggml_get_flags

ggml-ci

* speculative-simple : fix compile warning

ggml-ci
…v#10507)

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
* server : replace behave with pytest

* fix test on windows

* misc

* add more tests

* more tests

* styling

* log less, fix embd test

* added all sequential tests

* fix coding style

* fix save slot test

* add parallel completion test

* fix parallel test

* remove feature files

* update test docs

* no cache_prompt for some tests

* add test_cache_vs_nocache_prompt
Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).

Fixes leejet/stable-diffusion.cpp#439.
* Fix inconsistency of HIP flags in cmake & make

* Fix docs regarding GGML_HIP
* Add link to OLMo 2 model in docs

* Change link to landing page
Alcpz and others added 12 commits November 29, 2024 20:38
* cleanup UI link list

* sort list alphabetically

* add missing licenses
* imatrix-combine-only idea

* ensured that behavior is consistent with log
* server : add split model test

* add test speculative

* add invalid cases
* ggml : move AMX to the CPU backend

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* subgroup 64 version with subgroup add. 15% faster

scalable version

tested for subgroup sizes 16-128

* check for subgroup multiple of 16 and greater than 16

* subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45)

* force 16 sequential threads per block

* make 16 subgroup size a constant
* readme : refresh

* readme : move section [no ci]

* readme : clarify [no ci]

* readme : fixes [no ci]

* readme : more fixes [no ci]

* readme : simplify [no ci]

* readme : clarify GGUF
…q4_0_4x4_q8_0() (ggerganov#10567)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@apicalshark apicalshark merged commit 0dbf2e2 into master Dec 1, 2024
5 of 8 checks passed
@apicalshark apicalshark deleted the Temp branch December 1, 2024 06:34