forked from ggerganov/llama.cpp
Temp #26 · Merged
Conversation
apicalshark (Owner) commented on Dec 1, 2024
- I have read the contributing guidelines
- Self-reported review complexity:
  - Low
  - Medium
  - High
* ggml : add support for dynamic loading of backends. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
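For the dynamic backend loading, here is a minimal sketch of the consumer side, assuming the ggml_backend_load() and device-enumeration entry points this series introduces (the exact header and signatures may differ, and the library path is hypothetical):

```cpp
// Minimal sketch: load a ggml backend from a shared library at runtime,
// then enumerate the devices it registers.
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Load one backend from an explicit path (hypothetical path).
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (!reg) {
        fprintf(stderr, "failed to load backend\n");
        return 1;
    }
    // List every device known to ggml after the load.
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s\n", i, ggml_backend_dev_name(dev));
    }
    return 0;
}
```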
* server : add speculative decoding support; add a helper function slot.can_speculate() (ggml-ci)
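The helper itself is not shown in this log; as a rough, hypothetical illustration (the field names here are invented), such a gate usually just checks that a draft model is available and the slot is actually generating:

```cpp
// Hypothetical sketch of a speculation gate for a server slot; the real
// server_slot struct carries far more state, and these fields are invented.
struct server_slot {
    void * ctx_dft    = nullptr; // draft-model context, null if none loaded
    int    n_draft    = 16;      // max draft tokens per step
    bool   generating = false;   // slot is currently producing tokens

    bool can_speculate() const {
        // Speculate only when a draft model is loaded, drafting is enabled,
        // and the slot is in its generation phase.
        return ctx_dft != nullptr && n_draft > 0 && generating;
    }
};
```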
* Add a download-chat feature to the server's Vue chat interface, next to the existing delete-chat feature; plus code-style cleanups. Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* (…#10497) llama : accept a list of devices to use when offloading a model; accept `--dev none` to completely disable offloading; fix the device list with dynamically loaded backends; rename the env parameter to LLAMA_ARG_DEVICE for consistency.
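A hedged sketch of what parsing such a `--dev` list might look like; the function name is invented here, and the real implementation lives in llama.cpp's argument handling:

```cpp
// Hypothetical sketch: parse a --dev argument, i.e. a comma-separated
// device list, where "none" disables offloading entirely.
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> parse_device_list(const std::string & arg) {
    std::vector<std::string> devices;
    if (arg == "none") {
        return devices; // empty list => no offloading at all
    }
    std::stringstream ss(arg);
    std::string dev;
    while (std::getline(ss, dev, ',')) {
        if (!dev.empty()) {
            devices.push_back(dev); // e.g. "CUDA0", "CUDA1"
        }
    }
    return devices;
}
```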
* Like simple-chat, but it uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code now. Avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <ecurtin@redhat.com>
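A minimal sketch of that smart-pointer pattern: wrap the C handles in std::unique_ptr with custom deleters so they are released on every exit path. This assumes the llama.h free functions of this era (llama_free_model / llama_free); check your header for the exact names:

```cpp
// Sketch: RAII wrappers around llama.cpp's C handles, assuming the
// llama_free_model()/llama_free() cleanup functions from llama.h.
#include <memory>
#include "llama.h"

struct llama_model_deleter {
    void operator()(llama_model * m) const { llama_free_model(m); }
};
struct llama_context_deleter {
    void operator()(llama_context * c) const { llama_free(c); }
};

// The pointers now free themselves when they go out of scope,
// on early returns and error paths included.
using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;
```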
* vulkan-shaders-gen was not parsing the --no-clean argument correctly: the previous code only parsed arguments that take a value, and --no-clean takes none, so it was skipped. This commit adds correct parsing of value-less arguments.
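A sketch of the parsing technique the fix describes, handling both `--flag value` pairs and value-less flags such as `--no-clean`; this illustrates the approach rather than reproducing the actual vulkan-shaders-gen code:

```cpp
// Sketch: collect "--flag value" pairs, but also accept flags without a
// value (e.g. --no-clean) by checking whether the next token is a flag.
#include <cstring>
#include <map>
#include <string>

std::map<std::string, std::string> parse_args(int argc, char ** argv) {
    std::map<std::string, std::string> args;
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg.rfind("--", 0) == 0) {
            // If the next token exists and is not itself a flag, treat it
            // as this flag's value; otherwise this is a boolean flag.
            if (i + 1 < argc && strncmp(argv[i + 1], "--", 2) != 0) {
                args[arg] = argv[++i];
            } else {
                args[arg] = ""; // value-less flag, e.g. --no-clean
            }
        }
    }
    return args;
}
```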
* Co-authored-by: noemotiovon <noemotiovon@gmail.com>
* (…ganov#10454) Improve inference performance for the Ascend NPU, with several rounds of modifications and restorations after review. Co-authored-by: shanshan shen <shanshanshen333@gmail.com> and Frank Mai <thxCode@thxcode0824@gmail.com>
* ggml-cpu : add an arm64 CPU feature check to CMake for macOS; use vmmlaq_s32 to probe for the i8mm compile option.
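This is the kind of probe source a CMake check_cxx_source_compiles() test might build for that check; a sketch, not the project's actual test source, and it only compiles when targeting arm64 with the i8mm extension (e.g. -march=armv8.6-a+i8mm):

```cpp
// Sketch of an i8mm feature probe: vmmlaq_s32 is only available with the
// i8mm extension, so successful compilation proves the compiler supports it.
#include <arm_neon.h>

int main() {
    int8x16_t a   = vdupq_n_s8(1);
    int8x16_t b   = vdupq_n_s8(2);
    int32x4_t acc = vdupq_n_s32(0);
    acc = vmmlaq_s32(acc, a, b); // requires the i8mm extension
    return vgetq_lane_s32(acc, 0) == 0;
}
```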
* cmake : enable warnings in llama; add llama_get_flags and respect LLAMA_FATAL_WARNINGS; rename get_flags -> ggml_get_flags and reuse it; fix compile warnings in speculative-simple. (ggml-ci)
* (…v#10507) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
* server : replace behave with pytest; fix a test on Windows; add many more tests (all sequential tests, a parallel completion test, a save-slot test, test_cache_vs_nocache_prompt, an embeddings test); log less; fix coding style; disable cache_prompt for some tests; remove the old feature files; update the test docs.
* Fix a bad calculation of the end of the range, and add a backend test that covers the bad case (taken from stable diffusion). Fixes leejet/stable-diffusion.cpp#439.
* (…gerganov#10516) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix inconsistent HIP flags between CMake and Make; fix the docs regarding GGML_HIP.
* Add a link to the OLMo 2 model in the docs; change the link to the landing page.
* Clean up the UI link list: sort it alphabetically and add missing licenses.
* imatrix-combine-only idea; ensured that the behavior is consistent with the log.
* server : add a split-model test, a speculative test, and invalid cases.
* ggml : move AMX to the CPU backend. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Subgroup-64 version with subgroup add: a 15% faster scalable version, tested for subgroup sizes 16-128. Check that the subgroup size is a multiple of 16 and greater than 16; subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45). Force 16 sequential threads per block and make the 16 subgroup size a constant.
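A hedged one-liner for the dispatch guard implied by that check; the name is invented and the exact threshold in the shader-selection logic may differ:

```cpp
// Sketch: choose the subgroup-add shader variant only when the device's
// reported subgroup size is a multiple of 16 (and large enough), per the
// check described in the commit above.
#include <cstdint>

bool use_subgroup_variant(uint32_t subgroup_size) {
    return subgroup_size >= 16 && subgroup_size % 16 == 0;
}
```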
* readme : refresh; move, clarify, and simplify sections; clarify GGUF; assorted fixes. [no ci]
* (…q4_0_4x4_q8_0(), ggerganov#10567) Signed-off-by: Adrien Gallouët <angt@huggingface.co>
github-actions bot added labels on Dec 1, 2024: documentation (Improvements or additions to documentation), examples, server, build, devops, nix, testing, python, script, ggml, SYCL, Nvidia GPU, Vulkan, Kompute, Apple Metal.