Adding -tb, --threads-batch to server.cpp #3584

m18coppola · 2023-10-11T16:34:04Z

resolves #3473

…example

* 'master' of github.com:ggerganov/llama.cpp: (34 commits)
  examples: support LLaVA v1.5 (multimodal model) (ggerganov#3436)
  docs : fix typo GOMP_CPU_AFFINITY (ggerganov#3597)
  cmake : fix add_compile_options on macOS
  typo : it is `--n-gpu-layers` not `--gpu-layers` (ggerganov#3592)
  ci : check if there is enough VRAM (ggerganov#3596)
  server : add completion mode (no chat) (ggerganov#3582)
  prompts : add mnemonics.txt
  server : fix kv cache management (ggerganov#3588)
  main : fix session loading bug (ggerganov#3400)
  server : add parameter -tb N, --threads-batch N (ggerganov#3584)
  common : fix mirostat state when using multiple sequences (ggerganov#3543)
  batched : add bench tool (ggerganov#3545)
  examples : add batched.swift + improve CI for swift (ggerganov#3562)
  Add MPT model to supported models in README.md (ggerganov#3574)
  Minor improvements in GPT2 tokenizer (ggerganov#3567)
  readme : add bloom (ggerganov#3570)
  llm : add bloom models (ggerganov#3553)
  swift : improvements and fixes (ggerganov#3564)
  llm : add MPT support (ggerganov#3417)
  infill. : fix tokenization (ggerganov#3508)
  ...

Co-authored-by: Michael Coppola <info@michaeljcoppola.com>

Co-authored-by: Michael Coppola <m18coppola@gmail.com>
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>

* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...

Michael Coppola and others added 2 commits October 4, 2023 12:04

server.cpp now accepts parameter -tb N, --threads-batch N

1f31478

Merge branch 'ggerganov:master' into master

0fd0f28
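
For readers who want a concrete picture of what accepting `-tb N, --threads-batch N` involves, here is a minimal, self-contained sketch of the flag handling. It is not the code from this PR: the struct `thread_params`, the functions `parse_thread_args` and `effective_batch_threads`, and the default values are all hypothetical names chosen for illustration. The sketch also assumes that a non-positive batch thread count means "fall back to the regular thread count".

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's parameter struct (names are assumptions).
struct thread_params {
    int32_t n_threads       = 4;   // threads used while generating tokens
    int32_t n_threads_batch = -1;  // threads used for batch/prompt processing;
                                   // assumed here: <= 0 means "use n_threads"
};

// Scans an argument list for -t/--threads and -tb/--threads-batch.
// Returns false when a flag has no value or the value is not an integer.
// Unrecognized arguments are skipped.
bool parse_thread_args(const std::vector<std::string> & args, thread_params & params) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & arg = args[i];
        int32_t * target = nullptr;

        if (arg == "-t" || arg == "--threads") {
            target = &params.n_threads;
        } else if (arg == "-tb" || arg == "--threads-batch") {
            target = &params.n_threads_batch;
        } else {
            continue;
        }

        // The flag must be followed by its value N.
        if (++i >= args.size()) {
            return false;
        }
        try {
            *target = std::stoi(args[i]);
        } catch (const std::exception &) {
            return false;  // not a number, or out of range
        }
    }
    return true;
}

// Resolves the thread count to use for batch processing.
int32_t effective_batch_threads(const thread_params & params) {
    return params.n_threads_batch > 0 ? params.n_threads_batch : params.n_threads;
}
```

With this sketch, the arguments `-t 8 -tb 16` give 8 generation threads and 16 batch threads, while `-t 8` alone gives 8 for both.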

cebtenzzre approved these changes Oct 11, 2023


ggerganov merged commit a8bdd65 into ggerganov:master Oct 11, 2023
28 of 38 checks passed

cebtenzzre pushed a commit to cebtenzzre/llama.cpp that referenced this pull request Oct 24, 2023

server : add parameter -tb N, --threads-batch N (ggerganov#3584)

54f9831

Co-authored-by: Michael Coppola <info@michaeljcoppola.com>

cebtenzzre mentioned this pull request Oct 24, 2023

server : re-add parameter -tb N, --threads-batch N #3768

Merged

ggerganov pushed a commit that referenced this pull request Oct 24, 2023

server : add parameter -tb N, --threads-batch N (#3584) (#3768)

ad93962

Co-authored-by: Michael Coppola <m18coppola@gmail.com>
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
