
Commit 399d01f

Merge branch 'feat/use-llama-cpp-server' of github.com:janhq/cortex.llamacpp into feat/use-llama-cpp-server
2 parents: ba7e5af + 7bbc7fe

3 files changed: +3 -1 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -148,3 +148,4 @@ Table of parameters
 |`flash_attn` | Boolean| To enable Flash Attention, default is true|
 |`cache_type` | String| KV cache type: f16, q8_0, q4_0, default is f16|
 |`use_mmap` | Boolean| To enable mmap, default is true|
+|`ctx_shift` | Boolean| To enable context shift, default is true|

llama.cpp

src/llama_engine.cc

Lines changed: 1 addition & 0 deletions
@@ -712,6 +712,7 @@ bool LlamaEngine::LoadModelImpl(std::shared_ptr<Json::Value> json_body) {
     }
   }
 
+  params.ctx_shift = json_body->get("ctx_shift", true).asBool();
   params.n_gpu_layers =
       json_body->get("ngl", 300)
           .asInt();  // change from 100 -> 300 since llama 3.1 has 292 gpu layers
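
For context, here is a minimal sketch of the jsoncpp get-with-default pattern the added line relies on. The request bodies below are made-up examples, not actual cortex.llamacpp requests: `Json::Value::get(key, default)` returns the default when the key is missing, so `ctx_shift` stays `true` unless a request explicitly sets it to `false`, matching the README entry above.

```cpp
// Minimal jsoncpp sketch of the get-with-default pattern used in
// LoadModelImpl above. The request bodies here are made-up examples.
#include <json/json.h>

#include <iostream>
#include <memory>

int main() {
  // Body without "ctx_shift": the default (true) is used, as documented
  // in the README parameter table.
  auto body_default = std::make_shared<Json::Value>();
  (*body_default)["ngl"] = 33;

  // Body that explicitly disables context shift.
  auto body_disabled = std::make_shared<Json::Value>();
  (*body_disabled)["ctx_shift"] = false;

  std::cout << std::boolalpha
            << body_default->get("ctx_shift", true).asBool() << "\n"   // true
            << body_disabled->get("ctx_shift", true).asBool() << "\n"  // false
            << body_default->get("ngl", 300).asInt() << std::endl;     // 33
  return 0;
}
```

Omitting the key therefore keeps context shift enabled by default, while a client that wants a fixed context window can pass `"ctx_shift": false` in the load request.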
