Description
Expected Behavior
./simple.cpp with TheBloke's Llama-2-7b-Chat-GGUF should run without issue.
Current Behavior
./simple ~/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GGUF/blobs/08a5566d61d7cb6b420c3e4387a39e0078e1f2fe5f055f3a03887385304d4bfa
(https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)
results in
Hello my name isSegmentation fault (core dumped)
The model works fine with main.
I'm running the latest Ubuntu with everything up to date, compiled with make (no CUDA, etc.).
The line that fails is llama.cpp:1453, in llama_kv_cache_find_slot:

cache.cells[cache.head + i].seq_id.insert(batch.seq_id[i][j]);
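For context, the i/j indexing suggests that statement sits inside a per-token loop that registers each token's sequence ids in the kv cache, roughly like this (a hedged reconstruction from the crash line, not a verbatim copy of llama.cpp; the n_seq_id loop bound is my guess):

for (uint32_t i = 0; i < n_tokens; i++) {
    cache.cells[cache.head + i].pos = batch.pos[i];

    // walks batch.seq_id[i] as an array, so it must be a valid pointer
    for (int32_t j = 0; j < batch.n_seq_id[i]; j++) {
        cache.cells[cache.head + i].seq_id.insert(batch.seq_id[i][j]);
    }
}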
The initialization of llama_batch::seq_id in simple.cpp seems suspect, but I'm not knowledgeable enough about what seq_id should be to fix it:
llama_batch batch = llama_batch_init(512, 0, 1);

// evaluate the initial prompt
batch.n_tokens = tokens_list.size();

for (int32_t i = 0; i < batch.n_tokens; i++) {
    batch.token[i]  = tokens_list[i];
    batch.pos[i]    = i;
    batch.seq_id[i] = 0;
    batch.logits[i] = false;
}

// llama_decode will output logits only for the last token of the prompt
batch.logits[batch.n_tokens - 1] = true;
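Given that batch.seq_id[i][j] is indexed twice at the crash site, seq_id[i] is presumably a pointer to a per-token array of sequence ids that llama_batch_init(512, 0, 1) allocates (one slot per sequence), so batch.seq_id[i] = 0 in the loop above overwrites that pointer with null, and the kv-cache insert then dereferences it. If that reading is right, a minimal sketch of a fixed prompt loop would look like this (untested; n_seq_id is an assumed field name for the per-token sequence count):

for (int32_t i = 0; i < batch.n_tokens; i++) {
    batch.token[i]     = tokens_list[i];
    batch.pos[i]       = i;
    batch.n_seq_id[i]  = 1; // assumed field: this token belongs to exactly one sequence
    batch.seq_id[i][0] = 0; // write sequence 0 into the allocated array instead of nulling the pointer
    batch.logits[i]    = false;
}

If main goes through a shared helper to fill its batches, porting that helper over is probably the safer fix.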
Time permitting, I may take a stab at porting over whatever seems to be working for main.