Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 0 (unknown)
built with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu
(actually version 4552, built with Nix)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
$ bin/llama-server -m ../../../wizardcoder-python-34b-v1.0.Q5_K_M.gguf -ngl 9999
...
$ curl -fsS \
--url http://127.0.0.1:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Hello","n_predict": 1, "n_probs": 10, "temperature":0}' | jq .
{
...
"completion_probabilities": [
{
"id": 2897,
"token": " os", <---------- whitespace OK
"bytes": [
32, <---------- whitespace OK
111,
115
],
"logprob": -2.0750603675842285,
"top_logprobs": [
{
"id": 2897,
"token": "os", <---------- whitespace missing
"bytes": [
111, <---------- whitespace missing
115
],
"logprob": -2.0750603675842285
},
Problem description & steps to reproduce
As shown in the output above: the sampled token in `completion_probabilities` keeps its leading whitespace (" os", with byte 32), while the matching entry in `top_logprobs` is missing it from both `token` and `bytes` ("os", no byte 32). This doesn't seem to depend on the model.
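
For reference, here is a small Python sketch that reproduces the discrepancy programmatically. It assumes a llama-server instance listening on http://127.0.0.1:8080 (as started above) and relies only on the fields visible in the response; it compares the sampled token's bytes against the `top_logprobs` entry with the same id and prints any mismatch.

```python
import json
import urllib.request

# Same request as the curl command above.
payload = {"prompt": "Hello", "n_predict": 1, "n_probs": 10, "temperature": 0}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

for entry in result["completion_probabilities"]:
    # Find the top_logprobs entry for the same token id as the sampled token.
    match = next((t for t in entry["top_logprobs"] if t["id"] == entry["id"]), None)
    if match is not None and entry["bytes"] != match["bytes"]:
        print(f"mismatch for id {entry['id']}:")
        print(f"  completion_probabilities: {entry['token']!r} bytes={entry['bytes']}")
        print(f"  top_logprobs:             {match['token']!r} bytes={match['bytes']}")
```

With the response shown above, this reports the leading space (byte 32) present in `completion_probabilities` but absent from the corresponding `top_logprobs` entry.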