Description
Version: b2794.
Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf (updated)
Prompt: "<|start_header_id|>user<|end_header_id|>How much is 12 plus 19?<|eot_id|>"
When I run the server and send a completion request with streaming, the verbose logs show that the server generates "<|start_header_id|>", "assistant", and "<|end_header_id|>", followed by "\n\n12 + 19 = 31".
However, the streaming chunks the server sends for <|start_header_id|> and <|end_header_id|> have empty strings as the content field of the data payload.
I couldn't find a config parameter either in the server or in the request that could change this behavior.
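For reference, a minimal reproduction sketch. It assumes the server's default /completion endpoint on localhost:8080 and uses the Python requests library; adjust host, port, and parameters as needed.

```python
import json
import requests  # assumption: using the 'requests' library for the HTTP call

# Prompt from the report above.
prompt = "<|start_header_id|>user<|end_header_id|>How much is 12 plus 19?<|eot_id|>"

# Send a streaming completion request (endpoint and port are assumptions
# based on the default llama.cpp server configuration).
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "stream": True, "n_predict": 64},
    stream=True,
)

for line in resp.iter_lines():
    if not line:
        continue
    # Streamed responses arrive as server-sent events: b'data: {...}'
    if line.startswith(b"data: "):
        chunk = json.loads(line[len(b"data: "):])
        # The header/special tokens show up as chunks whose "content" is "".
        print(repr(chunk.get("content", "")))
```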