v1/chat/completions endpoint does not honor cache_prompt #4329

Closed
Michael-F-Ellis opened this issue Dec 4, 2023 · 7 comments · Fixed by #4347

@Michael-F-Ellis

Michael-F-Ellis commented Dec 4, 2023

Expected Behavior

This is a follow-on to issue #4287. The README for server says that generation parameters, e.g. mirostat_tau, are honored by the OAI-compatible v1/chat/completions interface. That doesn't seem to be happening with either cache_prompt or slot_id.

Current Behavior

The cache is not used by follow-on requests that include previous prompts and generated text. Instead, the server re-evaluates the entire prompt. See the verbose log output appended below.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except under certain specific conditions.

  • Physical (or virtual) hardware you are using:
    Model Name: Mac mini
    Model Identifier: Macmini9,1
    Model Number: Z12P000KGLL/A
    Chip: Apple M1
    Total Number of Cores: 8 (4 performance and 4 efficiency)
    Memory: 16 GB
    System Firmware Version: 10151.41.12
    OS Loader Version: 10151.41.12

  • Operating System: macOS Sonoma

$ python3 --version
Python 3.10.9
$ make --version
GNU Make 3.81
$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)

Failure Information (for bugs)

To reproduce:

  1. Start the server with a suitable GGUF model (I've tested with zephyr-7B and mistral-7B-open-orca),
    e.g. ./server -m $MODELS/zephyr-7b-beta.Q4_K_M.gguf -c 8192 -t 8 --host 192.168.86.31 -v >> /tmp/llog2.txt
  2. Send a request to the v1/chat/completions endpoint with "cache_prompt": true and "slot_id": 1 as JSON fields in the request (see the sketch after this list).
  3. Concatenate the generated text with the first prompt. Add a little new text.
  4. Send a new request with the concatenated text as the prompt.
  5. Observe that the server evaluates the entire prompt instead of only the new text.
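
For reference, a minimal Python sketch of steps 2-4 (hypothetical client code, not part of the original report; it assumes the requests package and the server address from step 1):

import requests

# Server from step 1: ./server ... --host 192.168.86.31 (default port 8080)
URL = "http://192.168.86.31:8080/v1/chat/completions"

def complete(prompt: str) -> dict:
    # "cache_prompt" and "slot_id" are the fields this issue is about
    payload = {
        "model": "url",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.4,
        "max_tokens": 50,
        "cache_prompt": True,
        "slot_id": 1,
    }
    return requests.post(URL, json=payload).json()

story = "Continue the story that starts below:\nOnce upon a time, \n"
first = complete(story)                                    # step 2
story += first["choices"][0]["message"]["content"] + "\n"  # step 3
second = complete(story)                                   # step 4
# Step 5: with a working cache the second request should only need to
# evaluate the new tokens; instead the server re-evaluates everything.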

Failure Logs

{"timestamp":1701723744,"level":"INFO","function":"main","line":2658,"message":"build info","build":1595,"commit":"880f579"}
{"timestamp":1701723744,"level":"INFO","function":"main","line":2665,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
{"timestamp":1701723744,"level":"INFO","function":"main","line":3043,"message":"HTTP server listening","hostname":"192.168.86.31","port":8080}
{"timestamp":1701723787,"level":"VERBOSE","function":"update_slots","line":1783,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" <|im_start|>user\n\"Continue the story that starts below:\\nOnce upon a time, \\n\"<|im_end|>\n<|im_start|>assistant\n"}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":262,"token_text":"in","has_next_token":true,"n_remain":-1,"num_tokens_predicted":0,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":264,"token_text":" a","has_next_token":true,"n_remain":-1,"num_tokens_predicted":1,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":2082,"token_text":" far","has_next_token":true,"n_remain":-1,"num_tokens_predicted":2,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":28733,"token_text":"-","has_next_token":true,"n_remain":47,"num_tokens_predicted":3,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1769,"token_text":"off","has_next_token":true,"n_remain":46,"num_tokens_predicted":4,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
<--- SNIP --->
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":264,"token_text":" a","has_next_token":true,"n_remain":2,"num_tokens_predicted":48,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":20180,"token_text":" prince","has_next_token":true,"n_remain":1,"num_tokens_predicted":49,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":477,"token_text":" from","has_next_token":false,"n_remain":0,"num_tokens_predicted":50,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":""}
{"timestamp":1701723791,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"192.168.86.145","remote_port":49447,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1701723791,"level":"VERBOSE","function":"log_server_request","line":2613,"message":"request","request":"{\"model\":\"url\",\"messages\":[{\"role\":\"user\",\"content\":\"\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\n\\\"\"}],\"temperature\":1.4,\"n\":1,\"max_tokens\":50,\"cache_prompt\":true,\"slot_id\":1}","response":"{\"__verbose\":{\"content\":\"in a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from\",\"generation_settings\":{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.0,\"mirostat_tau\":0.0,\"model\":\"/Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf\",\"n_ctx\":8192,\"n_keep\":0,\"n_predict\":50,\"n_probs\":0,\"penalize_nl\":false,\"presence_penalty\":0.0,\"repeat_last_n\":0,\"repeat_penalty\":1.100000023841858,\"seed\":0,\"stop\":[\"<|im_end|>\"],\"stream\":false,\"temp\":1.399999976158142,\"tfs_z\":0.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":0.0},\"model\":\"gpt-3.5-turbo-0613\",\"oaicompat_token_ctr\":50,\"prompt\":\"<|im_start|>user\\n\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\n\\\"<|im_end|>\\n<|im_start|>assistant\\n\",\"slot_id\":0,\"stop\":true,\"stopped_eos\":false,\"stopped_limit\":true,\"stopped_word\":false,\"stopping_word\":\"\",\"timings\":{\"predicted_ms\":3684.106,\"predicted_n\":50,\"predicted_per_second\":13.571813623169366,\"predicted_per_token_ms\":73.68212,\"prompt_ms\":817.203,\"prompt_n\":46,\"prompt_per_second\":56.28956330312053,\"prompt_per_token_ms\":17.765282608695653},\"tokens_cached\":96,\"tokens_evaluated\":46,\"tokens_predicted\":50,\"truncated\":false},\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"in a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from\",\"role\":\"assistant\"}}],\"created\":1701723791,\"id\":\"chatcmpl-Zl1BZGt8wJ2cJZqtGVhqmlD29AXRqMke\",\"model\":\"gpt-3.5-turbo-0613\",\"object\":\"chat.completion\",\"usage\":{\"completion_tokens\":50,\"prompt_tokens\":46,\"total_tokens\":96}}"}

<--- SECOND REQUEST --->
{"timestamp":1701723858,"level":"VERBOSE","function":"update_slots","line":1783,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" <|im_start|>user\n\"Continue the story that starts below:\\nOnce upon a time, \\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\n\\n\"<|im_end|>\n<|im_start|>assistant\n"}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":2301,"token_text":"Is","has_next_token":true,"n_remain":0,"num_tokens_predicted":0,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":375,"token_text":"ab","has_next_token":true,"n_remain":0,"num_tokens_predicted":1,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1994,"token_text":"ella","has_next_token":true,"n_remain":0,"num_tokens_predicted":2,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":403,"token_text":" was","has_next_token":true,"n_remain":47,"num_tokens_predicted":3,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
<--- SNIP --->
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1885,"token_text":" conf","has_next_token":true,"n_remain":4,"num_tokens_predicted":46,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1932,"token_text":"ided","has_next_token":true,"n_remain":3,"num_tokens_predicted":47,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":297,"token_text":" in","has_next_token":true,"n_remain":2,"num_tokens_predicted":48,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":559,"token_text":" her","has_next_token":true,"n_remain":1,"num_tokens_predicted":49,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723864,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":16424,"token_text":" closest","has_next_token":false,"n_remain":0,"num_tokens_predicted":50,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":""}
{"timestamp":1701723864,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"192.168.86.145","remote_port":49465,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1701723864,"level":"VERBOSE","function":"log_server_request","line":2613,"message":"request","request":"{\"model\":\"url\",\"messages\":[{\"role\":\"user\",\"content\":\"\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\\\n\\\\n\\\"\"}],\"temperature\":1.4,\"n\":1,\"max_tokens\":50,\"cache_prompt\":true,\"slot_id\":1}","response":"{\"__verbose\":{\"content\":\"Isabella was not opposed to the idea of marriage, but she had her heart set on a different suitor. His name was Edward, a commoner who had captured her heart with his kindness and bravery. She had confided in her closest\",\"generation_settings\":{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.0,\"mirostat_tau\":0.0,\"model\":\"/Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf\",\"n_ctx\":8192,\"n_keep\":0,\"n_predict\":50,\"n_probs\":0,\"penalize_nl\":false,\"presence_penalty\":0.0,\"repeat_last_n\":0,\"repeat_penalty\":1.100000023841858,\"seed\":0,\"stop\":[\"<|im_end|>\"],\"stream\":false,\"temp\":1.399999976158142,\"tfs_z\":0.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":0.0},\"model\":\"gpt-3.5-turbo-0613\",\"oaicompat_token_ctr\":50,\"prompt\":\"<|im_start|>user\\n\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\\\n\\\\n\\\"<|im_end|>\\n<|im_start|>assistant\\n\",\"slot_id\":0,\"stop\":true,\"stopped_eos\":false,\"stopped_limit\":true,\"stopped_word\":false,\"stopping_word\":\"\",\"timings\":{\"predicted_ms\":3714.829,\"predicted_n\":50,\"predicted_per_second\":13.459569740626016,\"predicted_per_token_ms\":74.29658,\"prompt_ms\":1406.272,\"prompt_n\":105,\"prompt_per_second\":74.66549856642243,\"prompt_per_token_ms\":13.393066666666666},\"tokens_cached\":155,\"tokens_evaluated\":105,\"tokens_predicted\":50,\"truncated\":false},\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"Isabella was not opposed to the idea of marriage, but she had her heart set on a different suitor. His name was Edward, a commoner who had captured her heart with his kindness and bravery. She had confided in her closest\",\"role\":\"assistant\"}}],\"created\":1701723864,\"id\":\"chatcmpl-UgvXb8MPIqGdEiAbnIJR7rfFlm8XX09W\",\"model\":\"gpt-3.5-turbo-0613\",\"object\":\"chat.completion\",\"usage\":{\"completion_tokens\":50,\"prompt_tokens\":105,\"total_tokens\":155}}"}

@ggerganov
Owner

Can you verify that the issue is resolved with: #4347

@Michael-F-Ellis
Author

Michael-F-Ellis commented Dec 6, 2023

Thanks for the response! Unfortunately, it's still not honoring the cache. I pulled, did a make clean and a make. Here are the server log outputs for two requests. The first began with 707 tokens and a request to generate 50 more; it took 7 seconds for prompt eval and 4 seconds for prediction. The second request sent the concatenation of the previous prompt and the generated text with no alterations; it took 7.4 seconds for prompt eval and 4 for prediction.

print_timings: prompt eval time =    7039.53 ms /   707 tokens (    9.96 ms per token,   100.43 tokens per second)
print_timings:        eval time =    3989.39 ms /    50 runs   (   79.79 ms per token,    12.53 tokens per second)
print_timings:       total time =   11028.91 ms
slot 0 released (758 tokens in cache)
{"timestamp":1701875948,"level":"INFO","function":"log_server_request","line":2606,"message":"request","remote_addr":"192.168.86.145","remote_port":61732,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
slot 0 is processing [task id: 4]
slot 0 : kv cache rm - [0, end)

print_timings: prompt eval time =    7367.88 ms /   763 tokens (    9.66 ms per token,   103.56 tokens per second)
print_timings:        eval time =    4001.73 ms /    50 runs   (   80.03 ms per token,    12.49 tokens per second)
print_timings:       total time =   11369.60 ms
slot 0 released (814 tokens in cache)
{"timestamp":1701875980,"level":"INFO","function":"log_server_request","line":2606,"message":"request","remote_addr":"192.168.86.145","remote_port":61738,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}

The server command line was:

./server -m $MODELS/zephyr-7b-beta.Q4_K_M.gguf -c 8192 -t 8 --host 192.168.86.31

and the first part of the startup log is:

{"timestamp":1701875634,"level":"INFO","function":"main","line":2656,"message":"build info","build":1616,"commit":"caa9249"}
{"timestamp":1701875634,"level":"INFO","function":"main","line":2663,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf (version GGUF V3 (latest))

I've verified that the JSON I'm submitting contains "cache_prompt": true at the same level as temperature, n, and max_tokens. I also checked that your changes from #4347 are present in server.cpp.

@ggerganov
Owner

I'm testing with the following curl command and it seems to work:

curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer no-key" \
    -d '{
        "model": "gpt-3.5-turbo", "cache_prompt": true,
        "messages": [
            {
                "role": "system",
                "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
            },
            {
                "role": "user",
                "content": "Write a limerick about python exceptions"
            }
        ]
    }' | jq

If possible, please provide curl commands that fail to utilize the prompt cache so it is easier for me to find the issue.

@Michael-F-Ellis
Author

I ran your curl command twice in succession and got the following log output:

print_timings: prompt eval time =    1123.17 ms /    77 tokens (   14.59 ms per token,    68.56 tokens per second)
print_timings:        eval time =    3354.28 ms /    46 runs   (   72.92 ms per token,    13.71 tokens per second)
print_timings:       total time =    4477.45 ms
slot 0 released (124 tokens in cache)
{"timestamp":1701881683,"level":"INFO","function":"log_server_request","line":2606,"message":"request","remote_addr":"192.168.86.145","remote_port":62746,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
slot 0 is processing [task id: 6]
slot 0 : kv cache rm - [0, end)

print_timings: prompt eval time =    1077.96 ms /    77 tokens (   14.00 ms per token,    71.43 tokens per second)
print_timings:        eval time =    3365.27 ms /    46 runs   (   73.16 ms per token,    13.67 tokens per second)
print_timings:       total time =    4443.23 ms
slot 0 released (124 tokens in cache)
{"timestamp":1701881848,"level":"INFO","function":"log_server_request","line":2606,"message":"request","remote_addr":"192.168.86.145","remote_port":62769,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}

The only difference between my command and yours was that I changed the host IP from 127.0.0.1 to 192.168.86.31. (I'm running the server on my M1 Mac mini and sending requests from my laptop.) I then restarted the server to serve on 127.0.0.1 and ran the command from a terminal window on the Mac mini. Same outcome.

What other tests or output can I run to help sort this out?

@ggerganov
Owner

Just noticed this in your log:

{"timestamp":1701875634,"level":"INFO","function":"main","line":2656,"message":"build info","build":1616,"commit":"caa9249"}

It means you are still on master: caa9249

So your server does not have the change that I implemented in #4347

The commit you should be testing on is: ef455cb

@Michael-F-Ellis
Author

That did it! Sorry for not noticing your fix was on a different branch. See below for the log output when I send a sizeable prompt via my ficta application. The first request has to process all 809 prompt tokens (8 seconds), but the second evaluates only 73 tokens (1.1 seconds). Huge improvement. Thanks ever so much.

Just so I'm clear on how caching works,

  1. Am I correct that each pass caches only the prompt and not the generated text from that pass?
  2. If I edit some part of the prior prompt before re-sending, will the cache still be applied to everything before the edit?

llama server listening at http://192.168.86.31:8080

{"timestamp":1701884746,"level":"INFO","function":"main","line":3042,"message":"HTTP server listening","hostname":"192.168.86.31","port":8080}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 0]
slot 0 : in cache: 0 tokens | to process: 809 tokens
slot 0 : kv cache rm - [0, end)

print_timings: prompt eval time =    7938.18 ms /   809 tokens (    9.81 ms per token,   101.91 tokens per second)
print_timings:        eval time =    4027.66 ms /    50 runs   (   80.55 ms per token,    12.41 tokens per second)
print_timings:       total time =   11965.85 ms
slot 0 released (860 tokens in cache)
{"timestamp":1701884846,"level":"INFO","function":"log_server_request","line":2607,"message":"request","remote_addr":"192.168.86.145","remote_port":63396,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
slot 0 is processing [task id: 1]
slot 0 : in cache: 790 tokens | to process: 73 tokens
slot 0 : kv cache rm - [790, end)

print_timings: prompt eval time =    1118.32 ms /    73 tokens (   15.32 ms per token,    65.28 tokens per second)
print_timings:        eval time =    4055.14 ms /    50 runs   (   81.10 ms per token,    12.33 tokens per second)
print_timings:       total time =    5173.46 ms
slot 0 released (914 tokens in cache)
{"timestamp":1701884911,"level":"INFO","function":"log_server_request","line":2607,"message":"request","remote_addr":"192.168.86.145","remote_port":63412,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}

@ggerganov
Owner

  1. It also caches the generated text
  2. Yes
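
To make these answers concrete, here is a minimal Python sketch (an illustration, not llama.cpp's actual C++ implementation) of the longest-common-prefix reuse reflected in the "in cache: 790 tokens | to process: 73 tokens" log lines above:

def common_prefix_len(cached: list[int], new: list[int]) -> int:
    """Count the leading tokens shared by the cached and new prompts."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

# The cache holds the previous prompt plus the tokens generated from it
# (answer 1), so resending the old text, the generation, and a little new
# text only evaluates the suffix. An edit invalidates the cache from the
# edit point onward, but everything before it is still reused (answer 2).
cached = [101, 7, 42, 9, 3, 55]          # previous prompt + generated text
new    = [101, 7, 42, 9, 3, 55, 12, 88]  # follow-on prompt
keep = common_prefix_len(cached, new)
print(keep, new[keep:])                  # 6 [12, 88] -> only 2 tokens to eval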
