Bug: logit_bias Persists Across Requests When cache_prompt Is Enabled in llama.cpp Server #9477
Labels
bug-unconfirmed
medium severity
What happened?
When using the llama.cpp server with cache_prompt enabled, I've encountered an issue where the logit_bias specified in one request persists and influences subsequent requests, even when those requests do not include any logit_bias. This results in unexpected, biased outputs in later requests, where the model continues to favor tokens from a previous logit_bias setting.
Expected Behavior:
logit_bias should apply only to the request that specifies it. Requests that omit logit_bias should be sampled without any bias, regardless of whether cache_prompt is enabled.
Steps to Reproduce (see the sketch below):
1. Start llama-server with any model.
2. Send a completion request with cache_prompt set to true and a strong logit_bias for some token.
3. Send a second request with the same prompt and cache_prompt still true, but without any logit_bias.
4. Observe that the second request's output still favors the token biased in step 2.
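A minimal sketch of the two requests, assuming a llama-server instance listening on http://localhost:8080 and its /completion endpoint; the token id and bias value are hypothetical placeholders chosen only to make the effect visible.

```python
# Minimal reproduction sketch, assuming llama-server is running locally on port 8080.
# The token id 15339 and bias value 10.0 are hypothetical example values.
import requests

URL = "http://localhost:8080/completion"

# Request 1: cache_prompt enabled, with a strong positive logit_bias on one token.
first = requests.post(URL, json={
    "prompt": "The capital of France is",
    "n_predict": 16,
    "cache_prompt": True,
    "logit_bias": [[15339, 10.0]],  # hypothetical token id, strong positive bias
})
print("with bias:   ", first.json()["content"])

# Request 2: same prompt, cache_prompt still enabled, but no logit_bias at all.
# Expected: unbiased output. Observed: sampling is still skewed toward the
# token biased in the previous request.
second = requests.post(URL, json={
    "prompt": "The capital of France is",
    "n_predict": 16,
    "cache_prompt": True,
})
print("without bias:", second.json()["content"])
```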
Name and Version
./llama-cli --version
version: 3733 (1b28061)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0
What operating system are you seeing the problem on?
No response
Relevant log output
No response