Commit eaf466d
committed
Fix truncated logprobs when streaming is off
The logic to skip the logprobs of the stop token was originally from
ggml-org/llama.cpp#2849, and was later modified as part of
ggml-org/llama.cpp#10643 to be applied only to STOP_TYPE_WORD.
The latter change wasn't included in ikawrakow#723. Then, after ikawrakow#958 got merged,
the logic got inadvertently applied to GLM-4.5/4.6 and Kimi K2,
resulting in truncated logprobs when streaming is off.
This commit reverts the logic from ggml-org/llama.cpp#2849, such that
the logprobs of the stop token will always be included in the response,
when logprobs is enabled. From testing, this matches with the behavior
of Fireworks inference server, for both chat completions and text
completions endpoints.
Also fix logprobs param handling for the text completion endpoint.1 parent 912c98f commit eaf466d
2 files changed
+9
-12
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2646 | 2646 | | |
2647 | 2647 | | |
2648 | 2648 | | |
2649 | | - | |
2650 | | - | |
2651 | | - | |
2652 | | - | |
2653 | | - | |
2654 | | - | |
2655 | | - | |
2656 | | - | |
2657 | | - | |
2658 | | - | |
2659 | | - | |
2660 | | - | |
| 2649 | + | |
| 2650 | + | |
| 2651 | + | |
2661 | 2652 | | |
2662 | 2653 | | |
2663 | 2654 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
466 | 466 | | |
467 | 467 | | |
468 | 468 | | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
469 | 475 | | |
470 | 476 | | |
471 | 477 | | |
| |||
0 commit comments