server: --n-predict option document and ensure the completion request does not exceed it #5549
Conversation
force-pushed from b45e111 to d112457
@@ -545,6 +547,15 @@ struct llama_server_context
     slot->sparams.grammar = json_value(data, "grammar", default_sparams.grammar);
     slot->sparams.n_probs = json_value(data, "n_probs", default_sparams.n_probs);

+    if (slot->n_predict > 0 && slot->params.n_predict > slot->n_predict) {
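The hunk is truncated at the new condition. Based on the commit message ("ensure client request cannot override n_predict if set"), a minimal sketch of what the body of this check could do, assuming the `LOG_WARNING` JSON-logging macro already used in `server.cpp`; the message and fields are illustrative, not necessarily the merged code:

```cpp
// Sketch only: clamp the per-request limit to the server-wide --n-predict.
if (slot->n_predict > 0 && slot->params.n_predict > slot->n_predict) {
    LOG_WARNING("n_predict exceeds server configuration, clamping", {
        {"params.n_predict", slot->params.n_predict},
        {"slot.n_predict",   slot->n_predict},
    });
    slot->params.n_predict = slot->n_predict;
}
```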
@ggerganov Should this check be moved into `llama_client_slot::has_budget`?
Thanks for looking into this.
I think the proposed change is good enough, instead of throwing an error. Though there can be arguments either way.
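For context, the alternative raised above would fold the cap into the slot's budget check instead of clamping at request-parsing time. A rough sketch, assuming `has_budget` receives the global `gpt_params` and the existing `n_decoded`/`n_remaining` bookkeeping in `llama_client_slot`; the details here are an assumption, not code from this PR:

```cpp
// Sketch of the alternative: enforce the server-wide limit inside
// has_budget rather than when the request is parsed.
bool has_budget(gpt_params &global_params) {
    int32_t limit = params.n_predict; // per-request limit, -1 = unlimited
    if (global_params.n_predict > 0 &&
        (limit < 0 || limit > global_params.n_predict)) {
        limit = global_params.n_predict; // server-wide cap wins
    }
    if (limit < 0) {
        return true; // limitless
    }
    n_remaining = limit - n_decoded;
    return n_remaining > 0;
}
```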
server: --n-predict option document and ensure the completion request does not exceed it (#5549)

* server: document --n-predict
* server: ensure client request cannot override n_predict if set
* server: fix print usage LF in new --n-predict option
Context
The server `--n-predict` option is supported but not documented. When the `completion` endpoint or the OAI-compatible one is called with `n_predict` or `max_tokens`, the global configuration is not checked, so the number of completion tokens can exceed the `--n-predict` server option. Documenting and enforcing it may help people ensure a request never loops infinitely, as for example in #3969.
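For reference, a minimal sketch of the request parsing this refers to, as it looks in `server.cpp` before this change (the exact line may differ):

```cpp
// The per-request value wins unconditionally: a client sending a huge
// n_predict (or -1) bypasses whatever --n-predict the server was started with.
slot->params.n_predict = json_value(data, "n_predict", default_params.n_predict);
```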
Proposed changes
- Document the `--n-predict` option in `README.md` and in the server print usage.
- Cap the slot `n_predict` param to the minimum of the server params `n_predict` and the user input data.

Open question
Should the server reject the request with a `400` when `--n-predict > 0 && data['n_predict'] > --n-predict`, or can it be done later on?
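If rejection were chosen instead of clamping, a rough sketch of what it could look like in the HTTP handler, assuming the cpp-httplib `Response` object that `server.cpp` uses; the validation point and error body are assumptions, not part of this PR:

```cpp
// Hypothetical: reject in the /completion handler before queuing the task,
// instead of clamping inside the slot setup.
const int n_predict_req = json_value(data, "n_predict", -1);
if (params.n_predict > 0 && n_predict_req > params.n_predict) {
    res.status = 400; // Bad Request
    res.set_content(R"({"error":"n_predict exceeds the server --n-predict limit"})",
                    "application/json");
    return;
}
```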