
server: --n-predict option document and ensure the completion request does not exceed it #5549

Merged
3 commits merged into ggerganov:master from feature/server-global-n-predict on Feb 18, 2024

Conversation

phymbert (Collaborator):

Context
The server --n-predict option is supported but not documented. When the completion endpoint or the OAI-compatible one is called with n_predict or max_tokens, the global configuration is not checked, so the number of completion tokens can exceed the server's --n-predict option.

It may also help ensure that a request never loops infinitely, as reported for example in #3969.

Proposed changes

  1. Document the --n-predict option in the server README.md and in the print usage output.
  2. Clamp the slot's n_predict parameter to the minimum of the server-wide n_predict and the value supplied in the request data (a sketch of the idea follows this list).
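A minimal sketch of that clamping idea in C++, using only the slot fields visible in the diff further down (slot->params.n_predict for the per-request value, slot->n_predict for the server-wide cap); this is an illustration, not the merged implementation:

// Sketch: clamp the per-request n_predict to the server-wide --n-predict cap.
if (slot->n_predict > 0 && slot->params.n_predict > slot->n_predict) {
    // The request asked for more tokens than the server allows: fall back
    // to the server-wide limit instead of rejecting the request.
    slot->params.n_predict = slot->n_predict;
}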

Open question

  • @ggerganov Should the server instead reject completion requests with a 400 when --n-predict > 0 && data['n_predict'] > --n-predict, or can that be added later? (A rough sketch of that alternative follows.)
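For reference, a hedged sketch of the stricter 400 alternative inside a cpp-httplib completion handler. The json_value helper appears in the diff below; the error payload shape and variable names here are illustrative assumptions, not the server's actual code:

// Illustrative only: reject over-limit requests with HTTP 400 instead of clamping.
const int n_predict_req = json_value(data, "n_predict", -1);
if (params.n_predict > 0 && n_predict_req > params.n_predict) {
    res.status = 400; // httplib::Response
    res.set_content(json{{"error", "n_predict exceeds the server --n-predict limit"}}.dump(),
                    "application/json");
    return;
}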

@phymbert changed the title from "server: --n_predict option document and ensure the completion request does not exceed it" to "server: --n-predict option document and ensure the completion request does not exceed it" on Feb 17, 2024
@phymbert force-pushed the feature/server-global-n-predict branch from b45e111 to d112457 on February 17, 2024 at 13:56
@@ -545,6 +547,15 @@ struct llama_server_context
slot->sparams.grammar = json_value(data, "grammar", default_sparams.grammar);
slot->sparams.n_probs = json_value(data, "n_probs", default_sparams.n_probs);

if (slot->n_predict > 0 && slot->params.n_predict > slot->n_predict) {
phymbert (Collaborator, Author) commented on this line:
@ggerganov Should this check be moved into llama_client_slot::has_budget?
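For context, one way the check could be folded into a per-slot budget helper. The field names beyond those visible in the diff (n_decoded, global_params) are assumptions, and this is a sketch rather than the actual server code:

// Sketch: honor both the per-request and the server-wide --n-predict limits.
bool has_budget(gpt_params & global_params) {
    if (params.n_predict == -1 && global_params.n_predict == -1) {
        return true; // both limits disabled, generation is unbounded
    }
    int n_remaining = -1;
    if (params.n_predict != -1) {
        n_remaining = params.n_predict - n_decoded;        // per-request cap
    } else if (global_params.n_predict != -1) {
        n_remaining = global_params.n_predict - n_decoded; // server-wide cap
    }
    return n_remaining > 0; // stop generating once the budget is spent
}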

@phymbert mentioned this pull request on Feb 18, 2024.
ggerganov (Owner) left a review:

Thanks for looking into this.

I think the proposed change is good enough, instead of throwing an error. Though there can be arguments either way.

@ggerganov ggerganov merged commit 36376ab into ggerganov:master Feb 18, 2024
47 of 53 checks passed
@phymbert phymbert deleted the feature/server-global-n-predict branch February 23, 2024 21:52
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
…5549)

* server: document --n-predict

* server: ensure client request cannot override n_predict if set

* server: fix print usage LF in new --n-predict option
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
…5549)

* server: document --n-predict

* server: ensure client request cannot override n_predict if set

* server: fix print usage LF in new --n-predict option