I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
The same set of parameters should be available when calling either the `completion` or the `v1/chat/completions` endpoint. Most notably, `min_p` and `grammar` would be useful to have.
For example, a call like this should be possible:

```shell
curl http://localhost:3077/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "temperature": 1.0,
    "min_p": 0.01,
    "top_k": 0,
    "top_p": 1,
    "repeat_penalty": 1,
    "grammar": "root ::= (\"Hello!\" | \"Hi!\")",
    "messages": [
      { "role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests." },
      { "role": "user", "content": "Hi" }
    ]
  }'
```
Motivation
To make full use of the llama.cpp backend when replacing another LLM call (one that uses the OpenAI SDK, for example), it's useful to have access to the full set of parameters to tune the output for the task. Those parameters can already be passed as a dictionary via the `extra_body` input parameter when making a call with the Python `openai` library.
If the parameters aren't available when making the switch, the developer will have to consider changing the code to use the `completion` endpoint instead, or even maintain separate versions of the same code just to compare different LLMs.
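For context, the `extra_body` escape hatch works because the OpenAI Python SDK merges its keys into the top level of the request JSON, so llama.cpp-only parameters reach the server unchanged. A minimal sketch of that merge (the model name is a placeholder, and the real SDK call is shown only as a comment):

```python
import json

# Standard chat-completion parameters as the SDK would send them.
base_params = {
    "model": "local-model",  # placeholder; llama.cpp's server serves one model
    "temperature": 1.0,
    "messages": [{"role": "user", "content": "Hi"}],
}

# llama.cpp-specific sampling parameters with no first-class SDK argument.
extra_body = {
    "min_p": 0.01,
    "grammar": 'root ::= ("Hello!" | "Hi!")',
}

# extra_body keys end up merged into the request payload, roughly like this:
payload = {**base_params, **extra_body}
print(json.dumps(payload, indent=2))

# With the real client, the equivalent call would be:
# client.chat.completions.create(model="local-model",
#                                messages=base_params["messages"],
#                                temperature=1.0,
#                                extra_body=extra_body)
```

This is exactly why the feature request matters: the client can already send these fields today, but the server's OpenAI-compatible endpoint has to accept them for the round trip to work.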
Possible Implementation
I'm guessing the `oaicompat_completion_params_parse` function in `examples/server/server.cpp` could be extended to accept the additional parameters.
I think `grammar` can be especially useful to force the model's answer to begin in a certain way. This can guide the model's answer toward what the user desires.
Alternatively, if there is some reason not to provide `grammar` for `v1/chat/completions`, I think a new optional `start` parameter containing the required beginning of the completion would be helpful (although that would be a separate issue).
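As a small illustration of constraining answers this way, here is a hypothetical helper (the function name is mine, not part of llama.cpp) that builds the same kind of GBNF rule used in the curl example above, restricting the answer to a fixed set of strings:

```python
def choice_grammar(options):
    """Build a GBNF root rule that restricts the model's answer
    to exactly one of the given strings."""
    # Escape backslashes and quotes so each option is a valid GBNF literal.
    alts = " | ".join(
        '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"' for o in options
    )
    return f"root ::= ({alts})"

print(choice_grammar(["Hello!", "Hi!"]))  # root ::= ("Hello!" | "Hi!")
```

The resulting string would be passed as the `grammar` field of the request body, the same as in the curl example.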