change default temperature of OAI compat API from 0 to 1 #7226
Conversation
Double-checking your assertion, I can confirm that, at least for chat completion mode (which is what we are dealing with in this PR), the default is indeed temperature=1.0.
Source: https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature
temperature
number or null
The sampling temperature used for this run. If not set, defaults to 1.
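For concreteness, here's a minimal sketch of a chat completion request that simply omits the temperature field, so the server-side default of 1 applies. The model name and API key are placeholders, not anything from this PR:

```python
import requests

# Chat completion request with no "temperature" key; per the docs
# quoted above, the server then defaults to temperature=1.
# Model name and API key are placeholders.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}],
        # note: no "temperature" field here
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```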
Just a quick note that this is example code, not the actual llama.cpp endpoint itself. But it would still be useful to maintain consistency.
Note that in transcript mode, creativity/temperature defaults to 0, so temperature defaults can differ between API endpoints.
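For example, a sketch against OpenAI's transcription endpoint, where omitting temperature is documented to mean 0 rather than 1 (the audio file name is a placeholder):

```python
import requests

# Transcription request with no "temperature" field; for this endpoint
# OpenAI documents the default as 0. File name is a placeholder.
with open("speech.mp3", "rb") as f:
    resp = requests.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": ("speech.mp3", f, "audio/mpeg")},
        data={"model": "whisper-1"},
    )
print(resp.json())
```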
Different models can tolerate different temperatures. What if 1 is too high for most of the models people run locally? The default in main is 0.8.
The value of 1 should work for any model, assuming the logits weren't scaled whilst training. A temperature of 1 actually corresponds to the model outputting "well calibrated" probability estimates: if you were to plot the post-softmax probability estimates against the empirical fraction of times the next token fell in the respective "bin" (or, more likely, the logs of these values), then, assuming log-loss (aka "cross entropy" loss) was used in training, you'd find that temperature=1 makes the plots line up best. (The inverse of this is even used to calibrate the outputs of non-probabilistic models such as SVMs trained with "maximum margin" loss: https://en.m.wikipedia.org/wiki/Platt_scaling) This doesn't necessarily mean that temperature=1 will be optimal for every use case, but it should definitely not be broken, and it is likely the best default IMO.
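To make the scaling concrete, here's a small self-contained sketch (the logits are made up) of how temperature reshapes the post-softmax distribution; temperature=1 leaves the model's own probabilities untouched, lower values sharpen them, higher values flatten them:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before taking the softmax."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # made-up next-token logits

for t in (0.5, 1.0, 2.0):
    print(f"T={t}: {softmax_with_temperature(logits, t).round(3)}")
```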
This should make the API more similar to OpenAI's actual API.