feat: openAI explicit value for maxToken and temperature #659
Conversation
- Please can you add those as constants at the top of the file? (see the sketch below)
- The PR name needs to be semantic, e.g. "feat: OpenAI: explicit value for MaxToken and Temp"
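For illustration, "constants at the top of the file" could look roughly like this (only a sketch; the constant names are placeholders, and the values are the ones discussed in this PR):

```go
// Sketch for pkg/ai/openai.go; names are illustrative, not the PR's exact code.
const (
	// maxToken: most models (e.g. Llama 1) support 2048 tokens, so use that as a
	// safe explicit value instead of relying on vLLM's server-side default of 16.
	maxToken = 2048

	// temperature: explicit sampling temperature sent with every request.
	temperature = 0.7
)
```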
Nice suggestion!
Done. Thanks!
Why do we need to specify the Temperature value as a constant? Is this worth doing across all AI backends?
Hi, @arbreezy:
@panpan0000 makes sense.
Because when k8sgpt talks with vLLM, the default MaxToken is 16, which is too small. Given that most models support 2048 tokens (like Llama 1, etc.), this puts in a safe explicit value. Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Sure, @arbreezy, already updated with a second commit: (1) add it for all backends, (2) add it as a cmd flag (rough sketch below). Can you please review again? Thanks a lot.
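For reference, a cobra flag binding for these values could look roughly like the sketch below; the flag names, defaults, and the command they attach to are assumptions for illustration, not necessarily what this PR adds.

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

var (
	maxTokens   int
	temperature float32
)

// authCmd stands in for the real k8sgpt command these flags would attach to.
var authCmd = &cobra.Command{
	Use:   "auth",
	Short: "Configure an AI backend (illustrative sketch)",
	Run: func(cmd *cobra.Command, args []string) {
		fmt.Printf("maxtokens=%d temperature=%.1f\n", maxTokens, temperature)
	},
}

func main() {
	// Hypothetical flag names; defaults mirror the values discussed in this PR.
	authCmd.Flags().IntVar(&maxTokens, "maxtokens", 2048, "Maximum number of tokens for the AI completion")
	authCmd.Flags().Float32Var(&temperature, "temperature", 0.7, "Sampling temperature for the AI backend")
	if err := authCmd.Execute(); err != nil {
		fmt.Println(err)
	}
}
```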
@panpan0000 @AlexsJones ah, I think we need to also set it in server mode, right?
Feels like an oversight that needs fixing, yes.
Closes # NA
📑 Description
Because when k8sgpt talks with vLLM, the default MaxToken is 16, which is too small. Given that most models support 2048 tokens (like Llama 1, etc.), this puts in a safe explicit value.
✅ Checks
ℹ Additional Information
When k8sgpt talks to a local (on-premise) LLM, for example a model served by vLLM (refer to the blog https://k8sgpt.ai/blog/post-6/), a problem occurs because the request from k8sgpt carries no default MaxToken value. In the vLLM log you will see `temperature=0.7` and `max_tokens=16`; such a small `max_tokens` causes the issue below: the answers from the LLM are truncated.
So this fix makes the values explicit, just like what was done in https://github.com/k8sgpt-ai/k8sgpt/blob/main/pkg/ai/cohere.go#L66.
With this fix, the new log from vLLM looks better.
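As an illustration of making the values explicit, a request could look like the sketch below, assuming a go-openai style client (github.com/sashabaranov/go-openai); the constant names, model name, and endpoint URL are placeholders for the example, not the exact code in this PR.

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

const (
	// Explicit values instead of relying on server-side defaults (vLLM defaults to max_tokens=16).
	maxTokens   = 2048
	temperature = 0.7
)

func main() {
	// For a local vLLM endpoint the base URL is overridden; the URL and token here are examples.
	cfg := openai.DefaultConfig("not-needed-for-local-vllm")
	cfg.BaseURL = "http://localhost:8000/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model:       "example-model", // placeholder; whatever model vLLM is serving
		MaxTokens:   maxTokens,       // avoid the tiny server-side default of 16
		Temperature: temperature,     // send an explicit sampling temperature
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "Explain this Kubernetes error: CrashLoopBackOff"},
		},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```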
Regression Test
When switching back to the online OpenAI backend, the result is still good.