KCPP - Idle Responses - Can this be improved? #1229
Comments
Thanks for the feedback. I've added a change that lets you correct this behavior by enabling the "Continue Bot Replies" toggle. You can try it at https://lite.koboldai.net/
It worked flawlessly! I don't think the "Continue Bot Replies" option works this way in my version, though. I'm using the latest CUDA standalone Linux executable. How do I get this update?
The update will be available in the next version of KoboldCpp.
Terrific! Tyvm. While I've got you here, suggestions:
And would you be able to explain how the "Auto" option works compared with a timed delay for idle responses?
I feel like several versions ago "Idle Responses" worked slightly differently (and better, imo). It used to just continue outputting the response as if the token output limit hadn't been reached. Now it seems to trigger differently, and it rarely continues the initial response.
Take the following excerpt from koboldcpp/tiny-llama-1.1b-chat-medical.q8_0. You can see that I start with the input "Can you please describe in detail how the digestive system works?"
P.S. This is instruct mode.
The first {{[OUTPUT]}} is triggered by me submitting the initial {{[INPUT]}}; every subsequent follow-up comes from the Idle Response triggering.
You can see that in the second {{[OUTPUT]}}, "Can you explain the different functions of the colon and rectum?", the AI is NOT continuing the first output; it's now posing as me (or rather talking to itself). It then answers itself in the next {{[OUTPUT]}} and then once again asks itself another question.
Is there a way to correct this behavior? I feel like I used to be able to use Idle Responses to generate lengthy responses. I would ask a question, go do something else for a few minutes, and come back to a fully comprehensible response several hundred tokens long.
What has changed, and what can I do to get this behavior back?
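To make the two behaviors concrete, here is a minimal Python sketch of how I picture the difference. It is purely illustrative and is not KoboldCpp's or Kobold Lite's actual code: the function `build_idle_prompt`, the `continue_reply` flag, and the placeholder reply text are all hypothetical names I made up, and I'm only assuming that an idle response either resubmits the transcript as-is or opens a new {{[OUTPUT]}} turn.

```python
# Hypothetical illustration only; not taken from KoboldCpp or Kobold Lite.
INPUT_TAG = "{{[INPUT]}}"
OUTPUT_TAG = "{{[OUTPUT]}}"


def build_idle_prompt(history: str, continue_reply: bool) -> str:
    """Return the text submitted to the model when an idle response fires."""
    if continue_reply:
        # Resubmit the transcript unchanged, so the model can pick up
        # where its last reply stopped.
        return history
    # Open a brand-new output turn; the model may treat this as a fresh
    # message and drift into asking itself questions.
    return history + "\n" + OUTPUT_TAG + "\n"


# Transcript so far: one input, one output (placeholder text, not real output).
history = "\n".join([
    INPUT_TAG,
    "Can you please describe in detail how the digestive system works?",
    OUTPUT_TAG,
    "<first reply so far>",
])

continued = build_idle_prompt(history, continue_reply=True)
new_turn = build_idle_prompt(history, continue_reply=False)
```

In this sketch, `continued` still ends inside the first reply, which matches the old behavior I'm describing, while `new_turn` ends with a second {{[OUTPUT]}} tag, which matches what I'm seeing now.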