Original prompt being forgotten. n_keep tokens not functioning as expected? #1647
It also took me a while to figure out how this works, but the reason you are not seeing the original prompt is that its context is literally kept when resetting, so it never has to show up in embd again. The important part is the setting of n_past back to params.n_keep when the context is reset.

If you still suspect that something goes wrong in this mechanism, it might be interesting to simulate the state after reset by concatenating the original prompt and the preserved part of the old text into a single prompt and letting the model generate from that (with a fixed seed or temp==0 for determinism). You could get at the exact contents of the preserved text by uncommenting the "resetting" debug output right after the swap (shown below). You should then see clearly whether your text is continued with or without regard for the original prompt, without having to rely on the reset mechanism, i.e. whether the model really "forgets" the original prompt, or whether the combination of original prompt + half of the most recently generated text is just a poor substitute for an actually bigger context.

Thinking this through, I can imagine that the concatenation could sometimes be meaning-distorting, e.g. if the original prompt ends in something like "Human:", but what gets concatenated next is a truncated part of the AI's response, i.e. effectively flipping dialog turns. May be worth actually writing out a concrete example...
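For reference, the "resetting" debug output sits right after the context swap in examples/main/main.cpp and is commented out by default; quoted from memory (paraphrased, so the exact lines may differ between versions), it is roughly:

```cpp
// normally commented out in main.cpp; uncommenting it dumps exactly which old
// tokens were carried over into embd by the context swap
printf("\n---\n");
printf("resetting: '");
for (int i = 0; i < (int) embd.size(); i++) {
    printf("%s", llama_token_to_str(ctx, embd[i]));
}
printf("'\n");
printf("\n---\n");
```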
Appreciate the response. Before I dig too deep into it, I should probably ask for clarification on one point. What you're saying is that by the time execution reaches that swap block, the original prompt's context has already been preserved inside the model's state?

What's confusing me is that embd does contain data from last_n_tokens, which makes me think that this context "rotation" is a destructive operation, and it also looks like the original prompt is treated the same as any other input. So what I'm missing is exactly how it functions such that the original prompt is preserved, but last_n_tokens is not.

The reason I'm trying to hunt this down is that, as far as I can tell, the model is able to properly recall all data from the original prompt until the context is refreshed when it overruns. After that point, the model can't seem to remember anything. We're talking 10/10 accuracy before the context rolls, and (usually) 0/10 after. Every once in a while it will still be able to answer questions after the context rolls, but that's pretty rare. It's just super weird that the model's entire personality pivots with the context roll. It starts hallucinating everything. I can't replicate this behavior at any point before the context switches either.
One caveat: I have not written any of the code, I have just tried to understand it some weeks ago out of a similar concern as yours. What took me a while to notice (obvious in hindsight if you know where to look) is that, in normal operation, the model's internal state persists between evaluations; embd only ever carries the new tokens to be appended. In the code, the important part for preserving the original prompt is the parameter n_past. It indicates how many tokens of the previous state of the model should be preserved before evaluating the contents of embd. This is in fact used for all evaluations, to make sure that new samples are appended at the end of the context.

Of course, none of this really guarantees that the partial buffer together with the prepended original prompt makes enough sense to continue a meaningful dialog. For example, the user's last input may not be included, so the model has no clue any more what its actual task is...
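To make that concrete, the evaluation loop in the main example looked roughly like this at the time (paraphrased from memory; names such as params.n_batch and params.n_threads are from that version of the example and may differ in newer code). The key point is that n_past tells llama_eval how much already-evaluated state the new tokens are appended to, and after the swap n_past is set back to params.n_keep:

```cpp
// evaluate the pending tokens in batches; nothing before n_past is recomputed
for (int i = 0; i < (int) embd.size(); i += params.n_batch) {
    int n_eval = (int) embd.size() - i;
    if (n_eval > params.n_batch) {
        n_eval = params.n_batch;
    }
    if (llama_eval(ctx, &embd[i], n_eval, n_past, params.n_threads)) {
        fprintf(stderr, "%s : failed to eval\n", __func__);
        return 1;
    }
    n_past += n_eval;  // the evaluated context grows by the batch just processed
}
```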
What has to happen now is that the context has to be shortened somehow. The trick used here (and it seems to be very effective) is to keep some of the old context at the beginning (this part is optional) and to move the second half of the old context after it, freeing up half or less of the context space for new tokens.
```cpp
embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());
```

Let's unravel this one.
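Here is a commented sketch of the index arithmetic (names as in the surrounding code; this is my reading of it, not an authoritative description):

```cpp
// last_n_tokens is a ring buffer of size n_ctx holding the most recent tokens;
// its tail already duplicates the embd.size() tokens that are still pending
// evaluation, so those are excluded from the copy.
//
//   n_left = n_past - params.n_keep   // old tokens that are NOT kept verbatim
//
//   first copied token: last_n_tokens.begin() + n_ctx - n_left/2 - embd.size()
//   last copied token:  last_n_tokens.end()   - embd.size()
//
// That range is exactly n_left/2 tokens long: the most recent half of the
// discarded context. Prepending it to embd means it gets re-evaluated right
// after the kept prefix (the original prompt), so the "refreshed" context is
// original prompt + second half of the old context + new tokens.
embd.insert(embd.begin(),
            last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(),
            last_n_tokens.end()   - embd.size());
```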
I've also noticed n_keep acting unexpectedly. With -p, n_keep defaults to 0 as expected, but with -ins it defaults to 2 even if you force --keep -1. Can somebody show their command line with -ins and a working --keep n?
I'm going to close this, because while I can't explain why the model kept forgetting the prompt, after continuing to dig through the code for the last week I can at least say I understand how the eval works. In the end I've found a different way to ensure the model remembers the prompt.
Could you please share the method/fix you've found? It would be a great help to get it up and running.
^ Trying to figure that out too.
Windows.
Maybe I'm misinterpreting the functionality.
I noticed that my long-running bot was forgetting its original prompt, despite n_keep being set to -1. Pulled down the project and threw a breakpoint in the section that manages the rollover.
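The block in question looks roughly like this (quoted from memory of examples/main/main.cpp around that time, so the exact lines and comments may differ between versions):

```cpp
// infinite text generation via context swapping:
// if we run out of context, keep the first params.n_keep tokens (the original
// prompt when --keep -1 is used) and re-evaluate half of the remaining old
// tokens on top of them
if (n_past + (int) embd.size() > n_ctx) {
    const int n_left = n_past - params.n_keep;

    // always keep at least the first token (BOS)
    n_past = std::max(1, params.n_keep);

    // insert n_left/2 tokens at the start of embd from last_n_tokens
    embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());
}
```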
I'm not 100% sure exactly what this is supposed to do; it's not intuitive to me because I'm a C# guy. But since last_n_tokens is being used and inserted into embd, my assumption is that a new embd is being constructed to contain the new context information. I'm also assuming, based on the comments, that this new embd is supposed to contain the original prompt. So this appears to be functioning like some kind of refresh.
What I'm not seeing, however, is my original prompt anywhere in this new embd.
I'm not sure if this is a bug, or if I'm fundamentally misunderstanding the intent of this block of code.
Could this be the cause of my session forgetting its original prompt?