[Feature Request] Ability to rewind model evaluation by a fixed number of tokens #1281
Comments
How would this be possible, given the way the model evaluates layer by layer? Or do you mean using RAM to keep X sessions (or diffs from some particular Y session) saved until they are completed or replaced? That would give the ability to rewind, at the cost of more RAM usage. If you meant it in another way, then please elaborate.. =] Would love to hear other concepts around this, as I'm working on adding roles and other scenarios to my setup and it would help.. |
I think you should be able to reduce `n_past`. |
@j-f1 from my understanding (and experiments) |
This is also used with |
@abetlen So maybe we do not even need an API change - the user code can simply control the `n_past` value. |
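To make the "just control `n_past`" idea concrete, here is a minimal toy sketch of the mechanism being discussed. All names here are hypothetical, and the real llama.cpp KV cache holds per-layer key/value tensors rather than strings; the point is only that rewinding amounts to decrementing a position counter, after which stale cache slots are overwritten by later evals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a llama context (hypothetical names): the KV cache is
// modeled as one slot per evaluated position.
struct ToyContext {
    std::vector<std::string> kv;  // one slot per evaluated position
    int n_past = 0;               // number of valid positions in kv
};

// "Eval" one token at position n_past: write (or overwrite) that slot.
void toy_eval(ToyContext &ctx, const std::string &tok) {
    if ((int)ctx.kv.size() <= ctx.n_past) {
        ctx.kv.resize(ctx.n_past + 1);
    }
    ctx.kv[ctx.n_past] = tok;
    ctx.n_past += 1;
}

// Rewinding is just reducing n_past: stale slots past the new position are
// ignored and will be overwritten by subsequent evals.
void toy_rewind(ToyContext &ctx, int n_tokens) {
    ctx.n_past = ctx.n_past > n_tokens ? ctx.n_past - n_tokens : 0;
}
```

In this sketch nothing is copied or freed on rewind; the old entries simply stop being referenced, which is why rewinding can be cheap compared to reloading a saved session.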
On the topic of restoring a shorter state from a longer session, I just put up #1310, which would allow sessions to be updated and restored incrementally. That could be more efficient than loading the full session and rewinding. |
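The incremental idea mentioned above can be sketched roughly as follows. This is a hypothetical shape, not the actual #1310 implementation: the session tracks how much has already been persisted and saves only the delta since the last checkpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of incremental session saving: persist only the
// tokens added since the last checkpoint, not the full state every time.
struct ToySession {
    std::vector<int> tokens;   // everything evaluated so far
    size_t saved_upto = 0;     // how much has already been persisted
};

// Returns the delta to persist; a real implementation would also write the
// corresponding KV-cache data to disk, not just token ids.
std::vector<int> incremental_save(ToySession &s) {
    std::vector<int> delta(s.tokens.begin() + s.saved_upto, s.tokens.end());
    s.saved_upto = s.tokens.size();
    return delta;
}
```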
But this doesn't affect the logits/state, though, right? So its only effect on future tokens would be through the sampling algorithm and penalties, right? Maybe I'm looking at this in the completely wrong way... |
The logits are updated after an eval, so after rewinding you can re-eval the token at the new last position to refresh them. This is where a new API function could come in: it could recalculate the logits starting at a particular `n_past`. |
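A toy illustration of the point above, with hypothetical names: here the fake logits depend only on the last token evaluated (a real model attends over the whole cached prefix, but the refresh pattern is the same). After a rewind, the cached logits still belong to the pre-rewind position until you re-eval.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a model whose logits are produced by the most recent eval.
struct ToyLM {
    int n_past = 0;
    std::vector<float> logits;  // produced by the most recent eval
};

void toy_eval(ToyLM &lm, int token) {
    lm.logits = { (float)token, (float)token * 0.5f };  // fake forward pass
    lm.n_past += 1;
}

// Rewind by n tokens, then re-eval the token now at the last position so the
// cached logits match the rewound state instead of the pre-rewind one.
void rewind_and_refresh(ToyLM &lm, const std::vector<int> &tokens, int n) {
    lm.n_past -= n + 1;               // step back one extra position...
    toy_eval(lm, tokens[lm.n_past]);  // ...and re-eval to refresh the logits
}
```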
Thanks for the explanation! I'll be sure to read into all of the new APIs.. |
@j-f1 @ggerganov @DannyDaemonic this worked! I must've been doing something wrong (maybe not re-evaling the last token when the same prompt was used). I'll close this issue, since controlling `n_past` is enough. |
Does this mean you could select a token, for example token 20, and generate 5-10 completely different variations? Once you finish a variation with a specific temperature setting or other choices at token 20, can you simply rewind `n_past` by 16-32 and generate another entire line from the same initial token? Can you choose a different option from the sampler without having to load or save a session until you've generated as many variations as you want? I think I'm confused by seeing a 1+ GB state file that came from just some tokens being inserted, and by how I can just use `n_past` to rewind without it possibly affecting that. I looked into ggml's eval and I see that `n_past` is used, so maybe I just don't understand the network/KV cache aspect. I'll try to dig into some of the PDFs around the original model, which may help me understand the differences.. |
I was a bit confused myself. I'm still not 100% sure how it works, but I think the results are built up as you go, and since each result builds on all the previous results, they are stored in a way that they can be referenced in the future, which means you can jump back to them later as well. You can see #1111 for a little more discussion, but always verify what you read in a public forum. I believe this is how a lot of online chat bots and authoring tools actually do it. When you click "Regenerate response" on ChatGPT, it's most likely just jumping back to a past spot in the context and starting again. I do notice a change in "tone", so I think they are also adjusting temperature or top-k or something. |
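The "regenerate" pattern described above can be sketched like this. Everything here is hypothetical: `sample_next` stands in for a real sample-then-eval step, and the fake sampler just returns pseudo-random token ids. The key idea is that a single checkpoint of `n_past` lets you branch as many times as you like without saving or loading a session.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Toy generator (hypothetical names): sampling a token also "evals" it,
// advancing n_past by one.
struct ToyGen {
    int n_past = 0;
    std::mt19937 rng{42};
    int sample_next() {
        n_past += 1;                 // pretend we evaluated the sampled token
        return (int)(rng() % 100);   // fake token id
    }
};

// Generate n_var continuations of length len from the same branch point by
// resetting n_past between attempts.
std::vector<std::vector<int>> variations(ToyGen &g, int n_var, int len) {
    std::vector<std::vector<int>> out;
    const int checkpoint = g.n_past;  // branch point, e.g. token 20
    for (int v = 0; v < n_var; ++v) {
        g.n_past = checkpoint;        // rewind: no session save/load needed
        std::vector<int> cont;
        for (int i = 0; i < len; ++i) {
            cont.push_back(g.sample_next());
        }
        out.push_back(cont);
    }
    return out;
}
```

In a real setup you would vary the temperature or top-k between iterations of the outer loop to get the "tone" changes mentioned above.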
Great, I appreciate your response, even though the issue has already been closed. I had been contemplating it in the same way that you had approached it in #1111, and overall it makes more sense now. I will try to follow the code, but I understand that you can rewind with `n_past`, and I assume it will just overwrite the prior KV cache. This is useful for the 10-15 variations example. Currently, I am subjecting all my prompts to variations of temperatures and different models, and this approach would enable me to save time and reduce CPU usage. Furthermore, I would like to delve deeper into creating more diverse outcomes and exploring GPT "creativity." Additionally, I am attempting to perform secondary queries for each prompt using NLP/spaCy to break down the English language used in each response, and I also aim to provide explanations about proper nouns and other relevant information. Thus, conserving CPU through tricks and normal API usage is a wise move for achieving my objectives. |
The recent additions of the state and session APIs have made it possible to implement caching for llama models, which has greatly improved responsiveness in many applications.
The current APIs, however, still leave something to be desired; specifically, it would be very useful to be able to rewind / roll back an evaluated model by a fixed number of tokens, so that a single longer saved state could be used to restore any shorter state.
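The requested "restore any shorter state from one longer save" can be sketched as follows, under the assumption (borne out by the discussion above) that rewinding reduces to lowering `n_past` after a full restore. The names are hypothetical, and a real saved state carries KV-cache data rather than just token ids.

```cpp
#include <cassert>
#include <vector>

// Toy saved state (hypothetical): the tokens covered by the save plus the
// evaluation position.
struct ToyState {
    std::vector<int> tokens;  // tokens covered by the saved state
    int n_past = 0;
};

ToyState save_state(const std::vector<int> &toks) {
    return { toks, (int)toks.size() };
}

// Restore the full state, then rewind to an arbitrary shorter length;
// cached data past target_len is simply ignored by future evals.
ToyState restore_prefix(const ToyState &saved, int target_len) {
    ToyState s = saved;
    if (target_len < s.n_past) {
        s.n_past = target_len;
    }
    return s;
}
```

One long save can therefore stand in for every prefix of the session, instead of keeping one multi-gigabyte state file per prefix.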