
Reverting generated output/user input! #604

Closed
niansa opened this issue Mar 29, 2023 · 9 comments

Comments

@niansa
Contributor

niansa commented Mar 29, 2023

Hey!

This is a feature request for reverting input/output. One example use case is being able to retry generation if the response wasn't what you wanted.
One way of implementing this could be by adding the ability to create "snapshots" using signals(?).
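
A rough sketch of what such a snapshot could hold, assuming llama.cpp's state save/restore calls (llama_get_state_size, llama_copy_state_data, llama_set_state_data) plus the token history; the names here are illustrative, not an existing implementation:

```cpp
#include <cstdint>
#include <vector>

#include "llama.h"

// Hypothetical snapshot: the serialized context state (KV cache, RNG, logits)
// plus the token history that produced it, so both can be rolled back together.
struct Snapshot {
    std::vector<uint8_t>     state;
    std::vector<llama_token> tokens;
};

// Capture the current generation state so it can be restored later.
static Snapshot take_snapshot(llama_context * ctx, const std::vector<llama_token> & tokens) {
    Snapshot snap;
    snap.state.resize(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, snap.state.data());
    snap.tokens = tokens;
    return snap;
}

// Roll back to a snapshot, discarding anything generated since it was taken.
static void restore_snapshot(llama_context * ctx, std::vector<llama_token> & tokens, Snapshot & snap) {
    llama_set_state_data(ctx, snap.state.data());
    tokens = snap.tokens;
}
```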

Thanks a lot
niansa

@x02Sylvie

Adding retry through Ctrl+R could be great for interactive mode, especially chat mode, when the AI messes up.

Additionally, it could be neat if this were included as part of the API.

@LostRuins
Collaborator

You might want to check out https://github.com/LostRuins/llamacpp-for-kobold which has this feature, plus it caches the same tokens from the previous prompt to avoid the need for reprocessing the whole prompt if you only want to retry a single sentence.

@niansa
Contributor Author

niansa commented Mar 30, 2023

Is restoring the old token array enough to return to the same state?

@gjmulder added the enhancement (New feature or request) label Mar 30, 2023
@niansa
Contributor Author

niansa commented Mar 30, 2023

I tested it; unfortunately, it is not.

>>> i.append("he then shouted:")
>>> i.tokens
[1, 354, 769, 21272, 287, 29901]
>>> tk = i.tokens
>>> i.run('\n')
' `O Lord the Great King, protect our troops in battle\' [i:thanks: ] 2 Samuel. Chapter XX, vs 59:4, is the only mention found so. The word kashrat occurs, also without further elpbr,in. 35 of. 55, is. the passage of Eph. II., "Huseths the head, forasmuch a the sore is spread all about his bed" (i. E, Hag.. gon).. This word "bedd " appears without elabornt 7t twice ove:; r, "Jews shall dwell at JerUSAH.J 5 5 , the king and governrment oi New Mexico will be in the midst o? this great. nationality, which was in its nature of the kind most conducentive to a high tone a political organization on behon, a free country to adopt as an article " of rhe faith of its\'people.\' If our nation would take its stand, for\'ever,\' to\'protecr this, which. wii,is \'JiJJj. a n- " j Jjj \' -, I j f j n 27 "'
>>> i.tokens = tk
>>> i.run('\n')
'1'
>>> i.run('\n')
'Taub 5.05a-d.'
>>> i.tokens = tk
>>> i.run(' ')
'[url=\\U[/u]. In my heart he knows. he always did[br / [/l], the guarani in Parque San Martín was always my most cheried destination of those I was allowed access. Homepage of Dr P Rathin Trivedy\nPradeenam'
>>> 

@horenbergerb

> You might want to check out https://github.com/LostRuins/llamacpp-for-kobold which has this feature, plus it caches the same tokens from the previous prompt to avoid the need for reprocessing the whole prompt if you only want to retry a single sentence.

@LostRuins are you interested in elaborating on how you achieved this in llamacpp-for-kobold? Or would you be able to point towards the relevant code in your repo? Thanks for your work, btw. The Kobold UI looks pretty clean and I'm definitely keeping an eye on it!

@horenbergerb

horenbergerb commented Apr 2, 2023

I'm trying to wrap my head around how this feature would be implemented for the interactive mode.

It seems like you'd need to keep track of the last message in this block. Then you'd need to catch the ctrl-r signal. It should interrupt any ongoing generation/input and remove the last message from embd. Lastly, you need to ensure the program knows to start generating a message again (a rough sketch of that flow follows below).

Is it actually that simple? I'm about to go on vacation for a week, but I'll try to experiment with it when I can.
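
A rough sketch of that flow, with stand-in names rather than the actual variables in main.cpp (session_tokens is whatever vector holds everything evaluated so far, n_past is the count of tokens already in the KV state, and last_reply_len would have to be recorded when the previous reply finished):

```cpp
#include <vector>

#include "llama.h"

// Hypothetical Ctrl+R handler for the interactive loop: forget the last reply
// and let the caller re-enter the sampling loop to generate a fresh one.
static void retry_last_reply(std::vector<llama_token> & session_tokens,
                             int & n_past,
                             size_t last_reply_len) {
    // Drop the last reply's tokens from the running history...
    session_tokens.resize(session_tokens.size() - last_reply_len);
    // ...and rewind the attention window so the next eval overwrites that region.
    n_past -= (int) last_reply_len;
}
```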

@LostRuins
Collaborator

@horenbergerb Sure, so basically what I do is reuse the old KV state from the context. The important thing to note is the n_past parameter. Essentially it's a pointer indicating how much of the context should be used; think of it like an attention mask.

So once you've reused the old state, simply tokenize both the old prompt and the new one, and compare the IDs in the two tokenized arrays. Start from the beginning, with n_past = 0. For each position that matches in both arrays, increment n_past by 1 until you reach a divergence. Those are the common tokens you can preserve from the old state. Then truncate your embd_inp so that those tokens are removed from it, and rewind n_past to the point of divergence. You can now proceed to generate and stochastically sample from that point.
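
A minimal sketch of that comparison, assuming llama.cpp-style token vectors; the helper and variable names are illustrative, not the actual kobold code:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

#include "llama.h"

// Count how many leading token IDs the old and new prompts share.
// Those positions can be kept from the old KV state; generation
// resumes from the first point of divergence.
static size_t common_prefix(const std::vector<llama_token> & old_tokens,
                            const std::vector<llama_token> & new_tokens) {
    const size_t limit = std::min(old_tokens.size(), new_tokens.size());
    size_t n = 0;
    while (n < limit && old_tokens[n] == new_tokens[n]) {
        n++;
    }
    return n;
}

// Usage sketch:
//   size_t n_match = common_prefix(prev_tokens, embd_inp);
//   n_past = (int) n_match;                                        // rewind to the divergence point
//   embd_inp.erase(embd_inp.begin(), embd_inp.begin() + n_match);  // only evaluate the new tail
```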

@niansa
Contributor Author

niansa commented Apr 4, 2023

Thanks for the hint! Definitely going to mess with that.

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023
Add doc string for n_gpu_layers argument and make -1 offload all layers
github-actions bot added the stale label Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
