
Reverting generated output/user input! #604

Closed
niansa opened this issue Mar 29, 2023 · 9 comments

Comments

@niansa
Contributor

niansa commented Mar 29, 2023

Hey!

This is a feature request for reverting input/output. One example use case is being able to retry generation if the response wasn't what you wanted.
One way of implementing this could be by adding the ability to create "snapshots" using signals(?).
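
A rough sketch of what such a snapshot could hold, assuming llama.cpp's state save/restore calls (llama_get_state_size, llama_copy_state_data, llama_set_state_data) plus the token history; the names here are illustrative, not an existing implementation:

```cpp
#include <cstdint>
#include <vector>

#include "llama.h"

// Hypothetical snapshot: the serialized context state (KV cache, RNG, logits)
// plus the token history that produced it, so both can be rolled back together.
struct Snapshot {
    std::vector<uint8_t>     state;
    std::vector<llama_token> tokens;
};

// Capture the current generation state so it can be restored later.
static Snapshot take_snapshot(llama_context * ctx, const std::vector<llama_token> & tokens) {
    Snapshot snap;
    snap.state.resize(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, snap.state.data());
    snap.tokens = tokens;
    return snap;
}

// Roll back to a snapshot, discarding anything generated since it was taken.
static void restore_snapshot(llama_context * ctx, std::vector<llama_token> & tokens, Snapshot & snap) {
    llama_set_state_data(ctx, snap.state.data());
    tokens = snap.tokens;
}
```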

Thanks a lot
niansa

@x02Sylvie

Adding retry through Ctrl+R could be great for interactive mode, especially chat mode, when the AI messes up.

Additionally, it could be neat if this were included as part of the API.

@LostRuins
Collaborator

You might want to check out https://github.com/LostRuins/llamacpp-for-kobold which has this feature, plus it caches the same tokens from the previous prompt to avoid the need for reprocessing the whole prompt if you only want to retry a single sentence.

@niansa
Contributor Author

niansa commented Mar 30, 2023

Is restoring the old token array enough to return to the same state?

@gjmulder added the enhancement (New feature or request) label Mar 30, 2023
@niansa
Contributor Author

niansa commented Mar 30, 2023

I tested it; unfortunately, it is not.

>>> i.append("he then shouted:")
>>> i.tokens
[1, 354, 769, 21272, 287, 29901]
>>> tk = i.tokens
>>> i.run('\n')
' `O Lord the Great King, protect our troops in battle\' [i:thanks: ] 2 Samuel. Chapter XX, vs 59:4, is the only mention found so. The word kashrat occurs, also without further elpbr,in. 35 of. 55, is. the passage of Eph. II., "Huseths the head, forasmuch a the sore is spread all about his bed" (i. E, Hag.. gon).. This word "bedd " appears without elabornt 7t twice ove:; r, "Jews shall dwell at JerUSAH.J 5 5 , the king and governrment oi New Mexico will be in the midst o? this great. nationality, which was in its nature of the kind most conducentive to a high tone a political organization on behon, a free country to adopt as an article " of rhe faith of its\'people.\' If our nation would take its stand, for\'ever,\' to\'protecr this, which. wii,is \'JiJJj. a n- " j Jjj \' -, I j f j n 27 "'
>>> i.tokens = tk
>>> i.run('\n')
'1'
>>> i.run('\n')
'Taub 5.05a-d.'
>>> i.tokens = tk
>>> i.run(' ')
'[url=\\U[/u]. In my heart he knows. he always did[br / [/l], the guarani in Parque San Martín was always my most cheried destination of those I was allowed access. Homepage of Dr P Rathin Trivedy\nPradeenam'
>>> 

@horenbergerb

> You might want to check out https://github.com/LostRuins/llamacpp-for-kobold which has this feature, plus it caches the same tokens from the previous prompt to avoid the need for reprocessing the whole prompt if you only want to retry a single sentence.

@LostRuins are you interested in elaborating on how you achieved this in llamacpp-for-kobold? Or would you be able to point towards the relevant code in your repo? Thanks for your work, btw. The Kobold UI looks pretty clean and I'm definitely keeping an eye on it!

@horenbergerb

horenbergerb commented Apr 2, 2023

I'm trying to wrap my head around how this feature would be implemented for the interactive mode.

It seems like you'd need to keep track of the last message in this block. Then you'd need to catch the ctrl-r signal. It should interrupt any ongoing generation/input and remove the last message from embd. Lastly, you need to ensure the program knows to start generating a message again (a rough sketch of that flow follows below).

Is it actually that simple? I'm about to go on vacation for a week, but I'll try to experiment with it when I can.
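
A rough sketch of that flow, with stand-in names rather than the actual variables in main.cpp (session_tokens is whatever vector holds everything evaluated so far, n_past is the count of tokens already in the KV state, and last_reply_len would have to be recorded when the previous reply finished):

```cpp
#include <vector>

#include "llama.h"

// Hypothetical Ctrl+R handler for the interactive loop: forget the last reply
// and let the caller re-enter the sampling loop to generate a fresh one.
static void retry_last_reply(std::vector<llama_token> & session_tokens,
                             int & n_past,
                             size_t last_reply_len) {
    // Drop the last reply's tokens from the running history...
    session_tokens.resize(session_tokens.size() - last_reply_len);
    // ...and rewind the attention window so the next eval overwrites that region.
    n_past -= (int) last_reply_len;
}
```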

@LostRuins
Collaborator

@horenbergerb Sure, so basically what I do is reuse the old KV state from the context. The important thing to note is the n_past parameter. Essentially it's a pointer indicating how much of the context should be used; think of it like an attention mask.

So once you've reused the old state, simply tokenize both the old prompt and the new one, and compare the IDs in the two tokenized arrays. Start from the beginning, with n_past = 0. For each position that matches in both arrays, increment n_past by 1 until you reach a divergence. Those are the common tokens you can preserve from the old state. Then truncate your embd_inp so that those tokens are removed from it, and rewind n_past to the point of divergence. You can now proceed to generate and stochastically sample from that point.
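
A minimal sketch of that comparison, assuming llama.cpp-style token vectors; the helper and variable names are illustrative, not the actual kobold code:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

#include "llama.h"

// Count how many leading token IDs the old and new prompts share.
// Those positions can be kept from the old KV state; generation
// resumes from the first point of divergence.
static size_t common_prefix(const std::vector<llama_token> & old_tokens,
                            const std::vector<llama_token> & new_tokens) {
    const size_t limit = std::min(old_tokens.size(), new_tokens.size());
    size_t n = 0;
    while (n < limit && old_tokens[n] == new_tokens[n]) {
        n++;
    }
    return n;
}

// Usage sketch:
//   size_t n_match = common_prefix(prev_tokens, embd_inp);
//   n_past = (int) n_match;                                        // rewind to the divergence point
//   embd_inp.erase(embd_inp.begin(), embd_inp.begin() + n_match);  // only evaluate the new tail
```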

@niansa
Contributor Author

niansa commented Apr 4, 2023

Thanks for the hint! Definitely going to mess with that.

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023
Add doc string for n_gpu_layers argument and make -1 offload all layers
github-actions bot added the stale label Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
