[Feature Request] --prompt-cache-all + user input #1398

Closed
skidd-level-100 opened this issue May 11, 2023 · 16 comments
@skidd-level-100

I noticed the '--prompt-cache-all' and '--prompt-cache' options that replace '--session', but '--prompt-cache-all' does not support user input. Why not? And why not store only the tokens in the context window? I would like to resume input/output with the model from a file; this would be a sort of persistent memory, and it would be awesome!

@ejones
Collaborator

ejones commented May 11, 2023

Yeah, we punted on --prompt-cache-all in interactive mode because of the complexities of properly saving the session file on various exit paths. But it does support input in the sense of appending to the prompt in successive calls to ./main. My plan for this is to support things like long-running and persistent chats, just with main invoked for each message (which is now fast with the cache) and managing context outside main.

@x4080

x4080 commented May 11, 2023

@ejones I'm confused about this as well. Could you please provide an example of how to use prompt-cache-all without using interactive mode? Can I use it for something like:

  1. generate html page
  2. modify the generated page
  3. another modify that generated page

Or is this different from what I thought?

Thanks

@skidd-level-100
Author

So what you're saying is that I can quickly hack up a bash script and have a pseudo-persistent bot?

@ejones
Collaborator

ejones commented May 12, 2023

At a basic level, the way to leverage this is to feed back the output of one call to ./main as the prompt to the next call, optionally appending additional input. #1338 has an example of that in the testing section. That said, there are some additional considerations, including that it's up to the caller to ensure the prompt doesn't exceed the context size (in the long run I believe this will be preferable). I'm hoping to put up a Bash example of chat using prompt caches instead of --interactive that will illustrate this.
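A rough sketch of what that feedback loop can look like (the model path, prompts, and -n values below are placeholders, not the #1338 example itself):

```bash
# Sketch only: model path, prompts, and token counts are illustrative.
MODEL=models/7B/ggml-model-q4_0.bin

# First call: evaluates the whole prompt and writes its state to the cache.
OUT=$(./main -m "$MODEL" --prompt-cache chat.bin --prompt-cache-all \
      -n 64 -p "User: Name three uses for a brick.
Assistant:")

# Next call: the previous output (main echoes the prompt, so OUT already
# contains it) plus the new input; with the cache warm, only the newly
# appended tokens should need to be evaluated.
./main -m "$MODEL" --prompt-cache chat.bin --prompt-cache-all \
      -n 64 -p "$OUT
User: Now pick the most practical one.
Assistant:"
```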

@x4080

x4080 commented May 12, 2023

@ejones Thanks for the link. I'll test it with my own prompt, so it's useful for generating stories.

@x4080

x4080 commented May 13, 2023

@ejones I'm wondering why --prompt-cache-all doesn't save the last message generated by the LLM, so that we have to pass that message in again. Wouldn't it be better if it also saved the LLM's generated message, so that in a back-and-forth chat session we could just add another question rather than copying the LLM's last message?

Sorry if I'm wrong with this 😄

@ejones
Collaborator

ejones commented May 16, 2023

Yeah, I tried a version where it restored and appended to the saved prompt, but I didn't want to have to rely on the contents of the prompt cache. There's no way to inspect prompt caches (yet) and there may be cases where they don't get saved or get corrupted. So for now, the prompt argument is the source of truth and the prompt cache is just a cache.

The use case I envision for this is for a script / app to manage the chat session etc. rather than repeatedly invoking main on the command line. The example I'm preparing now will illustrate this.

@x4080

x4080 commented May 16, 2023

ok @ejones thanks

@ejones
Collaborator

ejones commented May 17, 2023

Got a PR up for persistent chat: #1495. Note that it depends on #1032, still open.

@vbguyny

vbguyny commented May 23, 2023

@ejones Looks like the PRs have been merged. Could you explain how to use this new feature?

@x4080

x4080 commented May 23, 2023

@ejones Looks like the PRs have been merged. Could you explain how to use this new feature?

Good one, I'm interested in it too

@ejones
Collaborator

ejones commented May 23, 2023

For the persistent chat script, I have a PR up at #1568 with docs on its usage. For the --prompt-cache and --prompt-cache-all, the basic idea is to run ./main with those options specified, save the output (e.g., in a file or variable), append your next input, and repeat. If you do this, main should only need to evaluate from the new input onwards. Note that if you do this indefinitely, you need to track the size of the prompt and make sure it doesn't exceed the size of the context.

This usage is demonstrated in examples/chat-persistent.sh, although it might be possible to come up with an even more minimal example. I've been considering an agent-style action/observation loop example.
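For illustration only (this is not examples/chat-persistent.sh; the model path, context size, and the crude character-based length check are assumptions), a minimal loop along those lines might look like:

```bash
#!/usr/bin/env bash
# Illustrative chat loop over --prompt-cache; paths and sizes are placeholders.
MODEL=models/llama-2-7b.Q4_0.gguf
CACHE=chat.cache.bin
CTX=4096

PROMPT="Transcript of a dialog between User and Assistant.
"

while IFS= read -rp "User: " LINE; do
    PROMPT="${PROMPT}User: ${LINE}
Assistant:"
    # main echoes the prompt, so its stdout can serve as the next prompt;
    # with the cache, only the newly appended tokens are re-evaluated.
    PROMPT=$(./main -m "$MODEL" -c "$CTX" -n 256 \
        --prompt-cache "$CACHE" --prompt-cache-all -p "$PROMPT" 2>/dev/null)
    echo "${PROMPT##*Assistant:}"
    # Crude guard: real code should count tokens, not characters
    # (roughly 4 characters per token assumed here).
    if [ "${#PROMPT}" -gt $((CTX * 4)) ]; then
        echo "(context nearly full; truncate or summarize before continuing)" >&2
        break
    fi
done
```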

@x4080

x4080 commented May 23, 2023

@ejones So is it now unnecessary to add llama's last output to the new request when using --prompt-cache-all?

@marknuppnau

@divinity76, I do not believe it is as simple as your pseudo-example would suggest. For example, when I run:

./main -ngl 84 -m models/llama-2-7b.Q4_0.gguf --color -c 4096 -n 40 -s 42 --temp 0.7 --repeat_penalty 1.1 -r "User:" --prompt-cache cache.prompt.bin --prompt-cache-all -f ./prompts/chat-with-bob.txt

I would expect to be able to append a prompt to the cached prompt. The cache.prompt.bin file does get created, as confirmed by the log line main: saving final output to session file 'cache.prompt.bin'. However, when I run:

./main -ngl 84 -m models/llama-2-7b.Q4_0.gguf --color -c 4096 -n 40 -s 42 --temp 0.7 --repeat_penalty 1.1 -r "User:" --prompt-cache cache.prompt.bin --prompt-cache-all -p "What is your first name?"

The cached prompt is not loaded, so there is no previous context for the model to correctly answer "Bob." The script outputs:

main: attempting to load saved session from 'cache.prompt.bin'
main: loaded a session with prompt size of 0 tokens

@divinity76
Contributor

Oh, OK, sorry. I may be wrong and I don't have time to investigate it. Never mind.

@github-actions github-actions bot added the stale label Mar 25, 2024

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 9, 2024