[Feature Request] --prompt-cache-all + user input #1398

Closed
skidd-level-100 opened this issue May 11, 2023 · 16 comments
@skidd-level-100

I noticed the '--prompt-cache-all' and '--prompt-cache' options that replace '--session', but '--prompt-cache-all' does not support user input. Why not? And why not store only the tokens in the context window? I would like to resume input/output with the model from a file; this would be a sort of persistent memory, and it would be awesome!

@ejones
Collaborator

ejones commented May 11, 2023

Yeah, we punted on --prompt-cache-all in interactive mode because of the complexities of properly saving the session file on various exit paths. But it does support input in the sense of appending to the prompt in successive calls to ./main. My plan for this is to support things like long-running and persistent chats, just with main invoked for each message (which is now fast with the cache) and managing context outside main.

@x4080

x4080 commented May 11, 2023

@ejones I'm confused about this as well. Could you please provide an example of how to use prompt-cache-all without using interactive mode? Can I use it for something like:

  1. generate html page
  2. modify the generated page
  3. another modify that generated page

Or is this different from what I thought?

Thanks

@skidd-level-100
Author

So what you're saying is that I can quickly hack up a bash script and have a pseudo-persistent bot?

@ejones
Collaborator

ejones commented May 12, 2023

At a basic level, the way to leverage this is to feed back the output of one call to ./main as the prompt to the next call, optionally appending additional input. #1338 has an example of that in the testing section. That said, there are some additional considerations, including that it's up to the caller to ensure the prompt doesn't exceed the context size (in the long run I believe this will be preferable). I'm hoping to put up a Bash example of chat using prompt caches instead of --interactive that will illustrate this.
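A rough sketch of what that feedback loop can look like (the model path, prompts, and -n values below are placeholders, not the #1338 example itself):

```bash
# Sketch only: model path, prompts, and token counts are illustrative.
MODEL=models/7B/ggml-model-q4_0.bin

# First call: evaluates the whole prompt and writes its state to the cache.
OUT=$(./main -m "$MODEL" --prompt-cache chat.bin --prompt-cache-all \
      -n 64 -p "User: Name three uses for a brick.
Assistant:")

# Next call: the previous output (main echoes the prompt, so OUT already
# contains it) plus the new input; with the cache warm, only the newly
# appended tokens should need to be evaluated.
./main -m "$MODEL" --prompt-cache chat.bin --prompt-cache-all \
      -n 64 -p "$OUT
User: Now pick the most practical one.
Assistant:"
```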

@x4080

x4080 commented May 12, 2023

@ejones Thanks for the link. I'll test it with my own prompt, so it's useful for generating stories.

@x4080

x4080 commented May 13, 2023

@ejones I'm wondering why --prompt-cache-all doesn't save the last message generated by the LLM, so that we have to pass that message in again. Wouldn't it be better if it also saved the LLM's generated message, so that in a back-and-forth chat session we could just add another question rather than copying the LLM's last message?

Sorry if I'm wrong with this 😄

@ejones
Collaborator

ejones commented May 16, 2023

Yeah, I tried a version where it restored and appended to the saved prompt, but I didn't want to have to rely on the contents of the prompt cache. There's no way to inspect prompt caches (yet) and there may be cases where they don't get saved or get corrupted. So for now, the prompt argument is the source of truth and the prompt cache is just a cache.

The use case I envision for this is for a script / app to manage the chat session etc. rather than repeatedly invoking main on the command line. The example I'm preparing now will illustrate this.

@x4080

x4080 commented May 16, 2023

ok @ejones thanks

@ejones
Collaborator

ejones commented May 17, 2023

Got a PR up for persistent chat: #1495. Note that it depends on #1032, still open.

@vbguyny

vbguyny commented May 23, 2023

@ejones Looks like the PRs have been merged. Could you explain how to use this new feature?

@x4080

x4080 commented May 23, 2023

@ejones Looks like the PRs have been merged. Could you explain how to use this new feature?

Good one, I'm interested in it too

@ejones
Collaborator

ejones commented May 23, 2023

For the persistent chat script, I have a PR up at #1568 with docs on its usage. For the --prompt-cache and --prompt-cache-all, the basic idea is to run ./main with those options specified, save the output (e.g., in a file or variable), append your next input, and repeat. If you do this, main should only need to evaluate from the new input onwards. Note that if you do this indefinitely, you need to track the size of the prompt and make sure it doesn't exceed the size of the context.

This usage is demonstrated in examples/chat-persistent.sh, although it might be possible to come up with an even more minimal example. I've been considering an agent-style action/observation loop example.
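For illustration only (this is not examples/chat-persistent.sh; the model path, context size, and the crude character-based length check are assumptions), a minimal loop along those lines might look like:

```bash
#!/usr/bin/env bash
# Illustrative chat loop over --prompt-cache; paths and sizes are placeholders.
MODEL=models/llama-2-7b.Q4_0.gguf
CACHE=chat.cache.bin
CTX=4096

PROMPT="Transcript of a dialog between User and Assistant.
"

while IFS= read -rp "User: " LINE; do
    PROMPT="${PROMPT}User: ${LINE}
Assistant:"
    # main echoes the prompt, so its stdout can serve as the next prompt;
    # with the cache, only the newly appended tokens are re-evaluated.
    PROMPT=$(./main -m "$MODEL" -c "$CTX" -n 256 \
        --prompt-cache "$CACHE" --prompt-cache-all -p "$PROMPT" 2>/dev/null)
    echo "${PROMPT##*Assistant:}"
    # Crude guard: real code should count tokens, not characters
    # (roughly 4 characters per token assumed here).
    if [ "${#PROMPT}" -gt $((CTX * 4)) ]; then
        echo "(context nearly full; truncate or summarize before continuing)" >&2
        break
    fi
done
```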

@x4080

x4080 commented May 23, 2023

@ejones So is it now unnecessary to add llama's last output to the new request when using --prompt-cache-all?

@marknuppnau

@divinity76, I do not believe it is as simple as your pseudo-example would suggest. For example, when I run:

./main -ngl 84 -m models/llama-2-7b.Q4_0.gguf --color -c 4096 -n 40 -s 42 --temp 0.7 --repeat_penalty 1.1 -r "User:" --prompt-cache cache.prompt.bin --prompt-cache-all -f ./prompts/chat-with-bob.txt

I would expect to be able to append a prompt to the cached prompt. The cache.prompt.bin file does get created, as confirmed by the log line main: saving final output to session file 'cache.prompt.bin'. However, when I run:

./main -ngl 84 -m models/llama-2-7b.Q4_0.gguf --color -c 4096 -n 40 -s 42 --temp 0.7 --repeat_penalty 1.1 -r "User:" --prompt-cache cache.prompt.bin --prompt-cache-all -p "What is your first name?"

The cached prompt is not loaded, so there is no previous context for the model to correctly answer "Bob." The script outputs:

main: attempting to load saved session from 'cache.prompt.bin'
main: loaded a session with prompt size of 0 tokens

@divinity76
Contributor

Oh, OK, sorry. I may be wrong and I don't have time to investigate it. Never mind.

@github-actions github-actions bot added the stale label Mar 25, 2024

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 9, 2024