
[Feature Suggestion] Load/Save current conversation's tokens into file #532

Closed
x02Sylvie opened this issue Mar 26, 2023 · 5 comments
Labels
duplicate (This issue or pull request already exists) · enhancement (New feature or request) · stale

Comments


x02Sylvie commented Mar 26, 2023

Now that we have infinite transcription mode, would it be possible to dump the tokens into a file and load them back the next time you run llama.cpp, to resume the conversation?

It will be tricky to implement efficiently for long conversations, but it could be handled, for example, by (see the sketch below):

  • storing the prompt itself as tokens
  • storing in-between messages as raw text
  • storing the last messages within ctx_size as tokens
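A minimal sketch of what the token dump itself could look like, assuming tokens are plain 32-bit ids (as llama_token is in llama.h); the helper names and file layout here are illustrative, not an existing llama.cpp API:

```cpp
// Illustrative sketch only: dump the current conversation's token ids to a
// binary file and read them back later. Assumes tokens are 32-bit ints.
#include <cstdint>
#include <cstdio>
#include <vector>

static bool save_tokens(const char * path, const std::vector<int32_t> & tokens) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const uint32_t n = (uint32_t) tokens.size();
    std::fwrite(&n, sizeof(n), 1, f);                   // token count header
    std::fwrite(tokens.data(), sizeof(int32_t), n, f);  // raw token ids
    std::fclose(f);
    return true;
}

static bool load_tokens(const char * path, std::vector<int32_t> & tokens) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    uint32_t n = 0;
    if (std::fread(&n, sizeof(n), 1, f) != 1) { std::fclose(f); return false; }
    tokens.resize(n);
    const size_t read = std::fread(tokens.data(), sizeof(int32_t), n, f);
    std::fclose(f);
    return read == n;
}
```

Note that on resume the loaded tokens would still have to be evaluated (prefill) before generation can continue, which is the limitation discussed below.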
@linouxis9

Yes, this feature seems especially important: it would avoid having to run inference on the initial prompt each time, allowing faster startup (as discussed here: #484 (comment)).

gjmulder added the duplicate and enhancement labels on Mar 26, 2023
anzz1 (Contributor) commented Mar 27, 2023

This is kinda related and would fit well together

anzz1 (Contributor) commented Mar 27, 2023

@linouxis9 You are talking about a different thing though: saving the state, not just the tokens. Separation of state and model is part of the current roadmap.

Saving/loading state needs a big file but loads fast, while saving/loading tokens needs only a tiny file but still has to run inference as usual.
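To make the trade-off concrete, here is a minimal sketch of the state-file side, assuming the context's internal state (KV cache, etc.) can already be exposed as a raw byte buffer; that extraction is exactly what the state/model separation on the roadmap would provide, and the helper names and file layout here are illustrative, not an existing API:

```cpp
// Illustrative sketch only: persist the context's state as an opaque blob.
// The blob is large (on the order of the KV cache, potentially hundreds of MB),
// but restoring it is essentially a copy, so no inference is needed on resume.
#include <cstdint>
#include <cstdio>
#include <vector>

static bool save_state_blob(const char * path, const std::vector<uint8_t> & state) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const uint64_t n = state.size();
    std::fwrite(&n, sizeof(n), 1, f);    // blob size header
    std::fwrite(state.data(), 1, n, f);  // raw state bytes
    std::fclose(f);
    return true;
}

static bool load_state_blob(const char * path, std::vector<uint8_t> & state) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    uint64_t n = 0;
    if (std::fread(&n, sizeof(n), 1, f) != 1) { std::fclose(f); return false; }
    state.resize(n);
    const size_t read = std::fread(state.data(), 1, n, f);
    std::fclose(f);
    return read == n;
}
```

Restoring that buffer into the context skips inference entirely, which is why Option B below resumes instantly, while Option A's small token file pays the full prefill cost on every resume.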

Someone please correct me if my analogy is bad, but I'll try to explain the difference using a real-world analogy:

Mission:
You enter your car outside your home. You need to get to work.

Option A (Load tokens):
You start with a blank memory. You get the instructions on how to drive to work (⬆⬆⬇⬇⬅➡⬅➡🅱🅰), you drive there using the instructions, and this takes some time. You see a friendly alpaca on your way there. You remember how you got there and that you saw a friendly alpaca on the way.

Option B (Load state):
You are implanted with a memory of driving to work and seeing a friendly alpaca, then are instantly teleported to work. You remember how you got there and that you saw a friendly alpaca on the way.

These will both work.

Option C (This can't work):
You start with a blank memory. You are teleported to work with the instructions on how to drive to work. You don't remember how you got there, nor that you saw a friendly alpaca on the way. Even if you worked the instructions backwards somehow, you still couldn't possibly know about the friendly alpaca.


linouxis9 commented Mar 27, 2023

So basically, Option A could also be implemented by just passing the previous conversation back as the initial prompt?
What I'm more interested in is Option B, where we don't have to drive all the way to work again ;-)
I understand, thank you for the great analogy @anzz1!!

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023
github-actions bot added the stale label on Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
