
[Feature Suggestion] Load/Save current conversation's tokens into file #532

Closed
x02Sylvie opened this issue Mar 26, 2023 · 5 comments
Labels
duplicate (This issue or pull request already exists) · enhancement (New feature or request) · stale

Comments


x02Sylvie commented Mar 26, 2023

Now that we have infinite transcription mode, would it be possible to dump the tokens into a file and load them back the next time you run llama.cpp, to resume the conversation?

It will be tricky to implement efficiently for long conversations, but it could be handled, for example, by (see the sketch below):

  • storing the prompt itself as tokens
  • storing in-between messages as raw text
  • storing the last messages within ctx_size as tokens
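A minimal sketch of what the token dump itself could look like, assuming tokens are plain 32-bit ids (as llama_token is in llama.h); the helper names and file layout here are illustrative, not an existing llama.cpp API:

```cpp
// Illustrative sketch only: dump the current conversation's token ids to a
// binary file and read them back later. Assumes tokens are 32-bit ints.
#include <cstdint>
#include <cstdio>
#include <vector>

static bool save_tokens(const char * path, const std::vector<int32_t> & tokens) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const uint32_t n = (uint32_t) tokens.size();
    std::fwrite(&n, sizeof(n), 1, f);                   // token count header
    std::fwrite(tokens.data(), sizeof(int32_t), n, f);  // raw token ids
    std::fclose(f);
    return true;
}

static bool load_tokens(const char * path, std::vector<int32_t> & tokens) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    uint32_t n = 0;
    if (std::fread(&n, sizeof(n), 1, f) != 1) { std::fclose(f); return false; }
    tokens.resize(n);
    const size_t read = std::fread(tokens.data(), sizeof(int32_t), n, f);
    std::fclose(f);
    return read == n;
}
```

Note that on resume the loaded tokens would still have to be evaluated (prefill) before generation can continue, which is the limitation discussed below.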
@linouxis9

Yes, this feature seems especially important: it would avoid having to run inference on the initial prompt each time, allowing faster startup (as discussed here: #484 (comment)).

gjmulder added the duplicate and enhancement labels on Mar 26, 2023
anzz1 (Contributor) commented Mar 27, 2023

This is kinda related and would fit well together

anzz1 (Contributor) commented Mar 27, 2023

@linouxis9 You are talking about a different thing though: saving the state, not just the tokens. Separation of state and model is part of the current roadmap.

Saving/loading state needs a big file but loads fast, while saving/loading tokens needs only a tiny file but still has to run inference as usual.
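To make the trade-off concrete, here is a minimal sketch of the state-file side, assuming the context's internal state (KV cache, etc.) can already be exposed as a raw byte buffer; that extraction is exactly what the state/model separation on the roadmap would provide, and the helper names and file layout here are illustrative, not an existing API:

```cpp
// Illustrative sketch only: persist the context's state as an opaque blob.
// The blob is large (on the order of the KV cache, potentially hundreds of MB),
// but restoring it is essentially a copy, so no inference is needed on resume.
#include <cstdint>
#include <cstdio>
#include <vector>

static bool save_state_blob(const char * path, const std::vector<uint8_t> & state) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const uint64_t n = state.size();
    std::fwrite(&n, sizeof(n), 1, f);    // blob size header
    std::fwrite(state.data(), 1, n, f);  // raw state bytes
    std::fclose(f);
    return true;
}

static bool load_state_blob(const char * path, std::vector<uint8_t> & state) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    uint64_t n = 0;
    if (std::fread(&n, sizeof(n), 1, f) != 1) { std::fclose(f); return false; }
    state.resize(n);
    const size_t read = std::fread(state.data(), 1, n, f);
    std::fclose(f);
    return read == n;
}
```

Restoring that buffer into the context skips inference entirely, which is why Option B below resumes instantly, while Option A's small token file pays the full prefill cost on every resume.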

Someone please correct me if my analogy is bad, but I'll try to explain the difference using a real-world analogy:

Mission:
You enter your car outside your home. You need to get to work.

Option A (Load tokens):
You start with a blank memory. You get the instructions on how to drive to work (⬆⬆⬇⬇⬅➡⬅➡🅱🅰), you drive there using the instructions, and this takes some time. You see a friendly alpaca on your way there. You remember how you got there and that you saw a friendly alpaca on the way.

Option B (Load state):
You are implanted with a memory of driving to work and seeing a friendly alpaca, then are instantly teleported to work. You remember how you got there and that you saw a friendly alpaca on the way.

These will both work.

Option C (This can't work):
You start with a blank memory. You are teleported to work with the instructions on how to drive to work. You don't remember how you got there, nor that you saw a friendly alpaca on the way. Even if you worked the instructions backwards somehow, you still couldn't possibly know about the friendly alpaca.


linouxis9 commented Mar 27, 2023

So basically, Option A could also be implemented by just passing the previous conversation back as the initial prompt?
What I'm more interested in is Option B, where we don't have to drive all the way to work again ;-)
I understand, thank you for the great analogy @anzz1!!

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023
github-actions bot added the stale label on Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
