[Feature Suggestion] Load/Save current conversation's tokens into file #532
Comments
Yes, this feature seems especially important: it would avoid having to go through the inference process for the initial prompt on every startup, allowing a much faster start (as discussed here: #484 (comment)).
This is kinda related and would fit well together.
@linouxis9 you are talking about a different thing though: saving the state, not just the tokens. Separation of state and model is part of the current roadmap.

Saving/loading state needs a big file but loads fast, while saving/loading tokens needs a tiny file but would still have to run inference as usual.

Someone please correct me if my analogy is bad, but I'll try to explain the difference using a real-world analogy:

Mission:
Option A (Load tokens):
Option B (Load state):

These will both work.

Option C (This can't work):
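For concreteness, here is a minimal sketch (not the actual implementation) of what each option would persist. The state half assumes the `llama_get_state_size`/`llama_copy_state_data` entry points from `llama.h`; if your build does not expose them, treat that part as pseudocode. The on-disk layout and helper names are purely illustrative:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

#include "llama.h"

// Option A: persist only the token ids -- a few KB, but the KV cache must be
// rebuilt by re-evaluating these tokens on the next run.
static bool save_tokens(const char * path, const std::vector<llama_token> & tokens) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const uint32_t n = (uint32_t) tokens.size();
    std::fwrite(&n, sizeof(n), 1, f);
    std::fwrite(tokens.data(), sizeof(llama_token), n, f);
    std::fclose(f);
    return true;
}

// Option B: persist the full context state (KV cache, RNG, logits) -- loads
// without re-running inference, but the file is roughly the size of the KV
// cache, which can easily be hundreds of MB.
static bool save_state(const char * path, llama_context * ctx) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, buf.data());
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    std::fwrite(buf.data(), 1, buf.size(), f);
    std::fclose(f);
    return true;
}
```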
So basically, Option A could also be implemented by just passing the previous conversation back as the initial prompt?
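Roughly, yes. A minimal sketch of that variant, assuming the conversation is simply appended to a plain-text transcript file (the helper names here are hypothetical); the `main` example's existing `-f`/`--file` prompt-file flag could probably consume the result directly:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// During the session, append each finished exchange to a transcript file.
void append_to_transcript(const std::string & path, const std::string & text) {
    std::ofstream out(path, std::ios::app);
    out << text;
}

// On the next run, read the transcript back and use it as the initial prompt.
// The tokens are then re-evaluated exactly as a normal prompt would be, which
// is why this variant keeps a tiny file but still pays the inference cost.
std::string load_transcript(const std::string & path) {
    std::ifstream in(path);
    std::stringstream ss;
    ss << in.rdbuf();
    return ss.str();
}
```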
This issue was closed because it has been inactive for 14 days since being marked as stale.
Now that we have infinite transcription mode, would it be possible to dump the tokens into a file and load them back the next time you run llama.cpp, to resume the conversation?
Although it will be tricky to implement efficiently with long conversations, for example by
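As a rough illustration of the load side of such a token dump (the on-disk layout and helper name are made up to match the save sketch above; only `llama_token` comes from `llama.h`):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

#include "llama.h"

// Read a token dump written as: uint32_t count, followed by `count` llama_token ids.
// The caller would then feed these tokens through the usual evaluation path to
// rebuild the context before continuing the conversation.
static std::vector<llama_token> load_tokens(const char * path) {
    std::vector<llama_token> tokens;
    FILE * f = std::fopen(path, "rb");
    if (!f) return tokens;
    uint32_t n = 0;
    if (std::fread(&n, sizeof(n), 1, f) == 1) {
        tokens.resize(n);
        if (std::fread(tokens.data(), sizeof(llama_token), n, f) != n) {
            tokens.clear(); // truncated or corrupt file
        }
    }
    std::fclose(f);
    return tokens;
}
```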