Cache Feature Request #95
I agree the "dummy" caching feature is already really useful; it makes all the difference between me wanting to use this rather than going to OpenAI ;) Regarding a real caching feature, we are waiting for upstream to persist the state correctly, right? Hypothetically, how large would a single state be?
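As a rough back-of-envelope answer to the state-size question: the bulk of a saved state is the KV cache, which scales with 2 (K and V) x layers x context length x embedding width x bytes per element. A sketch assuming 7B LLaMA-style dimensions (32 layers, 4096 embedding width) and an fp16 KV cache; the real saved state also includes logits, embeddings, and RNG state, so treat this as a lower bound:

```python
# Back-of-envelope KV-cache size. The dimensions below are assumptions
# for a 7B LLaMA-style model, not values taken from this project.
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elem=2):
    """2 tensors (K and V) per layer, one vector per context position."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem

size = kv_cache_bytes(n_layers=32, n_ctx=2048, n_embd=4096)  # fp16 KV
print(size / 2**30, "GiB")  # 1.0 GiB for a full 2048-token context
```

So on the order of a gigabyte per conversation at full context under those assumptions, which is why keeping many states resident gets expensive fast.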
@snxraven @jmtatsch This is definitely high on my list; unfortunately, at the moment I'm blocked because I can't restore the model state. I've tried the upstream state save/restore API without success so far. If anyone gets even a basic example to work using that API, I'd be happy to implement this.
A relevant issue has been opened by another dev over at llama.cpp: ggml-org/llama.cpp#1054
Furthermore, that issue has been reopened, so it may be fixed within llama.cpp. If you would like, we can close this issue, since a solution is clearly coming.
I did once do this by simply having multiple instances of llama running. |
Closing this in favor of #44 |
The current implementation of caching is wonderful; it's been a great help speeding up conversations.
I do notice this trips up when a secondary user starts a conversation. Would it be possible to allow for multi-conversation caching?
The main issue currently is that the cache grows large over time, and if the second user submits a question and then the first user submits another, the first user's entire chat history is re-run all over again.
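For illustration, a per-conversation cache along these lines might look like the following sketch. This is purely hypothetical: `ConversationCache` and the idea of storing one opaque saved state per conversation id are assumptions for discussion, not this project's API. It keys saved states by conversation so one user's prompt no longer clobbers another's, and reuses the longest matching token prefix:

```python
class ConversationCache:
    """Hypothetical sketch: keep one saved model state per conversation
    id so a second user's prompt does not invalidate the first user's
    cache. The 'state' values are treated as opaque blobs."""

    def __init__(self, max_conversations=8):
        self.max_conversations = max_conversations
        self.states = {}   # conversation_id -> (tokens, state)
        self.order = []    # conversation ids, oldest first (for eviction)

    def lookup(self, conversation_id, tokens):
        """Return (saved_state, n_matched) for the cached token prefix
        shared with this prompt, or (None, 0) if nothing matches."""
        entry = self.states.get(conversation_id)
        if entry is None:
            return None, 0
        cached_tokens, state = entry
        n = 0
        for a, b in zip(cached_tokens, tokens):
            if a != b:
                break
            n += 1
        return (state, n) if n else (None, 0)

    def store(self, conversation_id, tokens, state):
        # Evict the oldest conversation if we're at capacity.
        if (conversation_id not in self.states
                and len(self.states) >= self.max_conversations):
            oldest = self.order.pop(0)
            del self.states[oldest]
        self.states[conversation_id] = (list(tokens), state)
        if conversation_id in self.order:
            self.order.remove(conversation_id)
        self.order.append(conversation_id)
```

On a cache hit, only the tokens after `n_matched` would need to be evaluated, so each user keeps their own warm prefix. The trade-off, per the size estimate above, is memory: each resident state can be on the order of the full KV cache.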