One of the things it would be interesting to have out of the box for AI is a semantic cache of requests.
It could be used with any method, but according to a recent study, 31% of queries to an LLM can be cached (in other words, 31% of queries are contextually repeatable), which can significantly improve response time in GenAI apps.
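To make it concrete, here is a minimal in-memory sketch of the idea, assuming some embedding function is available (the `Function<String, float[]>` below is a stand-in for a real embedding model, not a concrete API). A lookup counts as a hit when a cached query is close enough in embedding space, rather than when the keys are exactly equal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Minimal in-memory sketch; the embedder Function stands in for a real
// embedding model (that part is an assumption, not a concrete API).
public class SemanticCacheSketch {

    record Entry(float[] embedding, String response) {}

    private final List<Entry> entries = new ArrayList<>();
    private final Function<String, float[]> embedder;
    private final double threshold; // e.g. 0.95: how close counts as "the same question"

    public SemanticCacheSketch(Function<String, float[]> embedder, double threshold) {
        this.embedder = embedder;
        this.threshold = threshold;
    }

    // A hit is "close enough in embedding space", not an exact key match.
    public Optional<String> lookup(String query) {
        float[] q = embedder.apply(query);
        return entries.stream()
                .filter(e -> cosine(e.embedding(), q) >= threshold)
                .map(Entry::response)
                .findFirst();
    }

    public void put(String query, String response) {
        entries.add(new Entry(embedder.apply(query), response));
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

This is why "Where is Quarkus documented?" and "Where can I find the Quarkus docs?" can resolve to the same cached answer even though the strings differ.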
Yes, what we have in #659 is a concept of a semantic cache.
The idea is to have something very similar to ChatMemory, so you can swap the default in-memory implementation for other products like Redis.
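Something like this hypothetical SPI, loosely mirroring how ChatMemory is pluggable (all names here are invented for illustration), would let the in-memory default be replaced by a Redis-backed implementation:

```java
import java.util.Optional;

// Hypothetical SPI, loosely mirroring how ChatMemory is pluggable
// (all names here are invented for illustration). The extension would
// ship an in-memory default; stores like Redis would provide their own.
public interface SemanticCache {

    // Return a cached response whose original query is semantically
    // close enough to this one, if any.
    Optional<String> find(String query);

    // Store the response under the query's embedding.
    void store(String query, String response);
}
```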
Great, feel free to take a look at my example; you'll see the code to do it is not complex, though it does need some configuration parameters. Calculating keys is easy, since Quarkus cache offers an interface to override key creation by default. The problem is in the code that checks whether it is a cache miss or not.
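For the key half, the existing `io.quarkus.cache.CacheKeyGenerator` interface is enough; roughly like this (the service class and its method body are placeholders, not taken from my example):

```java
import java.lang.reflect.Method;
import io.quarkus.cache.CacheKeyGenerator;
import io.quarkus.cache.CacheResult;
import jakarta.enterprise.context.ApplicationScoped;

// Overriding key creation is the easy half: Quarkus cache already
// exposes CacheKeyGenerator for exactly this.
public class PromptKeyGenerator implements CacheKeyGenerator {

    @Override
    public Object generate(Method method, Object... methodParams) {
        String prompt = (String) methodParams[0];
        // Normalizing catches trivially repeated prompts. A semantic key
        // would be an embedding, but the built-in lookup is an exact
        // equals() match on the key, not a similarity search, and that
        // is exactly where the hit/miss check becomes the hard part.
        return prompt.strip().toLowerCase();
    }
}

@ApplicationScoped
class AssistantService {

    @CacheResult(cacheName = "llm-responses", keyGenerator = PromptKeyGenerator.class)
    public String ask(String prompt) {
        return callModel(prompt); // stand-in for the actual LLM call
    }

    private String callModel(String prompt) {
        return "model answer for: " + prompt; // placeholder
    }
}
```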
I created a simple example that implements this with Redis: https://github.com/lordofthejars-ai/quarkus-langchain-examples/tree/main/semantic-cache
Do you think it might be interesting to integrate this into the Quarkus Cache system, for example as a redis-semantic-cache extension or something similar?
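Purely as an illustration of what such an extension's configuration might look like (none of these properties exist today; every key below is invented):

```properties
# Hypothetical configuration for a redis-semantic-cache extension.
# None of these properties exist; they only sketch the proposal.
quarkus.cache.type=redis-semantic
quarkus.cache.redis-semantic.similarity-threshold=0.95
quarkus.cache.redis-semantic.embedding-model=text-embedding-3-small
quarkus.cache.redis-semantic.ttl=10m
```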