Semantic caching for LLMs and JavaScript applications
`llm-cache` is a semantic caching library designed specifically for Large Language Models (LLMs) and JavaScript applications. It provides an efficient way to cache and manage LLM-generated results while preserving their semantic context, making it well suited to projects involving natural language processing, chatbots, question-answering systems, and more.
This project is inspired by GPTCache.
- Semantic Caching: llm-cache caches not just the raw text of LLM-generated responses but also the semantic context in which they are produced, so differently worded prompts with the same meaning can reuse a cached answer (see the sketch after this list).
- Efficient Memory Usage: It optimizes memory usage to handle large-scale LLM workloads without overwhelming your application's resources.
- Simple API: llm-cache offers a straightforward API for caching and retrieving LLM-generated content.
- Customizable: Easily configure llm-cache to fit your project's specific needs, from cache eviction policies to storage options.
- Performance: Improves response times by reducing the need to recompute LLM responses, especially for frequently asked questions or recurring queries.
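The sketch below is a deliberately naive TypeScript illustration of the semantic-caching idea, not llm-cache's internal implementation: the `embed()` usage hint, the 0.9 threshold, and the linear scan are placeholders for a real embedding model and a vector index such as the FAISS-backed store used in the example further down.

```ts
// Minimal, self-contained sketch of semantic caching (illustrative only).
type CacheEntry = { embedding: number[]; response: string };

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class NaiveSemanticCache {
  private entries: CacheEntry[] = [];

  constructor(private threshold = 0.9) {}

  // Return the cached response whose prompt embedding is closest,
  // but only if it is similar enough to count as a semantic hit.
  lookup(embedding: number[]): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return bestScore >= this.threshold ? best?.response : undefined;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}

// Usage idea (embed() and callLLM() are hypothetical stand-ins):
//   const hit = cache.lookup(await embed(prompt));
//   const answer = hit ?? await callLLM(prompt);
//   if (hit === undefined) cache.store(await embed(prompt), answer);
```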
```bash
npm install llm-cache
# or
yarn add llm-cache
```
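The following example wraps a LangChain OpenAI model with llm-cache and sends a series of semantically similar prompts through it: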
```ts
// NOTE: the import paths below are assumed for illustration; check the
// package's exports for the exact names and locations.
import {
  CacheManager,
  FaissVectorStore,
  MemoryCacheStore,
  CacheEvictionPolicy,
  LangchainCache,
  OpenAIEmbedding,
} from "llm-cache";
import { OpenAI } from "langchain/llms/openai";

const openAIApiKey = process.env.OPENAI_API_KEY as string;

// FAISS-backed vector index for semantic lookups, an in-memory store for
// cached responses, FIFO eviction, and a maximum of 5 cached entries.
const cacheManager = new CacheManager({
  vectorStore: new FaissVectorStore({ dimension: 1536 }),
  cacheStore: new MemoryCacheStore(),
  evictionPolicy: CacheEvictionPolicy.fifo,
  maxSize: 5,
});

// Wrap a LangChain OpenAI model so its calls go through the semantic cache.
const llmModel = new LangchainCache(
  new OpenAI({
    modelName: "gpt-4-0613",
    temperature: 0,
    streaming: false,
    openAIApiKey: openAIApiKey,
    maxConcurrency: 5,
  }),
  {
    embeddings: new OpenAIEmbedding(openAIApiKey, 5),
    cacheManager: cacheManager,
  },
  {}
);

// Differently worded prompts that ask essentially the same question.
const similarPrompts = [
  "How to become a good engineer?",
  "What steps should I take to become a skilled engineer?",
  "Can you provide guidance on becoming a proficient engineer?",
  "I'm curious about the path to becoming an excellent engineer.",
  "What's the roadmap to becoming a competent engineer?",
  "Seeking advice on how to develop into a great engineer.",
];

async function main() {
  for (let i = 0; i < similarPrompts.length; i++) {
    const start = Date.now();
    const result = await llmModel.predict(similarPrompts[i]);
    const end = Date.now();
    console.log(`Result: ${result}`);
    console.log(`Performance of request ${i}: ${end - start}ms`);
    console.log(`========================================`);
  }
}

main()
  .then(() => {
    console.log("done");
  })
  .catch((e) => console.log("error", e));
```
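With a setup like this, the first prompt is a cache miss and goes to the OpenAI API; the remaining prompts are only rephrasings of the same question, so they should be served from the semantic cache, which you can verify from the per-request timings printed by the script.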
If you'd like to contribute, check out the contributing guide.
Thank you to all the people who have already contributed to LLM Cache!