Provide examples that run 100% locally, including embeddings #638
Replies: 6 comments
-
Hi @zsogitbe, thanks for the feedback. The word "serverless" is used in the context of Kernel Memory, where the default deployment requires standing up a Kernel Memory web service with queues. The Serverless option lets you use Kernel Memory without a web service and without queues. The other internal dependencies are a different matter, not covered by the term "serverless". That said, you can use KM with LLama locally in combination with SimpleStorage, SimpleQueue and SimpleVectorDb. The only external dependency we recommend is OpenAI embeddings, although you can also plug in a custom local embedding generator if you like, as long as you accept the lower quality of local embedding generators compared to those offered by OpenAI and similar services.
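For reference, a fully local serverless setup along the lines described above might look roughly like the sketch below. This is a hedged sketch only: the builder method names (`WithSimpleFileStorage`, `WithSimpleVectorDb`, `WithCustomEmbeddingGenerator`, `WithCustomTextGenerator`, `Build<MemoryServerless>`) reflect Kernel Memory's builder pattern but may differ between releases, and `MyLocalEmbeddingGenerator` / `MyLocalTextGenerator` are hypothetical local implementations of KM's generator interfaces, not real classes.

```csharp
// Sketch only: assumes Kernel Memory's fluent builder API; exact method names
// may vary between releases. MyLocalEmbeddingGenerator and MyLocalTextGenerator
// are hypothetical local implementations of KM's generator interfaces.
var memory = new KernelMemoryBuilder()
    .WithSimpleFileStorage()                                        // local file storage
    .WithSimpleVectorDb()                                           // local vector store
    .WithCustomEmbeddingGenerator(new MyLocalEmbeddingGenerator())  // local embeddings
    .WithCustomTextGenerator(new MyLocalTextGenerator())            // local LLM
    .Build<MemoryServerless>();                                     // no web service, no queues

await memory.ImportTextAsync("Kernel Memory can run fully locally.");
var answer = await memory.AskAsync("Can Kernel Memory run locally?");
```

With this shape, nothing in the pipeline calls out to an external service; storage, retrieval, embedding and generation all stay on the local machine.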
-
Thank you for your answer. I think you should add at least one example that runs without requiring registration with any external web service. May I suggest changing the LLamaSharp example to use LLamaSharp's own embedding generator?
-
There's an example here: https://github.com/microsoft/kernel-memory/blob/main/examples/105-dotnet-serverless-llamasharp/Program.cs. For embeddings, though, we don't recommend using LLama because it doesn't provide sufficient quality for RAG, leading to missing or incorrect results.
-
I do not have an Azure account, so the code will probably crash. I think you underestimate LLaMA models; they are among the best available. Let us test your assumption: give me the output of this example, and I will try to use a LLaMA model to see what result it gives.
-
I have created an example with the default LLamaSharp text embedding, and this is the output of your example:
The only issue I have encountered is that KernelMemory does not tolerate a question mark at the end of the question. I think this is intentional for some reason, and it is probably why you concluded that LLaMA models do not work well.
-
The problem is actually the LLama embeddings' ability to capture the meaning of text, which is very low, particularly with mixed content. When cosine similarity is used to find similar content, weak embeddings cause a lot of irrelevant text to be included in the RAG context. If you want to experiment with local models, I recommend looking at the MTEB leaderboard here: https://huggingface.co/spaces/mteb/leaderboard
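The retrieval step described above can be sketched concretely: RAG ranks stored chunks by the cosine similarity between the query embedding and each chunk embedding, so if a weak embedding model places unrelated texts near each other in vector space, irrelevant chunks score high and get pulled into the context. A minimal illustration of the measure itself, using toy 3-dimensional vectors rather than real embeddings (real models emit hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the query points mostly along the first axis.
query      = [0.9, 0.1, 0.2]
relevant   = [0.8, 0.2, 0.1]  # similar direction to the query
irrelevant = [0.1, 0.9, 0.7]  # mostly orthogonal content

print(cosine_similarity(query, relevant))    # high, close to 1.0
print(cosine_similarity(query, irrelevant))  # noticeably lower
```

A retriever keeps the chunks with the highest scores; an embedding model is only useful for RAG to the extent that "high cosine similarity" actually coincides with "semantically relevant", which is exactly what benchmarks like MTEB measure.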
-
Context / Scenario
There should be some examples which you can simply run locally without first needing to register with several services. If you call an example 'serverless', please make sure it is completely serverless and that one can simply try it. It is a pity that none of the examples work locally!
What happened?
Endless errors:
Importance
edge case
Platform, Language, Versions
Any
Relevant log output
No response