Replies: 4 comments 11 replies
-
This probably isn't the right place to ask this question. Embeddings are more of a general machine learning topic than they are llama-specific. You can read more about how they work:
You would use the embeddings returned by the model in a precomputed dataset, or perhaps in a vector database, to find similar text chunks. For vector search you would probably use a specialized embedding model, like BGE, to generate the embeddings, though. Check out the llama-index RAG tutorials to learn more about how this works. Good luck and welcome to LLMs 👍
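To make the idea concrete, here is a minimal sketch of "embed chunks, then retrieve the closest one" using llama-cpp-python and plain NumPy cosine similarity instead of a real vector database. The model path is a placeholder, and it assumes a pooled embedding model (e.g. a GGUF conversion of BGE) so `create_embedding` returns one vector per input:

```python
import numpy as np
from llama_cpp import Llama

# Placeholder path to a small GGUF embedding model (assumption, not from this thread).
emb_model = Llama(model_path="bge-small-en.Q8_0.gguf", embedding=True)

chunks = [
    "My name is Anna.",
    "The office closes at 6 pm.",
    "Llamas are domesticated South American camelids.",
]

def embed(text: str) -> np.ndarray:
    # create_embedding returns an OpenAI-style response dict;
    # the vector itself sits under data[0]["embedding"].
    vec = emb_model.create_embedding(text)["data"][0]["embedding"]
    return np.asarray(vec, dtype=np.float32)

chunk_vecs = np.stack([embed(c) for c in chunks])
query_vec = embed("What is your name?")

# Cosine similarity between the query and every stored chunk.
sims = chunk_vecs @ query_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(chunks[int(np.argmax(sims))])  # hopefully "My name is Anna."
```

A vector database does the same nearest-neighbor lookup, just at scale and with persistence.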
-
Agree with the above, just want to add for emphasis: typically your embedding model will be different from your text generation model. Though embedding models and generative language models share similar architectures, embedding models are usually much smaller. So to adapt the example you gave, you'd do something like:

```python
mod_emb = Llama(model_path=EMBEDDING_MODEL_PATH, embedding=True)
embeds = mod_emb.create_embedding(["My name is Anna."])
```

Then store this in a vector database of some sort, and later query it with the embedding for "What is your name?" to get the relevant text back (hopefully "My name is Anna."). Put that in `CONTEXT` and generate:

```python
mod_gen = Llama(model_path=GENERATION_MODEL_PATH)
prompt = CONTEXT + '\n\n' + "What is your name?"
result = mod_gen(prompt, max_tokens=max_tokens)
```

There are ways to improve this substantially, described in the above links, but this is the basic idea.
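Since the original question mentions Mistral 7B Instruct, the retrieved chunk would usually also be wrapped in the model's chat template rather than concatenated raw. A rough sketch of that last step; the model path and the message layout are my assumptions, not something from this thread:

```python
from llama_cpp import Llama

# Hypothetical path to a GGUF of Mistral 7B Instruct.
mod_gen = Llama(model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf", n_ctx=4096)

# Chunk returned by the embedding search above.
retrieved = "My name is Anna."

# create_chat_completion applies the chat template stored in the GGUF metadata
# (for Mistral Instruct, the [INST] ... [/INST] wrapping), so the context and
# the question can simply go into the user message.
out = mod_gen.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": f"Context:\n{retrieved}\n\nQuestion: What is your name?",
        }
    ],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```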
-
@gardner @iamlemec But how exactly are these embeddings computed? Are they the last layer's hidden states? The question in #3643 has been left unanswered. From the source code I understood that it's the
-
I've noticed differences when running the same model using different libraries. I would like to further understand the differences between these libraries:
-
Hi,
I am working with llama.cpp (Python) and the Mistral 7B Instruct model. All works fine so far.
Now I wonder: what are embeddings and how do I use them?
As far as I understand, embeddings are used to support the LLM with additional context (e.g. data fetched from an internal database).
I also see that calling something like `create_embedding(...)`
will generate a vector with a lot of numbers.
But can someone explain how to use embeddings so that my LLM actually makes use of them in an inference life cycle?
I expected some code example like
But it does not look like `create_embedding` has any effect.