
Add support for bulk embedding in Ollama #735

Merged: 8 commits merged into patterns-ai-core:main on Aug 28, 2024

Conversation

@bricolage (Contributor) commented Aug 22, 2024

Description

As of v0.3.4, Ollama supports bulk embedding. This PR adds support for bulk embeddings via a new input: keyword param.

Changes made

  • Added the input: keyword param to the #embed method
  • Made the text: keyword param optional (caveat: this will no longer throw an error if you omit it)
  • Updated related method #embeddings
  • Updated Ollama API endpoint for embeddings
  • Updated specs

Why this change is needed

  • The API for bulk embedding requires the input: keyword param with an array of strings
  • #embed currently requires the text: keyword param, which is unused when doing bulk embedding

Open topics

  • You can now call #embed but pass neither text: nor input: and it won't throw an error. Is there a preferred way to validate inputs when not relying on keyword argument defaults?
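One common way to handle the open validation question is an explicit guard at the top of the method instead of relying on keyword-argument defaults. This is a hypothetical sketch (the method name matches the PR, but the body and signature here are illustrative, not the PR's actual code):

```ruby
# Hypothetical sketch: require at least one of text: or input:
# when neither keyword has a non-nil default.
def embed(text: nil, input: nil, model: nil)
  if text.nil? && input.nil?
    raise ArgumentError, "either text: or input: must be provided"
  end

  # Bulk input wins if given; otherwise wrap the single text in an array.
  {
    model: model,
    input: input || Array(text)
  }
end
```

Calling `embed` with neither keyword then raises `ArgumentError` rather than silently sending an empty request.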

bricolage and others added 3 commits August 22, 2024 22:43
The example request/response looks like this:
```
curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": ["Why is the sky blue?", "Why is the grass green?"]
}'

{
  "model": "all-minilm",
  "embeddings": [[
    0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
    0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
  ],[
    -0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
    0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481
  ]]
}
```
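The same request can be built with Ruby's standard library. This is an illustrative sketch based on the curl example above (endpoint and payload taken from it); the request is only constructed here, not sent, since sending it requires a running Ollama server:

```ruby
require "json"
require "net/http"
require "uri"

uri = URI("http://localhost:11434/api/embed")

# Build the POST request matching the curl example above.
req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
req.body = JSON.generate(
  model: "all-minilm",
  input: ["Why is the sky blue?", "Why is the grass green?"]
)

# To actually send it against a running Ollama instance:
# res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
# embeddings = JSON.parse(res.body).fetch("embeddings")
```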
```ruby
model: defaults[:embeddings_model_name],
input: [],
```
@andreibondarev (Collaborator) commented Aug 27, 2024

@bricolage What if we keep the same method interface but instead accept both text: "string" and text: [...], similar to how the OpenAI integration takes either a string or an array of strings?

You could also just always wrap the text below and pass it as an array:

```ruby
parameters = {
  model: model,
  input: Array(text)
}
```
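Ruby's `Array()` conversion makes this wrapping safe for the cases at hand: a single string is wrapped, an array passes through unchanged, and nil becomes an empty array:

```ruby
# Array() coercion behavior for the three relevant inputs:
Array("Why is the sky blue?")      # => ["Why is the sky blue?"]
Array(["one doc", "another doc"])  # => ["one doc", "another doc"]
Array(nil)                         # => []
```

Note that `Array()` converts a Hash into key-value pairs, so this pattern is only safe when the input is a string, an array, or nil.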

@bricolage (Contributor, Author) replied:

Oh that's nicer. Will do!

```diff
@@ -219,7 +219,8 @@ def embed(
 )
 parameters = {
   prompt: text,
```
@andreibondarev (Collaborator) commented Aug 28, 2024

@chalmagean I think we can just remove this prompt: parameter, right?

@andreibondarev (Collaborator):

@chalmagean Mind adding "Add support for bulk embedding in Ollama" to CHANGELOG.md? Thank you!

@andreibondarev andreibondarev merged commit 31ef26b into patterns-ai-core:main Aug 28, 2024
5 checks passed