Add embeddings for LocalAI #8134

Merged (6 commits) on Jul 24, 2023

@mudler mudler commented Jul 22, 2023

Description:

This PR adds embeddings for LocalAI ( https://github.com/go-skynet/LocalAI ), a self-hosted, drop-in replacement for OpenAI. Because LocalAI can reuse OpenAI clients, the implementation mostly follows the OpenAI embeddings. The one difference is in embedding documents: it sends plain strings instead of tokens, because token support is best-effort and depends on the model used in LocalAI. Sending tokens is also tricky because token IDs can mismatch with the model, so it is safer to just send strings in this case.
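
The difference between the two input styles can be sketched as follows. This is a minimal illustration, not the code from this PR: `build_embeddings_payload` is a hypothetical helper, the model name is a placeholder, and the only assumption about the wire format is that an OpenAI-compatible embeddings endpoint accepts an `input` field holding a list of strings.

```python
from typing import Dict, List, Union


def build_embeddings_payload(
    texts: List[str], model: str
) -> Dict[str, Union[str, List[str]]]:
    """Build a request body that sends raw strings, never token IDs.

    Hypothetical helper for illustration only; it is not part of
    LangChain or LocalAI.
    """
    if not all(isinstance(t, str) for t in texts):
        # Token IDs are tied to one tokenizer, so refuse them up front.
        raise TypeError("expected plain strings, not token ID lists")
    return {"model": model, "input": list(texts)}


payload = build_embeddings_payload(
    ["hello world", "second document"], "example-model"
)
print(payload)
```

Because the payload carries only text, the server is free to tokenize with whatever tokenizer matches the model it actually loaded.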

Partly related to: #5256

Dependencies: No new dependencies

Twitter: @mudler_it

Maintainers: @rlancemartin, @eyurtsev, @hwchase17

@dosubot added the Ɑ: embeddings and 🤖:enhancement labels Jul 22, 2023
@mudler mudler force-pushed the localai_embeddings branch from 99131ec to 0addb58 Compare July 22, 2023 17:36
Signed-off-by: mudler <mudler@localai.io>
@mudler mudler force-pushed the localai_embeddings branch from 0addb58 to 76ebaf2 Compare July 22, 2023 17:38

@hwchase17 hwchase17 left a comment

  1. Let's add an example notebook for this
  2. Let's add this to langchain/embeddings/__init__.py

@hwchase17 added the "needs documentation" label Jul 22, 2023
mudler added 2 commits July 23, 2023 11:59
Signed-off-by: mudler <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>

mudler commented Jul 23, 2023

@hwchase17 done! I'll follow up along with #5256, and once we have complete integration with LocalAI I'll also update the documentation page accordingly.

Taking this opportunity to ask: is there any interest in adding, e.g., voice capabilities? LocalAI supports TTS and audio-to-text as well.

@baskaryan removed the "needs documentation" label Jul 24, 2023
@baskaryan (Collaborator) commented

Looks awesome, thanks @mudler!

There definitely is interest in voice, but adding other modalities is a big change that we want to be super thoughtful about, and we haven't had the time to think it through just yet. We're very open to suggestions on the interface if you're eager to see it in langchain.

@baskaryan baskaryan merged commit ae28568 into langchain-ai:master Jul 24, 2023
@mudler mudler deleted the localai_embeddings branch July 25, 2023 18:31

# https://stackoverflow.com/questions/76469415/getting-embeddings-of-length-1-from-langchain-openaiembeddings
def _check_response(response: dict) -> dict:
    if any(len(d["embedding"]) == 1 for d in response["data"]):

Pardon me for commenting on old code, but the SO thread discusses the OpenAI service, so it hardly applies to LocalAI, does it?
That is why I want to remove this retry condition in the new integration package https://github.com/mkhludnev/langchain-localai WDYT?
This package was spun off from the discussion #22399 (comment)
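
For context, the guard under discussion can be sketched as a standalone function. This is an assumption-laden sketch rather than the merged code: the excerpt above is truncated after the `if` line, so the `ValueError` and the final `return` are stand-ins for whatever the real branch does, and `check_response` is a hypothetical name.

```python
from typing import Any, Dict


def check_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Reject responses in which any embedding has exactly one element."""
    if any(len(d["embedding"]) == 1 for d in response["data"]):
        # Assumed behavior: signal a failure so the caller can retry.
        raise ValueError("received an embedding of length 1")
    return response


good = {"data": [{"embedding": [0.1, 0.2, 0.3]}]}
bad = {"data": [{"embedding": [0.0]}]}

checked = check_response(good)
try:
    check_response(bad)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Whether a length-1 embedding can occur at all with a LocalAI backend is exactly the open question raised here; the sketch only shows what the condition tests.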

"""Call out to LocalAI's embedding endpoint."""
# handle large input text
if self.model.endswith("001"):
    # See: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500

Should we bother with OpenAI specifics when integrating with LocalAI? I don't think so. I'm going to remove it in https://github.com/mkhludnev/langchain-localai
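
The excerpt is cut off before the body of the `if` branch. As a sketch of the kind of OpenAI-specific handling in question, and assuming the branch replaces newlines in the input for models whose name ends in "001", the logic would look roughly like this. `prepare_text` and both model names are hypothetical.

```python
def prepare_text(text: str, model: str) -> str:
    """Apply the assumed OpenAI-specific newline workaround."""
    if model.endswith("001"):
        # Assumption: newlines are replaced with spaces for "001" models.
        return text.replace("\n", " ")
    return text


flattened = prepare_text("line one\nline two", "example-model-001")
untouched = prepare_text("line one\nline two", "some-local-model")
print(repr(flattened))
print(repr(untouched))
```

A model served by LocalAI is unlikely to follow this naming scheme, so under this assumption the branch would rarely trigger there.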
