Vector store (pgvector/pinecone) support? #8

Open
29decibel opened this issue Sep 29, 2023 · 6 comments

Comments

@29decibel

Thank you @brainlid for starting this project! Played with the two live notebooks, works very well! Simple and very clean, well designed interfaces. 👍

I am wondering what's the roadmap moving forward, especially around vector store support.

Would very much love to migrate my NodeJS langchain projects to Phoenix/Elixir.

Thanks again for the effort! Can't wait to write more using it ❤️

@brainlid
Owner

brainlid commented Oct 3, 2023

Hi @29decibel! I'd love to have support for Vector DBs and document searching using those vectors.

I don't personally have a need for those at the moment, so I don't plan to implement it myself.

My current focus is:

  • Create an example app/video showing how to use this to build a chat interface with an LLM (working on it now)
  • Support for Llama 2 (the cpp version, the GPU version, or both)
  • Support for Bard

That's my short list. I'd love contributions! I'm happy to talk through API design for the features as well. 🙂

@amokan
Contributor

amokan commented Oct 10, 2023

@29decibel I have quite a bit of production experience with pgvector in the context of Elixir, so I'm just tossing my two cents in, as I think this is a great conversation to get started.

I think it would be great to get an initial implementation going to create vectors and maybe work with them in something like ETS for the sake of Livebook or ephemeral scenarios. But I'm a bit on the fence when it comes to integrating directly with something like pgvector in the context of Ecto, as that logic likely lives in the project using :langchain as a dependency, right? I say that because outside of very simple scenarios, you probably wouldn't want to leave your text splitting strategy up to a library, and you may be doing a lot of other text preprocessing outside the scope of this. But I'm the first to admit I could be thinking about this topic wrong (or maybe I'm just unlucky due to the data I work with).

Worth noting that Scholar has distance calculations covered. While not a solution for thousands of embeddings, I think there is merit in considering something like that for a first pass to support vector distance scenarios without reaching for a full DB 🤷
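
To make the "first pass without a full DB" idea concrete, here is a minimal brute-force sketch. It is written in plain Python as a language-neutral illustration and does not use or mirror Scholar's API; the function names `cosine_distance` and `nearest`, and the toy three-dimensional vectors, are assumptions made up for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, embeddings, k=3):
    """Brute-force scan: score every stored vector and keep the k closest ids."""
    scored = [
        (cosine_distance(query, vector), doc_id)
        for doc_id, vector in embeddings.items()
    ]
    scored.sort()  # smallest distance first
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors for illustration; real embeddings would come from a model.
embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
result = nearest([1.0, 0.0, 0.0], embeddings, k=2)
```

Every query scans all stored vectors, so the cost grows linearly with the number of embeddings. That is fine for Livebook-sized data and is the reason this approach stops being practical at the scale of thousands of embeddings.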

One final question: do you have any examples out there (aside from the canonical langchain.js and Python examples) that leverage vector search without keeping that logic in the core application codebase, relying instead on the implementation directly from langchain?

@brainlid
Owner

@amokan I am only passively interested in supporting pgvector. I think it's cool and I would like to have it, but I don't have any personal experience with it and it's currently not in my plans to implement.

I would love help in this area.

@brainlid
Owner

For document access, a draft PR is being worked on: #3

@amokan
Contributor

amokan commented Oct 23, 2023

@brainlid Haven't forgotten about this topic. Been thinking about some common interfaces in this area to make any effort on this front flexible.

I am currently tinkering with an ETS-based 'MemoryStore' in the context of LLM chains for some other efforts, and I figure something similar may be a good first step here. Basically, it uses ETS as a context window and allows vector support/distance.
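
A rough sketch of what such a store might look like follows. It is not the actual 'MemoryStore' mentioned above, whose design is not shown in this thread. It is written in Python as a stand-in, with a bounded `deque` playing the role an ETS table would play in Elixir; the class name, the `max_entries` parameter, and the `add`/`search` methods are all hypothetical.

```python
import math
from collections import deque


def _cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class MemoryStore:
    """Bounded in-memory store of (text, vector) entries.

    Acts like a sliding context window: once max_entries is reached,
    adding a new entry evicts the oldest one.
    """

    def __init__(self, max_entries=100):
        self._entries = deque(maxlen=max_entries)

    def add(self, text, vector):
        self._entries.append((text, list(vector)))

    def __len__(self):
        return len(self._entries)

    def search(self, query, k=1):
        """Return up to k stored texts, most similar to the query first."""
        ranked = sorted(
            self._entries,
            key=lambda entry: _cosine_similarity(query, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = MemoryStore(max_entries=2)
store.add("hello", [1.0, 0.0])
store.add("billing question", [0.0, 1.0])
store.add("greeting follow-up", [0.9, 0.1])  # window is full, so "hello" is evicted
top = store.search([1.0, 0.0], k=1)
```

The eviction rule here (drop the oldest entry) is only one possible choice; a real implementation might instead evict by token count or relevance.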

If that is something that is of interest for this project, I can try to piece together a PR over the next week or two.

@brainlid
Owner

@amokan Cool! I really don't know enough about this area to know if an ETS-based memory store makes sense. In principle, I'm not opposed to using ETS tables in this way with the library.

3 participants