-
Notifications
You must be signed in to change notification settings - Fork 15.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
community: add sqlite-vec vectorstore #25003
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
metadatas: Optional list of metadatas associated with the texts. | ||
kwargs: vectorstore specific parameters | ||
""" | ||
max_id = self._connection.execute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would a RETURNING
clause make more sense here, instead of max_id
? Just to avoid multiple SQL statements, and would avoid subtle bugs with SQLite autoincrement values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't even aware of RETURNING (I copied this impl directly from the sqlite-vss), I'll try that! That also means I can remove the trigger and avoid having two copies of the embedding in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was completely unaware of the RETURNING clause as well. Unfortunately, this does not seem to be working with the sqlite3 executemany
method as pointed here: python/cpython#100021. I tried and got the error described on this issue.
Hey thanks for putting this up! I will be deprecating Also to note: I'll be adding proper metadata columns + filtering in the near future, which should simplify a lot of code here, but that would be for a little longer Happy to take a closer look if needed |
@abhiaagarwal I actually submitted a PR doing the actual same thing here: #25024 and did not pay attention someone else had already done it less than a day before me! I will close my PR then. Congratulation for beating me to it! |
Excellent work. |
@liunux4odoo I'm totally open to that — I've been trying to figure out an async client and it's been not fun. Let me try the sqlalchemy approach! |
@abhiaagarwal I feel your PR should go forward as it is for now, it's perfectly reasonable as it is. Sqlite-vec is a major vector store alternative that needs to be reflected in Langchain. I don't think we should wait. @liunux4odoo I like the idea of integrating it with SQLAlchemy but this feels like a major disruption for the purpose of this PR. I don't think the need was ever expressed with sqlite-vss to have SQLAlchemy, so how about we just go ahead without it for now, and if the need comes along we do a refacto for it? @asg017 I was completely unaware of the RETURNING feature (thanks for bringing that up). So I did try to implement it, but unfortunately got some error. It seems that RETURNING is not available (yet?) when using |
@baskaryan could this PR be looked at and merged soon? |
@philippe2803 yep, I discovered the RETURNING thing as well with executemany, I can actually solve it I believe by just doing multiple executes with RETURNING. I haven't benchmarked the performance characteristics yet, but it's probably not worth it. |
Howdy! Looks good other than the doc - could you rewrite to match the specified format? There's a cli command that bootstraps it here: #24800 |
@abhiaagarwal would you be OK to give me permission to the sqlite-vec branch on your langchain fork? I have made the changes (I believe) necessary for the docs build to pass. This would get this PR merged shortly. |
@philippe2803 absolutely, I just did it, let me know if it works for you! |
@efriis looks like this is all passing now. I believe this PR can now be merged. |
@efriis sorry for being pushy, but could this PR be merged? It's been pending for quite some time now, and every GHA are currently passing. At your disposal if I can help in any ways. |
@baskaryan @efriis could we please move forward with this PR please? it has been hanging for over a week now, although everything has passed. Happy to help in any ways |
@efriis anything the community can do to push this forward? |
Hey team! Linking the review process - the status is currently "needs support" so can give it an upvote to get it through! |
@efriis Thank you! Appreciate the approval! |
will require 0.3 because the current langchain-community library depends on pydantic v2 in other areas! |
**Description**: Adds a vector store integration with [sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to sqlite-vss that is a single C file with no external dependencies. Pretty straightforward, just copy-pasted the sqlite-vss integration and made a few tweaks and added integration tests. Only question is whether all documentation should be directed away from sqlite-vss if it is defacto deprecated (cc @asg017). --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
Description:
Adds a vector store integration with sqlite-vec, the successor to sqlite-vss that is a single C file with no external dependencies.
Pretty straightforward, just copy-pasted the sqlite-vss integration and made a few tweaks and added integration tests. Only question is whether all documentation should be directed away from sqlite-vss if it is defacto deprecated (cc @asg017).