Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community: Cassandra Vector Store: modernize implementation #27253

Merged
merged 12 commits into from
Oct 22, 2024

Conversation

epinzur
Copy link
Contributor

@epinzur epinzur commented Oct 10, 2024

Description:

This PR updates CassandraGraphVectorStore to be based off CassandraVectorStore, instead of using a custom CQL implementation. This allows users using a CassandraVectorStore to upgrade to a GraphVectorStore without having to change their database schema or re-embed documents.

This PR also updates the documentation of the GraphVectorStore base class and contains native async implementations for the standard graph methods: traversal_search and mmr_traversal_search in CassandraVectorStore.

Issue: No issue number.

Dependencies: #27078 (already-merged)

Lint and test:

  • Lint and tests all pass, including existing CassandraGraphVectorStore tests.
  • Also added numerous additional tests based of the tests in langchain-astradb which cover many more scenarios than the existing tests for Cassandra and CassandraGraphVectorStore

** BREAKING CHANGE**

Note that this is a breaking change for existing users of CassandraGraphVectorStore. They will need to wipe their database table and restart.

However:

  • The interfaces have not changed. Just the underlying storage mechanism.
  • Any one using langchain_community.vectorstores.Cassandra can instead use langchain_community.graph_vectorstores.CassandraGraphVectorStore and they will gain Graph capabilities without having to re-embed their existing documents. This is the primary goal of this PR.

Copy link

vercel bot commented Oct 10, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Oct 22, 2024 6:07pm

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. community Related to langchain-community Ɑ: vector store Related to vector store module labels Oct 10, 2024
@epinzur epinzur force-pushed the update_graph branch 2 times, most recently from 5bd1c47 to a6add20 Compare October 10, 2024 13:20
@epinzur epinzur force-pushed the update_graph branch 2 times, most recently from 1d344e9 to 034a485 Compare October 14, 2024 13:32
@eyurtsev
Copy link
Collaborator

cc @cbornet or @hemidactylus

@epinzur thank you for the contribution. I tagged the main maintainers for this integration. I took a quick glance, from a first pass it's not clear to me what this refactoring achieves. It does have a number of breaking changes as far as I can tell -- which could break existing users code.

@hemidactylus
Copy link
Contributor

@eyurtsev the author of this PR is a co-worker of mine (and of @cbornet's), we actually discussed several of the points addressed in this PR while it was being written.
The main assumption behind the changes here is twofold:

  1. we posit that current usage of the Cassandra graph vector store (the one prior to this PR) is essentially nil, especially in production. Aided by the fact that the graph layer is still marked as beta, we consider the pros vastly outnumber the cons.
  2. The pros: this PR makes the graph store something built on top of the (non-graph) "just vector store", community.vectorstores.Cassandra. This means, in particular, users would be able to "promote" their regular vector store into a graph store, without having to re-compute the embedding or re-ingest the stored entries - the graph structure consisting of certain metadata tags with the mechanism already supported by the graph store. To make this easily workable now - and maintainable in the future - some consistency between methods, signatures etc had to be reached.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Oct 22, 2024
@efriis efriis enabled auto-merge (squash) October 22, 2024 18:07
@efriis efriis merged commit f636c83 into langchain-ai:master Oct 22, 2024
18 checks passed
ccurme pushed a commit that referenced this pull request Oct 22, 2024
**Description:** 

This PR updates `CassandraGraphVectorStore` to be based off
`CassandraVectorStore`, instead of using a custom CQL implementation.
This allows users using a `CassandraVectorStore` to upgrade to a
`GraphVectorStore` without having to change their database schema or
re-embed documents.

This PR also updates the documentation of the `GraphVectorStore` base
class and contains native async implementations for the standard graph
methods: `traversal_search` and `mmr_traversal_search` in
`CassandraVectorStore`.

**Issue:** No issue number.

**Dependencies:** #27078
(already-merged)

**Lint and test**: 
- Lint and tests all pass, including existing
`CassandraGraphVectorStore` tests.
- Also added numerous additional tests based of the tests in
`langchain-astradb` which cover many more scenarios than the existing
tests for `Cassandra` and `CassandraGraphVectorStore`

** BREAKING CHANGE**

Note that this is a breaking change for existing users of
`CassandraGraphVectorStore`. They will need to wipe their database table
and restart.

However:
- The interfaces have not changed. Just the underlying storage
mechanism.
- Any one using `langchain_community.vectorstores.Cassandra` can instead
use `langchain_community.graph_vectorstores.CassandraGraphVectorStore`
and they will gain Graph capabilities without having to re-embed their
existing documents. This is the primary goal of this PR.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
@epinzur epinzur deleted the update_graph branch October 23, 2024 07:31
efriis added a commit that referenced this pull request Nov 4, 2024
Description:

This fixes an issue that mistakenly created in
#27253. The issue
currently exists only in `langchain-community==0.3.4`.

Test cases were added to prevent this issue in the future.

Co-authored-by: Erick Friis <erick@langchain.dev>
shjunn pushed a commit to shjunn/langchain that referenced this pull request Nov 4, 2024
Description:

This fixes an issue that mistakenly created in
langchain-ai#27253. The issue
currently exists only in `langchain-community==0.3.4`.

Test cases were added to prevent this issue in the future.

Co-authored-by: Erick Friis <erick@langchain.dev>
yanomaly pushed a commit to yanomaly/langchain that referenced this pull request Nov 8, 2024
Description:

This fixes an issue that mistakenly created in
langchain-ai#27253. The issue
currently exists only in `langchain-community==0.3.4`.

Test cases were added to prevent this issue in the future.

Co-authored-by: Erick Friis <erick@langchain.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Related to langchain-community lgtm PR looks good. Use to confirm that a PR is ready for merging. size:XXL This PR changes 1000+ lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants