Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement max_marginal_relevance_search in VectorStore of Pinecone #6056

Merged
merged 2 commits into from
Jun 13, 2023
Merged

Conversation

neo
Copy link
Contributor

@neo neo commented Jun 12, 2023

This adds implementation of MMR search in pinecone; and I have two semi-related observations about this vector store class:

  • Maybe we should also have a similarity_search_by_vector_returning_embeddings like in supabase, but it's not in the base VectorStore class so I didn't implement
  • Talking about the base class, there's similarity_search_with_relevance_scores, but in pinecone it is called similarity_search_with_score; maybe we should consider renaming it to align with other VectorStore base and sub classes (or add that as an alias for backward compatibility)

Who can review?

Tag maintainers/contributors who might be interested:

  • VectorStores / Retrievers / Memory - @dev2049

Copy link
Collaborator

@rlancemartin rlancemartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! (1) Can you add one of two examples of these to the Pinecone notebook docs/modules/indexes/vectorstores/examples/pinecone.ipynb? (2) Can the functions be combined given the small difference between them (passing query vs embedding)? May help reduce code duplication.

@neo
Copy link
Contributor Author

neo commented Jun 13, 2023

Thank you for the review!

  1. yes I will add an example
  2. I agree, I only did both in order to stay consistent with supabase (which I used as the example for the implementation); I'm thinking to keep the query one?
    • edit: looking at the code, I remembered the base VectorStore abstract class also has both methods, so not sure whether we should have both in the pinecone implementation...
    • edit again: in fact, I did a search and most of the vectorstores have both of them?

What do you think about my comments about similarity_search_by_vector_returning_embeddings and _similarity_search_with_relevance_scores? Shall we consider implementing these two?

@rlancemartin
Copy link
Collaborator

rlancemartin commented Jun 13, 2023

I did a search and most of the vectorstores have both of them

Right, in that case, yes let's keep both. So, you can just add to the notebook :)

similarity_search_by_vector_returning_embeddings

You are welcome to add it here or in a follow-up PR. Please add a bit more documentation than Supabase has and a notebook example.

Talking about the base class, there's similarity_search_with_relevance_scores, but in pinecone it is called similarity_search_with_score

This is a good call out. I will check with Harrison, but let's take it up in a separate PR.

@neo
Copy link
Contributor Author

neo commented Jun 13, 2023

  • examples added in notebook
  • might follow up on and maybe discuss/think more on similarity_search_by_vector_returning_embeddings as it seems only supabase has it
  • agreed that similarity_search_with_relevance_scores should be a different PR, just wanted to bring up as it's also in pinecone – maybe could do something like https://github.com/hwchase17/langchain/pull/6015/files to deprecate it first before removal

@rlancemartin
Copy link
Collaborator

Yes, we should add a DeprecationWarning. Feel free to put up a PR w/ that change.

Copy link
Collaborator

@rlancemartin rlancemartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@neo
Copy link
Contributor Author

neo commented Jun 13, 2023

Yes, we should add a DeprecationWarning. Feel free to put up a PR w/ that change.

Will do. How about similarity_search_by_vector_returning_embeddings? It might fit into the same PR as they're just method name changes. Do you want to maybe discuss with the wider team to see if it's something we want to standardize across vector stores? Otherwise, we prob don't need to add that.

@rlancemartin
Copy link
Collaborator

Yes, we should add a DeprecationWarning. Feel free to put up a PR w/ that change.

Will do. How about similarity_search_by_vector_returning_embeddings? It might fit into the same PR as they're just method name changes. Do you want to maybe discuss with the wider team to see if it's something we want to standardize across vector stores? Otherwise, we prob don't need to add that.

Yes, let's take this up in a new PR since name changes have broader scope. Feel free to put up a PR and I can circulate it with folks. In the meantime, I'm going to merge this one :)

@rlancemartin rlancemartin merged commit f9edf76 into langchain-ai:master Jun 13, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
langchain-ai#6056)

This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
hwchase17 pushed a commit that referenced this pull request Jun 20, 2023
Just so it is consistent with other `VectorStore` classes.

This is a follow-up of #6056 which also discussed the potential of
adding `similarity_search_by_vector_returning_embeddings` that we will
continue the discussion here.

potentially related: #6286 


#### Who can review?

Tag maintainers/contributors who might be interested: @rlancemartin 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
This was referenced Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants