Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Pinecone document store optimizations #5902

Conversation

ivanazeljkovic
Copy link
Contributor

@ivanazeljkovic ivanazeljkovic commented Sep 27, 2023

Related Issues

Resolves: Pinecone Document Store optimizations #5901

Proposed Changes

  • Optimize delete_documents() and _get_vector_count() methods from PineconeDocumentStore;
  • Add appropriate warnings when Pinecone limits are exceeded with Starter index type due to its limitations

How did you test it?

Testing is done through existing unit and integration tests, as well as manually.

Checklist

@ivanazeljkovic ivanazeljkovic requested a review from a team as a code owner September 27, 2023 17:18
@ivanazeljkovic ivanazeljkovic requested review from julian-risch and removed request for a team September 27, 2023 17:18
@ivanazeljkovic ivanazeljkovic requested a review from a team as a code owner September 27, 2023 17:28
@ivanazeljkovic ivanazeljkovic requested review from dfokina and removed request for a team September 27, 2023 17:28
@coveralls
Copy link
Collaborator

coveralls commented Sep 27, 2023

Pull Request Test Coverage Report for Build 6535810474

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 362 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.01%) to 50.859%

Files with Coverage Reduction New Missed Lines %
schema.py 84 68.67%
document_stores/pinecone.py 278 52.47%
Totals Coverage Status
Change from base Build 6534987291: -0.01%
Covered Lines: 12970
Relevant Lines: 25502

💛 - Coveralls

@anakin87
Copy link
Member

Please merge main/rebase on main...
We fixed a test error in #5906.

@ivanazeljkovic ivanazeljkovic force-pushed the feature/pinecone-document-store-optimizations branch from a60cea8 to c93305c Compare September 28, 2023 13:11
Copy link
Contributor

@silvanocerza silvanocerza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just a couple of changes to make and it's ready to go. 👍

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this pull request! I just have two small change requests in addition to the change request by Silvano. Otherwise the pull request looks good to me! 👍

haystack/document_stores/pinecone.py Show resolved Hide resolved
haystack/document_stores/pinecone.py Show resolved Hide resolved
@ivanazeljkovic ivanazeljkovic force-pushed the feature/pinecone-document-store-optimizations branch 2 times, most recently from cce0088 to 4ea5ff8 Compare October 12, 2023 10:39
@ivanazeljkovic
Copy link
Contributor Author

@silvanocerza @julian-risch Thank you for a detailed review and feedback!
I fixed the things you pointed out. Also, improved few more things in the meantime, so your feedback is very welcome.

Improvements:

  • get_embedding_count method - if filters include doc_type related one, it will be removed and replaced with appropriate one (doc_type: "vector"), since this one is the only valid filter related to this meta field when counting documents with embedding,
  • get_document_count method - default value for doc_type meta field will be added only if filters doesn't include doc_type related ones,
  • additional optimizations related to deleting documents (refactoring underlying methods and increasing default batch size)
  • labels retrieval fix

@silvanocerza
Copy link
Contributor

Hey @ivanazeljkovic, I see there are some issues with CI.
They should have been fixed in main but I can't edit your PR, can you please update it so that CI runs again?

…ble warning messages when Pinecone limits are exceeded on Starter index type.
- Adding new test cases for get_embedding_count method
- Fixing get_embedding_count method
- Improving delete documents
- Fix label retrieval
- Increase default batch size
- Improve get_document_count method
@ivanazeljkovic ivanazeljkovic force-pushed the feature/pinecone-document-store-optimizations branch from 714971d to 85fbe82 Compare October 16, 2023 14:56
@ivanazeljkovic
Copy link
Contributor Author

Hi @silvanocerza, I have done a rebase so now all tests are passing successfully.

Copy link
Contributor

@silvanocerza silvanocerza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🚀

@silvanocerza silvanocerza dismissed julian-risch’s stale review October 16, 2023 17:26

Requested changes have been made

@silvanocerza silvanocerza merged commit 2326f2f into deepset-ai:main Oct 16, 2023
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pinecone Document Store optimizations
6 participants