
Add Pinecone support for archival storage #635

Closed
sahusiddharth opened this issue Dec 17, 2023 · 5 comments

Comments

@sahusiddharth
Contributor

Having support for Pinecone could be really helpful. It is a cloud-native vector database and has some of the best performance among vector stores.

@sahusiddharth
Contributor Author

I have started working on this, but there are a couple of roadblocks I'm running into:

  1. To make a connection to the Pinecone client we need to specify an API key and an environment. I think we have to take them as config values; I need suggestions on how to take this further. (A minimal connection sketch follows this comment.)
  2. Can you please explain the rationale behind the abstract function get_all_paginated?

I would love to get your inputs as well @cpacker, @vivi, @sarahwooders
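For context, a minimal sketch of what that connection looks like with the pinecone-client v2 API of the time; the index name and embedding dimension here are hypothetical:

```python
import pinecone

# The API key and environment are the two values discussed above; in MemGPT
# they would come from the config rather than being hard-coded.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# Create the index on first use (dimension must match the embedding model).
if "memgpt-archival" not in pinecone.list_indexes():
    pinecone.create_index("memgpt-archival", dimension=1536, metric="cosine")

index = pinecone.Index("memgpt-archival")
```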

@cpacker
Collaborator

cpacker commented Dec 17, 2023

@sahusiddharth I think you can follow the Chroma integration for the most part (https://github.com/cpacker/MemGPT/blob/main/memgpt/connectors/chroma.py) and implement a parallel class for Pinecone.
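To illustrate, a rough sketch of such a parallel class; the method names and signatures below are assumptions loosely modeled on the Chroma connector, not the actual MemGPT base-class interface:

```python
from typing import List

import pinecone


class PineconeStorageConnector:
    """Archival storage backed by a Pinecone index (sketch; names hypothetical)."""

    def __init__(self, index_name: str, api_key: str, environment: str):
        pinecone.init(api_key=api_key, environment=environment)
        self.index = pinecone.Index(index_name)

    def insert(self, record_id: str, embedding: List[float], text: str) -> None:
        # Pinecone stores vectors with optional metadata; keep the raw text there.
        self.index.upsert(vectors=[(record_id, embedding, {"text": text})])

    def query(self, query_embedding: List[float], top_k: int = 10) -> List[dict]:
        results = self.index.query(
            vector=query_embedding, top_k=top_k, include_metadata=True
        )
        return [match["metadata"] for match in results["matches"]]
```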

AFAIK get_all_paginated is used primarily (only?) for the /attach command, @sarahwooders can confirm.

For API keys, yes we should store them in the base ~/.memgpt/config. The plan is to eventually split config into a config and credentials file, but this hasn't happened yet.

@sahusiddharth
Contributor Author

I have the basic structure down, using the ChromaDB connector as a reference.

I asked about get_all_paginated because, going through the Pinecone documentation, it seems you cannot query documents from Pinecone in fixed-size steps.

About the API keys, how should I proceed?

  1. Take the API key via kwargs?
  2. Read it from a .txt file at a given path?
  3. Other suggestions?

@sarahwooders
Collaborator

@sahusiddharth it would be great to have a Pinecone integration! However, we are actually currently in the middle of refactoring some of the storage backends - could you please work off the storage-refactor branch instead of main? The postgres and chroma integrations are mostly complete, so you can model your changes off of them, but I still need to finish migrating a few more things before I can merge the refactored code in.

For API keys, I recommend placing them into the ~/.memgpt/config file, which you can do by adding a field to MemGPTConfig. We will probably eventually move towards having a separate credentials file, but for the time being we are using the config file for everything.
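As a sketch of the config side, assuming MemGPTConfig is a dataclass persisted to ~/.memgpt/config (the field names below are hypothetical additions, not the existing schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemGPTConfig:
    # ... existing fields ...

    # Hypothetical Pinecone credentials, loaded/saved like the other fields.
    pinecone_api_key: Optional[str] = None
    pinecone_environment: Optional[str] = None
```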

We currently use get_all_paginated to copy data from data sources into agent archival memory. However, I think we may deprecate this function in the future, since I'd like to avoid copying data into agent archival memory when connecting to external data sources. I would recommend just faking pagination for now, by calling get_all() and then paginating the results -- and we can add a warning about using Pinecone with large datasets.
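A minimal sketch of that "fake pagination" approach, assuming get_all() returns every stored record (note that this loads the full dataset into memory, hence the warning about large datasets):

```python
from typing import Iterator, List


def get_all_paginated(self, page_size: int = 1000) -> Iterator[List[dict]]:
    """Fetch everything once, then yield it back in fixed-size pages."""
    records = self.get_all()  # assumed bulk-fetch method on the connector
    for start in range(0, len(records), page_size):
        yield records[start : start + page_size]
```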


github-actions bot commented Dec 6, 2024

This issue has been automatically closed due to 60 days of inactivity.

@github-actions github-actions bot closed this as completed Dec 6, 2024
mattzh72 pushed a commit that referenced this issue Jan 16, 2025
Co-authored-by: Mindy Long <mindy@letta.com>