+mdb atlas vectordb [clean_final] (#3000)
* +mdb atlas

* Update test/agentchat/contrib/vectordb/test_mongodb.py

Co-authored-by: HRUSHIKESH DOKALA <96101829+Hk669@users.noreply.github.com>

* update test_mongodb.py; we don't need to do the assert .collection_name vs .name

* Try fix mongodb service

* Try fix mongodb service

* Update username and password

* Update autogen/agentchat/contrib/vectordb/mongodb.py

* closer --- but I'm not super thrilled about the solution...

* PYTHON-4506 Expanded tests and simplified vector search pipelines

* Update mongodb.py

* Update mongodb.py - Casey

* search_index_magic

index_name change; keeping track of lucene indexes is tricky

* Fix format

* Fix tests

* hacking trying to figure this out

* Streamline checks for indexes in construction and restructure tests

* Add tests for score_threshold, embedding inclusion, and multiple query tests

* refactored create_collection to meet base object requirements

* lint

* change the localhost port to 27017

* add test to check that no embedding is there unless explicitly provided

* Update logger

* Add test get docs with ids=None

* Rename and update notebook

* have index management include waiting behaviors

* Adds further optional waits for users and tests. Cleans up upsert.

* ensure the embedding size for multiple embedding inputs is equal to dimensions

* fix up tests and add configuration to ensure documents and indexes are READY for querying

* fix import failure

* adjust typing for 3.9

* fix up the notebook output

* changed language to communicate time taken on first init_chat call

* replace environment variable usage

---------

Co-authored-by: Fabian Valle <fabian.valle-simmons@mongodb.com>
Co-authored-by: HRUSHIKESH DOKALA <96101829+Hk669@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Casey Clements <casey.clements@mongodb.com>
Co-authored-by: Jib <jib.adegunloye@mongodb.com>
Co-authored-by: Jib <Jibzade@gmail.com>
Co-authored-by: Cozypet <yanhan860711@gmail.com>
8 people authored and victordibia committed Jul 30, 2024
1 parent 1d86c89 commit dc79b15
Showing 6 changed files with 1,561 additions and 2 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/contrib-tests.yml
@@ -87,6 +87,10 @@ jobs:
           --health-retries 5
         ports:
           - 5432:5432
+      mongodb:
+        image: mongodb/mongodb-atlas-local:latest
+        ports:
+          - 27017:27017
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
@@ -104,6 +108,9 @@ jobs:
       - name: Install pgvector when on linux
         run: |
           pip install -e .[retrievechat-pgvector]
+      - name: Install mongodb when on linux
+        run: |
+          pip install -e .[retrievechat-mongodb]
       - name: Install unstructured when python-version is 3.9 and on linux
         if: matrix.python-version == '3.9'
         run: |
9 changes: 7 additions & 2 deletions autogen/agentchat/contrib/vectordb/base.py
@@ -186,7 +186,8 @@ def get_docs_by_ids(
             ids: List[ItemID] | A list of document ids. If None, will return all the documents. Default is None.
             collection_name: str | The name of the collection. Default is None.
             include: List[str] | The fields to include. Default is None.
-                If None, will include ["metadatas", "documents"], ids will always be included.
+                If None, will include ["metadatas", "documents"], ids will always be included. This may differ
+                depending on the implementation.
             kwargs: dict | Additional keyword arguments.
         Returns:
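
The `include` default described in this docstring can be illustrated with a small sketch. It is not the code from this commit: `project_doc`, `DEFAULT_INCLUDE`, and the mapping from include names to document keys are hypothetical, chosen only to show the "ids are always returned, other fields are opt-in" behavior. As the docstring notes, a real implementation may differ.

```python
from typing import Any, Dict, List, Optional

# Default taken from the docstring above.
DEFAULT_INCLUDE = ["metadatas", "documents"]

# Hypothetical mapping from include names to keys of a stored document.
FIELD_FOR_INCLUDE = {
    "metadatas": "metadata",
    "documents": "content",
    "embeddings": "embedding",
}


def project_doc(doc: Dict[str, Any], include: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return only the requested fields of a document; the id is always kept."""
    names = DEFAULT_INCLUDE if include is None else include
    projected = {"id": doc["id"]}
    for name in names:
        key = FIELD_FOR_INCLUDE.get(name)
        if key is not None and key in doc:
            projected[key] = doc[key]
    return projected
```

Under these assumptions, calling `project_doc(doc)` with no `include` leaves the embedding out, which matches the commit message's note that no embedding is returned unless explicitly requested.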
@@ -200,7 +201,7 @@ class VectorDBFactory:
     Factory class for creating vector databases.
     """

-    PREDEFINED_VECTOR_DB = ["chroma", "pgvector", "qdrant"]
+    PREDEFINED_VECTOR_DB = ["chroma", "pgvector", "mongodb", "qdrant"]

     @staticmethod
     def create_vector_db(db_type: str, **kwargs) -> VectorDB:
@@ -222,6 +223,10 @@ def create_vector_db(db_type: str, **kwargs) -> VectorDB:
             from .pgvectordb import PGVectorDB

             return PGVectorDB(**kwargs)
+        if db_type.lower() in ["mdb", "mongodb", "atlas"]:
+            from .mongodb import MongoDBAtlasVectorDB
+
+            return MongoDBAtlasVectorDB(**kwargs)
         if db_type.lower() in ["qdrant", "qdrantdb"]:
             from .qdrant import QdrantVectorDB


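The alias dispatch added in this hunk can be tried in isolation with the sketch below. It is a simplified stand-in and not the factory from `base.py`: the `Fake*` classes replace the real backends so the example runs without `pymongo` or `qdrant-client`, `create_vector_db_sketch` is a hypothetical name, and raising `ValueError` for unknown types is an assumption. Only the alias lists are copied from the diff.

```python
from typing import Any, Dict, FrozenSet, Type


class FakeMongoDBAtlasVectorDB:
    """Stand-in for MongoDBAtlasVectorDB; it only records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class FakeQdrantVectorDB:
    """Stand-in for QdrantVectorDB; it only records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


# Alias groups as they appear in the diff; the mapped classes are stand-ins.
ALIASES: Dict[FrozenSet[str], Type[Any]] = {
    frozenset({"mdb", "mongodb", "atlas"}): FakeMongoDBAtlasVectorDB,
    frozenset({"qdrant", "qdrantdb"}): FakeQdrantVectorDB,
}


def create_vector_db_sketch(db_type: str, **kwargs: Any) -> Any:
    """Resolve a case-insensitive alias to a backend class and instantiate it."""
    key = db_type.lower()
    for aliases, cls in ALIASES.items():
        if key in aliases:
            return cls(**kwargs)
    # Assumption: unknown types are rejected with ValueError.
    raise ValueError(f"Unsupported vector database type: {db_type!r}")
```

In this sketch, `create_vector_db_sketch("Atlas", collection_name="demo")` and `create_vector_db_sketch("mdb", collection_name="demo")` both return the MongoDB stand-in, mirroring how the three aliases in the diff resolve to a single backend.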