Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mongo7 upgrade #140

Open
wants to merge 49 commits into
base: develop
Choose a base branch
from
Open

Mongo7 upgrade #140

wants to merge 49 commits into from

Conversation

Xiangs18
Copy link
Collaborator

No description provided.

Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 96.68874% with 5 lines in your changes missing coverage. Please review.

Project coverage is 80.81%. Comparing base (dd630ce) to head (0859dbe).

Files with missing lines Patch % Lines
lib/biokbase/catalog/db.py 96.68% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #140      +/-   ##
===========================================
+ Coverage    80.14%   80.81%   +0.67%     
===========================================
  Files            8        8              
  Lines         2695     2810     +115     
===========================================
+ Hits          2160     2271     +111     
- Misses         535      539       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

sys.exit(1)

endpoints = []
catalog_mock_auth = ['auth_admin.json', 'auth_invalid.json', 'auth_missing.json', 'auth_non_admin.json']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to put a comment here explaining that the order is being preserved here in order to ensure the routes capture the requests in the correct order

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a comment was added or not, but additional info about "Mock server servers matches routes in a specific order, this is the current order required for tests to pass"

self._mongo_client_initialized = False
self.mongo = None
self.db = None
self.index_created = defaultdict(int)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a bool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used 1 for True and 0 for False.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so why not use a defaultdict(bool) instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Xiangs18
Copy link
Collaborator Author

Xiangs18 commented Feb 6, 2025

This PR does the following:

  1. Removed all submodules (jars, kbapi_common, nms).
  2. Added the NMS Docker service for local testing.
  3. Fixed a bug in mock_auth/servicer.py.
  4. Updated the catalog Docker build script.
  5. Implemented lazy loading for MongoDB.
  6. Cleaned up files in the repository.

@Xiangs18 Xiangs18 changed the title [WIP] mongo7 upgrade Mongo7 upgrade Feb 6, 2025
Copy link
Collaborator

@bio-boris bio-boris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pr description is empty, and would benefit from a copy of the changelog/release notes . The version and release notes should be updated as well

@@ -27,11 +27,6 @@ LIB_DIR = lib

PATH := kb_sdk/bin:$(PATH)

default: init

init:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this init section need to be removed or can we keep it in?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or was this deprecated in favor of docker compose, and was only needed for tests?

Copy link
Collaborator Author

@Xiangs18 Xiangs18 Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All submodules have been removed, so there is no need to pull them. If you want to keep init section, what should be included in the init?

@@ -102,8 +97,8 @@ setup-tests:
mkdir -p $(TESTLIB)/biokbase
mkdir -p $(TESTDIR)/nms
rsync -av lib/biokbase/* $(TESTLIB)/biokbase/. --exclude *.bak-*
rsync -av kbapi_common/lib/biokbase/* $(TESTLIB)/biokbase/.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need this anymore, can we delete it instead of commenting it out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

services:
mongo:
image: mongo:${{matrix.mongo-version}}
ports:
- 27017:27017
- 27018:27017
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep it at 27017?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port 27017 is already being used by the NMS Mongo

Copy link
Member

@MrCreosote MrCreosote Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change the NMS port in that case. If one or the other is going to have a non standard port it should be the supporting system, not the one we're working on

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 2 seperate mongos?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One presumably starts up with the docker compose to support NMS, the other starts up as part of the catalog tests. That seems good to me, the catalog shouldn't know or care about any NMS dependencies


- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{matrix.python-version}}

- name: Install Docker Compose
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary. You can use docker compose as it is included in docker nowadays

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try docker compose without the hyphen

sys.exit(1)

endpoints = []
catalog_mock_auth = ['auth_admin.json', 'auth_invalid.json', 'auth_missing.json', 'auth_non_admin.json']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a comment was added or not, but additional info about "Mock server servers matches routes in a specific order, this is the current order required for tests to pass"

# host where mongo lives, e.g. localhost:27017, for tests there should be not authentication required
mongodb-host = localhost:27017
# host where mongo lives, e.g. localhost:27018, for tests there should be not authentication required
mongodb-host = localhost:27018
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this at 27017?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port 27017 is already being used by the NMS Mongo

unique=True, sparse=False)

elif collection_name == MongoCatalogDBI._LOCAL_FUNCTIONS:
# Make sure we have a unique index on module_name_lc and git_commit_hash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to refactor this giant elif block using a dictionary to map collection names to their respective index creation functions. Here is an example of how you can refactor it:

def _create_indexes(self, collection_name):
    """Create indexes for the given collection lazily."""
    collection = self.db[collection_name]

    index_creation_map = {
        MongoCatalogDBI._MODULE_VERSIONS: [
            ('module_name_lc', False),
            ('git_commit_hash', False),
            [('module_name_lc', ASCENDING), ('git_commit_hash', ASCENDING), True]
        ],
        MongoCatalogDBI._LOCAL_FUNCTIONS: [
            'function_id',
            [('module_name_lc', ASCENDING), ('function_id', ASCENDING), ('git_commit_hash', ASCENDING), True],
            'module_name_lc',
            'git_commit_hash'
        ],
        # Add other collections and their indexes similarly
    }

    indexes = index_creation_map.get(collection_name, [])
    for index in indexes:
        if isinstance(index, tuple):
            collection.create_index(index[0], sparse=index[1])
        elif isinstance(index, list):
            collection.create_index(index, unique=index[-1])
        else:
            collection.create_index(index)

This approach uses a dictionary to map collection names to a list of indexes, which reduces the need for multiple elif statements. Adjust the index_creation_map dictionary as needed to cover all your collections and their respective indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could potentially use an index spec to simplify as well: https://github.com/kbase/cdm-task-service/blob/main/cdmtaskservice/mongo.py#L70

return self._get_collection(MongoCatalogDBI._SECURE_CONFIG_PARAMS)

def _create_indexes(self, collection_name):
"""Create indexes for the given collection lazily."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to move out these into smaller functions into a file or class like db_indexes.py or something as this is an extremely long function. If possible, function's shouldn't be longer than the entire height of your screen.

@@ -1274,6 +1360,12 @@ def get_secure_config_params(self, module_name):
# another server is already starting an update, we can skip or abort
def check_db_schema(self):

if self._db_schema_checked:
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing coverage

@@ -1455,20 +1551,23 @@ def prepare_version_doc_for_db_2_to_3_update(self, version, module):
def update_db_3_to_4(self):

# make sure we don't have any indecies on the collections
self.volume_mounts.drop_indexes()
self.client_groups.drop_indexes()
volume_mounts_collection = self.db[MongoCatalogDBI._VOLUME_MOUNTS]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this entire function can be deleted, as we have already upgraded the indexes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK we're the only ones using the catalog, all these db version changes happened years ago, and so we can just put a note in the release notes that db version upgrades are no longer supported and delete all the upgrade methods

@@ -0,0 +1,47 @@
version: '3.4'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-name this file as "docker-compose?"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a compose file for the nms rather than the catalog, calling it as such seems appropriate to me

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a docker compose for testing purposes. By naming it "docker-compose" it allows you to start it up without specifying the path to a file. If we had multiple docker composes, I think it makes sense to use a non standard name, but since we only have one docker-compose file, we should use the standard name.

Copy link
Collaborator Author

@Xiangs18 Xiangs18 Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just for starting the NMS service for CI/local testing to run unit tests. It makes more sense to keep its name as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with -f <file> in this case but I don't feel that strongly about it

@bio-boris
Copy link
Collaborator

bio-boris commented Feb 6, 2025

This PR does the following:

1. Removed all submodules (jars, kbapi_common, nms). (Why though?)

2. Added the NMS Docker service for local testing. 

3. Fixed a bug in mock_auth/servicer.py. (Could you add more details to this?)

4. Updated the catalog Docker build script. (More details please)

5. Implemented lazy loading for MongoDB. (More details please)

6. Cleaned up files in the repository. (I don't know what this means)

This also upgrades pymongo

I think these can be added to the README.md

@@ -77,6 +81,10 @@ jobs:
sed -i "s#^nms-admin-token.*#nms-admin-token=$KBASE_CI_TOKEN#" test/test.cfg
sed -i "s#^method-spec-admin-users.*#method-spec-admin-users=$ADMIN_USER#" test/test.cfg

# setup env variables in docker-compose_nms.yml file
echo "ADMIN_USER=$ADMIN_USER"
echo "ADMIN_USER=$ADMIN_USER" >> $GITHUB_ENV
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about why this env var is set up and where it's used? That should help the next person who has to work on this service understand what's going on

COPY requirements.txt requirements.txt
RUN source activate root && \
pip install -r requirements.txt
RUN pip install -r requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should convert to pipenv / poetry / uv, but that's a later task

lib/biokbase/log.py Outdated Show resolved Hide resolved
NMS_PID=$!

echo 'Starting Mock Auth API...'
docker run -d --rm -v ${PWD}/mock_auth:/config -p 7777:5000 --name mock-auth mockservices/mock_json_service
docker run -d --rm -v ${PWD}/mock_auth:/config -v ${PWD}/mock_auth/server.py:/server/server.py -p 7777:5000 --name mock-auth mockservices/mock_json_service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't something we need to deal with now, but there are plenty of 3rd party service mocking options out there, I'm not sure why we rolled our own. Also auth2 has test mode which we could use


echo 'Waiting for NMS to start...'
sleep 25
curl -d '{"id":"1","params":[],"method":"NarrativeMethodStore.ver","version":"1.1"}' http://localhost:7125
curl -d '{"id":"1","params":[],"method":"NarrativeMethodStore.ver","version":"1.1"}' http://localhost:7125/rpc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow these tests must not've passed for a long time...

Copy link
Member

@MrCreosote MrCreosote Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs some explanation at the top. Why are we replacing the file in the mock server container? Whoever comes along next to work on the catalog will need to understand the reasoning behind this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes have been made in this file compared to the original in the mock server container?

self.mongo_psswd = mongo_psswd
self.mongo_authMechanism = mongo_authMechanism

# MongoDB client and database initialization deferred
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explanation as to why this is necessary?

self.mongo_psswd = mongo_psswd
self.mongo_authMechanism = mongo_authMechanism

# MongoDB client and database initialization deferred
Copy link
Member

@MrCreosote MrCreosote Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How lazy is lazy? Is it not until a request is made, and therefore the service can be running in a guaranteed to fail state? If so, it seems like this class should at least test connectivity and auth at init time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the db class instance is created pre fork, and then the fork happens and it's no longer working. If that's the case, and if it's possible, what I would do is

  • in the init, check connectivity, check db schema, create indexes, etc., and close the client.
  • On the first call, create a mongo client knowing that everything is set up correctly, and set up the instance variables for the collections. Then you don't need all the property functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, also you probably need a thread lock around the client creation step, or multiple calls could result in multiple client creations

"""Initialize MongoDB client and collections lazily."""
if not self._mongo_client_initialized:
try:
if self.mongo_user and self.mongo_psswd:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note here that this case is only tested manually?

I assume it has been tested manually, yeah?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants