Add support for preloading models #822

alexnorell · 2024-11-20T21:15:32Z

Description

This PR introduces support for preloading models at startup and includes Kubernetes health check and readiness endpoints.

Key changes:

Added PRELOAD_MODELS environment variable to enable asynchronous preloading of specified models at server startup.
Implemented a /readiness endpoint for Kubernetes readiness probes to indicate when the server is ready to handle requests.
Added a /healthz endpoint for Kubernetes liveness probes to ensure the server is alive.
Updated http_api.py to handle model initialization with asynchronous tasks and readiness state tracking.
Updated the CPU and GPU builds:
- Allow building and pushing to Docker Hub with custom tags.
- Move the GPU build over to Depot.
- Create an internal action that determines the list of tags to build for the CPU and GPU builds.
  - This can be rolled out to all other Docker builds in the future.

Dependencies:

No new external dependencies added.

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

Tested locally by setting the environment variables.

For example:

Tested model preloading functionality with a mock PRELOAD_MODELS to simulate loading multiple models.
Verified Kubernetes readiness and liveness endpoints using curl and simulated Kubernetes probes.
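
As a rough sketch, the probe checks above could also be simulated from Python instead of curl. Only the /healthz and /readiness paths come from this PR; the helper names, the host and port in the usage note, and the choice to treat 200 as "ready" are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Non-2xx responses (for example an error state while not ready) land here.
        return error.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(check: Callable[[], int], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `check` repeatedly, like a readiness probe, until it returns 200."""
    for attempt in range(attempts):
        if check() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Assuming the server listens on a hypothetical `http://localhost:9001`, `wait_until_ready(lambda: http_status("http://localhost:9001/readiness"))` would block until the readiness endpoint reports success or the attempts run out.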

Any specific deployment considerations

Ensure PRELOAD_MODELS is set to a comma-separated list of model IDs if preloading is required.
A valid API key must be stored in API_KEY.
Update Kubernetes deployment manifests to include the new /readiness and /healthz probes.
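
To make the expected format of PRELOAD_MODELS concrete, here is a minimal sketch of how a comma-separated value could be split into model IDs. The function name and the whitespace and blank-entry handling are assumptions; the parsing in this PR may differ.

```python
import os
from typing import List, Mapping, Optional


def parse_preload_models(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split PRELOAD_MODELS on commas, trimming whitespace and dropping empty entries."""
    env = os.environ if env is None else env
    raw = env.get("PRELOAD_MODELS", "")
    return [model_id.strip() for model_id in raw.split(",") if model_id.strip()]
```

With a hypothetical value such as `PRELOAD_MODELS="project-a/1, project-b/2"`, this would yield two model IDs. An unset or empty variable yields an empty list, meaning no preloading.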

Docs

Docs updated? What were the changes:
- Added information about the new environment variable PRELOAD_MODELS.
- Documented the readiness and health check endpoints for Kubernetes.

PawelPeczek-Roboflow

This seems to be a great extension. I do have one comment: the error in health seems to be non-recoverable. Once any of the models cannot be loaded (either temporarily or because of an actual problem), the service will never recover.

I am not sure whether k8s would by default terminate the service after some time, or whether it would fall into a reboot loop.
I am also not 100% sure about the desired outcome in such a scenario. What is the context of the PR?

alexnorell · 2024-11-21T19:32:25Z

> This seems to be a great extension. I do have one comment: the error in health seems to be non-recoverable. Once any of the models cannot be loaded (either temporarily or because of an actual problem), the service will never recover.
>
> I am not sure whether k8s would by default terminate the service after some time, or whether it would fall into a reboot loop.
>
> I am also not 100% sure about the desired outcome in such a scenario. What is the context of the PR?

I've addressed some of the edge cases here. The idea for these changes comes from the desire to preload commonly used models at startup rather than lazy loading them.

For the K8s side:

/healthz should return a positive status as soon as FastAPI is able to serve requests. We can extend this endpoint in the future with additional health states, but it is helpful to have at least something that lets us know the service is responding.
/readiness should return an error state until loading has at least been attempted for every model. Once that has happened, K8s will start routing traffic to the pod. The intention with this change is best-effort initialization that does not prevent the service from running if a model can't be initialized.
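
The probe semantics described above can be sketched roughly as follows. This is not the code in http_api.py: the class and function names, the 503 status, and the response bodies are assumptions chosen to illustrate best-effort preloading with readiness state tracking.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


class PreloadState:
    """Tracks whether every preload attempt has finished, successfully or not."""

    def __init__(self) -> None:
        self.attempts_finished = False
        self.results: Dict[str, bool] = {}

    def healthz(self) -> Tuple[int, dict]:
        # Liveness: positive as soon as the process can answer at all.
        return 200, {"status": "healthy"}

    def readiness(self) -> Tuple[int, dict]:
        # Readiness: error state until all models have at least been attempted.
        if self.attempts_finished:
            return 200, {"status": "ready"}
        return 503, {"status": "not ready"}


async def preload_models(
    model_ids: Iterable[str],
    load_model: Callable[[str], Awaitable[None]],
    state: PreloadState,
) -> None:
    """Best-effort preload: a failed model is recorded but does not block readiness."""
    for model_id in model_ids:
        try:
            await load_model(model_id)
            state.results[model_id] = True
        except Exception:
            state.results[model_id] = False
    state.attempts_finished = True
```

In a FastAPI app, a coroutine like `preload_models` could be scheduled with `asyncio.create_task` at startup so that /healthz responds immediately while /readiness flips to a success status only after the loop completes. Because failures are recorded rather than raised, a model that cannot be loaded does not leave the pod permanently unready.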

alexnorell requested review from PawelPeczek-Roboflow, grzegorz-roboflow, yeldarby, probicheaux and hansent as code owners November 20, 2024 21:15

alexnorell requested review from bigbitbus and isaacrob-roboflow November 20, 2024 21:16

alexnorell marked this pull request as draft November 20, 2024 21:17

Add support for preloading models

356faf7

alexnorell force-pushed the feature/default_model_load branch from 469fbcc to 356faf7 Compare November 20, 2024 22:15

alexnorell added 3 commits November 20, 2024 16:26

Fix code quality

2dc9c4c

Make docker gpu build match the rest of the builds

7af0097

Allow for pushing without overwriting latest

9e4b9c2

alexnorell force-pushed the feature/default_model_load branch 3 times, most recently from 7a79231 to 52ad9d5 Compare November 21, 2024 01:56

Create internal action

3028769

alexnorell force-pushed the feature/default_model_load branch from 52ad9d5 to 3028769 Compare November 21, 2024 01:57

PawelPeczek-Roboflow reviewed Nov 21, 2024


Address some edge cases

f950fb8
