Add support for preloading models #822
base: main
This seems to be a great extension. I do have one comment: the error in health seems to be non-recoverable. Once any of the models cannot be loaded (either temporarily or because of an actual problem), the service will never recover:
- I am not sure whether this is something that k8s would terminate by default after some time, or whether it would fall into a loop of reboots
- I am also not 100% sure about the desired outcome in such a scenario. What is the context of the PR?
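To make the recoverability concern concrete, here is a minimal sketch of one way a transient load failure could be retried instead of leaving the service permanently unhealthy. It is not taken from the PR: `load_with_retry`, `make_flaky_loader`, and the backoff parameters are hypothetical names chosen for illustration, and the loader is a stand-in for whatever actually loads a model.

```python
import asyncio


async def load_with_retry(load, model_id, attempts=3, base_delay=0.01):
    """Call `load(model_id)` up to `attempts` times with exponential backoff.

    Returns True on success, False if every attempt failed.
    (Hypothetical helper, not part of the PR.)
    """
    for attempt in range(attempts):
        try:
            await load(model_id)
            return True
        except Exception:
            # Sleep before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    return False


def make_flaky_loader(failures):
    """Build a stand-in loader that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    async def load(model_id):
        calls["n"] += 1
        if calls["n"] <= failures:
            raise RuntimeError(f"transient failure loading {model_id}")

    return load
```

With a scheme like this, a temporary failure would only delay readiness, while a persistent failure would still be reported once the attempts are exhausted.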
I've addressed some of the edge cases with this. The idea for these changes comes from the desire to preload commonly used models at startup rather than lazy loading them. For the K8s side:
Description
This PR introduces support for preloading models at startup and includes Kubernetes health check and readiness endpoints.
Key changes:
- Added the `PRELOAD_MODELS` environment variable to enable asynchronous preloading of specified models at server startup.
- Implemented a `/readiness` endpoint for Kubernetes readiness probes to indicate when the server is ready to handle requests.
- Added a `/healthz` endpoint for Kubernetes liveness probes to ensure the server is alive.
- Updated `http_api.py` to handle model initialization with asynchronous tasks and readiness state tracking.
Update cpu and gpu builds:
Dependencies:
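The interaction between asynchronous preloading and the readiness state can be sketched roughly as follows. This is an illustrative approximation only, not the code in `http_api.py`: `PreloadState`, `load_model`, `preload`, `readiness_status`, and the model IDs are all hypothetical, and the real loader is replaced by a short sleep.

```python
import asyncio


class PreloadState:
    """Tracks which models are still loading or failed (hypothetical helper)."""

    def __init__(self, model_ids):
        self.pending = set(model_ids)
        self.failed = {}

    @property
    def ready(self):
        # Ready only when nothing is pending and nothing has failed.
        return not self.pending and not self.failed


async def load_model(model_id):
    """Stand-in for the real model load; IDs starting with 'bad/' fail."""
    await asyncio.sleep(0.01)
    if model_id.startswith("bad/"):
        raise RuntimeError(f"cannot load {model_id}")


async def preload(state):
    """Load all pending models concurrently and record each outcome."""
    ids = sorted(state.pending)
    results = await asyncio.gather(
        *(load_model(m) for m in ids), return_exceptions=True
    )
    for model_id, result in zip(ids, results):
        state.pending.discard(model_id)
        if isinstance(result, Exception):
            state.failed[model_id] = str(result)


def readiness_status(state):
    """HTTP status a /readiness handler could return: 200 or 503."""
    return 200 if state.ready else 503


async def main(model_ids):
    state = PreloadState(model_ids)
    # Start preloading in the background; the server could keep starting up.
    task = asyncio.create_task(preload(state))
    before = readiness_status(state)
    await task
    after = readiness_status(state)
    return before, after, state.failed
```

In this sketch a liveness endpoint such as `/healthz` would answer independently of `PreloadState`, so a slow preload delays traffic without causing the process to be restarted.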
Type of change
Please delete options that are not relevant.
How has this change been tested, please provide a testcase or example of how you tested the change?
- Locally, by setting the environment variables, for example `PRELOAD_MODELS` to simulate loading multiple models.
- With `curl` and simulated Kubernetes probes.
Any specific deployment considerations
- Ensure `PRELOAD_MODELS` is properly configured with a comma-separated list of model IDs if preloading is required.
- Configure the `/readiness` and `/healthz` probes.
Docs
- `PRELOAD_MODELS`.
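Since `PRELOAD_MODELS` is described as a comma-separated list of model IDs, a minimal sketch of how such a value might be parsed is shown below. The function name `parse_preload_models` is hypothetical, and trimming whitespace, skipping empty entries, and dropping duplicates are assumptions rather than documented behavior of this PR.

```python
import os


def parse_preload_models(raw):
    """Split a comma-separated value into a clean list of model IDs.

    Assumed behavior (not confirmed by the PR): surrounding whitespace is
    trimmed, empty entries are skipped, duplicates are dropped while
    preserving the original order.
    """
    if not raw:
        return []
    model_ids = []
    for part in raw.split(","):
        model_id = part.strip()
        if model_id and model_id not in model_ids:
            model_ids.append(model_id)
    return model_ids


# Unset or empty variable means "no preloading".
model_ids = parse_preload_models(os.environ.get("PRELOAD_MODELS"))
```

Treating an unset variable as an empty list keeps preloading opt-in, which matches the "if preloading is required" wording above.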