
upgrade default ping endpoint #2231

Open
lxning opened this issue Apr 14, 2023 · 1 comment

@lxning (Collaborator) commented Apr 14, 2023

🚀 The feature

In production, the ping endpoint is more useful if it reflects each model's health status. Upgrade the default ping behavior as follows (a sketch of the proposed check appears after the list):

  • add a parameter "maxRetryTimeoutInSec" (default: 300, i.e. 5 minutes) to the model-level config: the maximum time window for recovering a dead backend worker.
  • a worker counts as healthy if it is in state WORKER_STARTED or WORKER_MODEL_LOADED, or in WORKER_STOPPED within the maxRetryTimeoutInSec window.
  • return 200 + message "healthy": for every model, the number of active workers is greater than or equal to the configured minWorkers.
  • return 500 + message "unhealthy": for at least one model, the number of active workers is less than the configured minWorkers.
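The aggregation described above could look roughly like the following. This is a minimal Python sketch of the proposed rules, not TorchServe's actual (Java) frontend code; the `Worker`/`Model` data structures, the `active_workers` helper, and the field names are all hypothetical illustrations.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class WorkerState(Enum):
    # Mirrors the states named in the proposal; values are illustrative.
    WORKER_STARTED = "WORKER_STARTED"
    WORKER_MODEL_LOADED = "WORKER_MODEL_LOADED"
    WORKER_STOPPED = "WORKER_STOPPED"
    WORKER_ERROR = "WORKER_ERROR"


@dataclass
class Worker:
    state: WorkerState
    stopped_at: float = 0.0  # epoch seconds when the worker entered WORKER_STOPPED


@dataclass
class Model:
    name: str
    min_workers: int
    max_retry_timeout_sec: int = 300  # proposed default: 5 minutes
    workers: List[Worker] = field(default_factory=list)

    def active_workers(self, now: float) -> int:
        """Count workers that are healthy per the proposed rules: running,
        loaded, or stopped but still inside the recovery window."""
        count = 0
        for w in self.workers:
            if w.state in (WorkerState.WORKER_STARTED, WorkerState.WORKER_MODEL_LOADED):
                count += 1
            elif (w.state is WorkerState.WORKER_STOPPED
                  and now - w.stopped_at <= self.max_retry_timeout_sec):
                count += 1  # presumed recoverable within maxRetryTimeoutInSec
        return count


def ping(models: List[Model]) -> Tuple[int, str]:
    """Proposed ping: 200/"healthy" only if every model meets its minWorkers;
    500/"unhealthy" as soon as any model falls below it."""
    now = time.time()
    for m in models:
        if m.active_workers(now) < m.min_workers:
            return 500, "unhealthy"
    return 200, "healthy"
```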

Motivation, pitch

The existing ping endpoint only reflects the server heartbeat. It always returns 200 with a message such as "healthy", "Partial Healthy", or "Unhealthy" (see code). Here, "Partial Healthy" covers one of these scenarios:

  • case 1: one model has n (> 1) workers, and m (< n) of them die.
  • case 2: n models are registered on a server, and m (< n) of them have partially or completely dead workers.

If the load balancer treats the ping endpoint's 200 return code as healthy, inference requests are still routed to a "Partial Healthy" or "Unhealthy" server, where they will fail.
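To make the failure mode concrete: a typical load-balancer health probe keys off the HTTP status code alone, so an always-200 ping keeps degraded instances in rotation. A hypothetical probe might look like the sketch below (the host, the default inference port 8080, and the timeout are illustrative):

```python
import requests


def is_instance_healthy(host: str) -> bool:
    """Keep the instance in rotation only when ping returns HTTP 200.

    With the current behavior this is always True, even when workers are
    dead; with the proposed behavior a 500 takes the instance out of rotation.
    """
    try:
        resp = requests.get(f"http://{host}:8080/ping", timeout=2)
        return resp.status_code == 200
    except requests.RequestException:
        return False
```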

Alternatives

No response

Additional context

No response

@dhanmeetsingh commented
  1. Verify that the custom model archive you're trying to load is valid and correctly formatted. You can use the PyTorch Hub package to verify the model archive by running the following command:
    python -m torch.hub.checkout_hash <MODEL_NAME> <MODEL_VERSION>

Replace <MODEL_NAME> and <MODEL_VERSION> with the name and version of your custom model. If the model archive is valid, the command should return a valid Git hash.

If the model archive is valid, check that the metadata and signature files are correctly formatted. You can use the torchserve --show-config command to view the configuration of your TorchServe instance, including the locations of the metadata and signature files. Ensure that these files exist and are correctly formatted.

If the metadata and signature files are correct, try updating to the latest version of TorchServe to see if the issue has already been resolved. You can do this by running the following command:

pip install torchserve --upgrade

@lxning lxning self-assigned this Apr 20, 2023
@lxning lxning added the enhancement New feature or request label Apr 20, 2023
@lxning lxning added this to the v0.8.0 milestone Apr 20, 2023