🚀 The feature
In production, the ping endpoint is more useful if it reflects each model's health status. Upgrade the default ping behavior as follows:
Add a parameter "maxRetryTimeoutInSec" (default: 5 minutes) to the model-level config: the maximum time window for recovering a dead backend worker.
A healthy worker can be in the state WORKER_STARTED, WORKER_MODEL_LOADED, or WORKER_STOPPED within the maxRetryTimeoutInSec window.
Return 200 with message "healthy": for every model, the number of active workers is equal to or larger than the configured minWorkers.
Return 500 with message "unhealthy": for any model, the number of active workers is less than the configured minWorkers.
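A minimal sketch of the proposed aggregation logic; the names `ping_status`, `worker_states`, and `min_workers` are hypothetical illustrations, not actual TorchServe frontend code:

```python
# Worker states counted as healthy within the maxRetryTimeoutInSec window.
HEALTHY_STATES = {"WORKER_STARTED", "WORKER_MODEL_LOADED", "WORKER_STOPPED"}

def ping_status(models):
    """Return (http_code, message) for the upgraded ping endpoint.

    `models` is an iterable of objects with `worker_states` (worker states
    observed within the maxRetryTimeoutInSec window) and `min_workers`.
    """
    for model in models:
        active = sum(1 for s in model.worker_states if s in HEALTHY_STATES)
        # A single model below its configured minWorkers makes the whole
        # server report unhealthy, so the load balancer can drain it.
        if active < model.min_workers:
            return 500, "unhealthy"
    return 200, "healthy"
```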
Motivation, pitch
The existing ping endpoint only reflects the server heartbeat. It always returns 200 with a message such as "Healthy", "Partial Healthy", or "Unhealthy" (see code). Here, "Partial Healthy" can be one of the following scenarios:
Case 1: one model has n (> 1) workers, and m (< n) of them die.
Case 2: n models are registered on a server, and m (< n) of them have partially or completely dead workers.
If the load balancer bases routing on the ping endpoint returning code 200, an inference request can be routed to a "Partial Healthy" or "Unhealthy" server, and the request will fail.
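For illustration, a probe of the current endpoint only sees the server heartbeat, never per-model worker health (a minimal sketch, assuming a local TorchServe on the default inference port 8080; the exact response body may vary by version):

```python
import requests

# Today /ping returns 200 even when some model workers are dead, so a
# load-balancer health check keyed on the 200 status code cannot detect
# a "Partial Healthy" server and keeps routing traffic to it.
resp = requests.get("http://localhost:8080/ping")
print(resp.status_code, resp.json())  # e.g. 200 {"status": "Healthy"}
```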
Alternatives
No response
Additional context
No response
Verify that the custom model archive you're trying to load is valid and correctly formatted. You can use the PyTorch Hub package to verify the model archive by running the following command: python -m torch.hub.checkout_hash <MODEL_NAME> <MODEL_VERSION>
Replace <MODEL_NAME> and <MODEL_VERSION> with the name and version of your custom model. If the model archive is valid, the command should return a valid Git hash.
If the model archive is valid, check that the metadata and signature files are correctly formatted. You can use the torchserve --show-config command to view the configuration of your TorchServe instance, including the locations of the metadata and signature files. Ensure that these files exist and are correctly formatted.
If the metadata and signature files are correct, try updating to the latest version of TorchServe to see if the issue has already been resolved.