Version
>= 0.23
Description
When an API fails to start (due to an exception in the predictor's constructor), the deployment enters a restart loop (as described by the API's deployment spec), but the API's state doesn't change in cortex get (or in the Python client). The API's state stays stuck at updating instead of switching to error or to any other expected state.
Steps to reproduce
Take an iris classifier test example and add a raise in the predictor's constructor. Deploy it using either cloud provider (AWS or GCP). Then run cortex get and notice that the API's state doesn't change from updating to error.
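For illustration, a minimal predictor that should trigger the behaviour might look like the sketch below. It assumes the Python predictor interface (a PythonPredictor class with an __init__(self, config) constructor and a predict(self, payload) method); the exception type and message are hypothetical and not taken from the original example.

```python
# predictor.py: hypothetical repro sketch, not the actual iris classifier example.
# Assumes the Python predictor interface: __init__(self, config) and predict(self, payload).


class PythonPredictor:
    def __init__(self, config):
        # Fail during startup so the API never becomes ready.
        # The exception type and message are illustrative assumptions.
        raise RuntimeError("simulated failure in the predictor's constructor")

    def predict(self, payload):
        # Never reached, because the constructor always raises.
        return "unreachable"
```

Any exception raised in the constructor should have the same effect; the specific error is not important for reproducing the issue.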
Solution
When creating a stage 2 service with s6, if a service exits with a non-zero exit code, export that exit code to stage 3 before sending the kill signal to all other services, as in this example:
... redirfd -w 1 /var/run/s6/env-stage3/S6_STAGE2_EXITED s6-echo -n -- \${1} ...