diff --git a/deploy/k8s-onprem/README.md b/deploy/k8s-onprem/README.md
index fcc48b1028..962ad4e40b 100644
--- a/deploy/k8s-onprem/README.md
+++ b/deploy/k8s-onprem/README.md
@@ -234,6 +234,16 @@ EOF
 $ helm install example -f config.yaml .
 ```
 
+## Probe Configuration
+
+The file `templates/deployment.yaml` contains configurations for the `livenessProbe`, `readinessProbe`, and `startupProbe` of the Triton server container.
+By default, Triton loads all models before starting the HTTP server that answers the probes. This can take several minutes, depending on the model sizes.
+If loading does not complete within `startupProbe.failureThreshold * startupProbe.periodSeconds` seconds, Kubernetes treats the pod as failed and restarts it,
+which can lead to an endless loop of restarting pods, so make sure these values are set high enough for your use case.
+The liveness and readiness probes are only sent after the first success of the startup probe.
+
+For more details, see the [Kubernetes probe documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and the [feature page of the startup probe](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md).
+
 ## Using Triton Inference Server
 
 Now that the inference server is running you can send HTTP or GRPC
@@ -314,4 +324,4 @@ CRDs](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-
 
 ```
 $ kubectl delete crd alertmanagerconfigs.monitoring.coreos.com alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com probes.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com thanosrulers.monitoring.coreos.com
-```
\ No newline at end of file
+```
diff --git a/deploy/k8s-onprem/templates/deployment.yaml b/deploy/k8s-onprem/templates/deployment.yaml
index 6945cce23a..fa521d28cb 100644
--- a/deploy/k8s-onprem/templates/deployment.yaml
+++ b/deploy/k8s-onprem/templates/deployment.yaml
@@ -79,12 +79,25 @@ spec:
         - containerPort: 8002
           name: metrics
         livenessProbe:
+          initialDelaySeconds: 15
+          failureThreshold: 3
+          periodSeconds: 10
           httpGet:
             path: /v2/health/live
             port: http
         readinessProbe:
           initialDelaySeconds: 5
           periodSeconds: 5
+          failureThreshold: 3
+          httpGet:
+            path: /v2/health/ready
+            port: http
+        startupProbe:
+          # allows Triton to load the models for up to 30 * 10 = 300 sec = 5 min
+          # the other probes are checked only after this one first succeeds
+          # for details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
+          periodSeconds: 10
+          failureThreshold: 30
           httpGet:
             path: /v2/health/ready
             port: http
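
As a concrete illustration of the `failureThreshold * periodSeconds` window described in the README section above: models that need more than the default 5 minutes to load would require a larger startup probe budget. The snippet below is a sketch only — the 10-minute values (60 × 10 s) are illustrative and not part of this change:

```yaml
startupProbe:
  # 60 * 10 = 600 sec = 10 min before Kubernetes gives up and restarts the pod
  periodSeconds: 10
  failureThreshold: 60
  httpGet:
    path: /v2/health/ready
    port: http
```

Raising `failureThreshold` (rather than `periodSeconds`) keeps the probe interval short, so the pod is still marked ready promptly once model loading finishes.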