triton-inference-server · dyastremsky · Oct 26, 2023 · Jan 16, 2023 · Oct 26, 2023 · Oct 26, 2023
diff --git a/deploy/k8s-onprem/README.md b/deploy/k8s-onprem/README.md
@@ -234,6 +234,15 @@ EOF
 $ helm install example -f config.yaml .
 ```
 
+## Probe Configuration
+
+`templates/deployment.yaml` configures a `livenessProbe`, a `readinessProbe`, and a `startupProbe` for the Triton server container.
+By default, Triton loads all models before starting the HTTP server that responds to the probes. Loading can take several minutes, depending on the model sizes.
+If loading does not complete within `startupProbe.failureThreshold * startupProbe.periodSeconds` seconds, Kubernetes treats the container as failed and restarts it, which can lead to an endless restart loop. Set these values high enough for your use case.
+The liveness and readiness probes are sent only after the startup probe first succeeds.
+
+For more details, see the [Kubernetes probe documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and the [feature page of the startup probe](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md).
+
 ## Using Triton Inference Server
 
 Now that the inference server is running you can send HTTP or GRPC

diff --git a/deploy/k8s-onprem/templates/deployment.yaml b/deploy/k8s-onprem/templates/deployment.yaml
@@ -79,12 +79,25 @@ spec:
             - containerPort: 8002
               name: metrics
           livenessProbe:
+            initialDelaySeconds: 15
+            failureThreshold: 3
+            periodSeconds: 10
             httpGet:
               path: /v2/health/live
               port: http
           readinessProbe:
             initialDelaySeconds: 5
             periodSeconds: 5
+            failureThreshold: 3
+            httpGet:
+              path: /v2/health/ready
+              port: http
+          startupProbe:
+            # Gives Triton up to 30 * 10 = 300 sec = 5 min to load its models.
+            # The liveness and readiness probes start only after this probe succeeds.
+            # For details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
+            periodSeconds: 10
+            failureThreshold: 30
             httpGet:
               path: /v2/health/ready
               port: http
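
The startup window described in the README change is just `failureThreshold * periodSeconds`. As a rough aid for choosing values, the arithmetic can be sketched in Python. The helper names and the `safety_factor` padding are hypothetical and not part of the chart; the expected load time is something you would measure for your own models.

```python
import math


def startup_window_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Time the startup probe tolerates before Kubernetes restarts the container."""
    return failure_threshold * period_seconds


def suggested_failure_threshold(expected_load_seconds: float,
                                period_seconds: int = 10,
                                safety_factor: float = 1.5) -> int:
    """Smallest failureThreshold whose window covers the padded load time.

    safety_factor is an assumed margin for slow disks or cold caches,
    not a value taken from the chart.
    """
    if expected_load_seconds <= 0 or period_seconds <= 0 or safety_factor < 1:
        raise ValueError("load time and period must be positive, safety_factor >= 1")
    return math.ceil(expected_load_seconds * safety_factor / period_seconds)


# Values from the diff above: 30 failures at a 10-second period.
default_window = startup_window_seconds(30, 10)

# Hypothetical case: models measured to load in about 400 seconds.
larger_threshold = suggested_failure_threshold(400, period_seconds=10)
```

With the values in this diff the window is 300 seconds. For the hypothetical 400-second load, the sketch suggests raising `failureThreshold` to 60 while keeping `periodSeconds: 10`.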