StartupProbe for K8s-onprem added and documented #5257

Merged
merged 3 commits into from
Oct 26, 2023
Changes from 2 commits
9 changes: 9 additions & 0 deletions deploy/k8s-onprem/README.md
@@ -234,6 +234,15 @@ EOF
$ helm install example -f config.yaml .
```

## Probe Configuration

`templates/deployment.yaml` contains the configuration of the `livenessProbe`, `readinessProbe`, and `startupProbe` for the Triton server container.
By default, Triton loads all the models before starting the HTTP server that responds to the probes. Loading can take several minutes, depending on the model sizes.
If it does not complete within `startupProbe.failureThreshold * startupProbe.periodSeconds` seconds, Kubernetes treats this as a failure and restarts the container, which can end in an infinite restart loop. Make sure these values are large enough for your use case.
The liveness and readiness probes are sent only after the startup probe succeeds for the first time.
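
As a rough sketch of how the startup budget scales, the fragment below raises `failureThreshold` so that Triton gets about 10 minutes before Kubernetes gives up. The 10-minute figure and the value `60` are illustrative assumptions, not recommendations from this chart; the probe path and port are the ones used in `templates/deployment.yaml`.

```yaml
# Illustrative values only: assumes the models need up to roughly 10 minutes to load.
startupProbe:
  # 60 failed probes * 10 s between probes = 600 s = 10 min startup budget
  periodSeconds: 10
  failureThreshold: 60
  httpGet:
    path: /v2/health/ready
    port: http
```

Raising `failureThreshold` rather than `periodSeconds` keeps the probe interval short, so a server that becomes ready early is still detected within about one period.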

For more details, see the [Kubernetes probe documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and the [feature page of the startup probe](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md).

## Using Triton Inference Server

Now that the inference server is running you can send HTTP or GRPC
13 changes: 13 additions & 0 deletions deploy/k8s-onprem/templates/deployment.yaml
@@ -79,12 +79,25 @@ spec:
  - containerPort: 8002
    name: metrics
livenessProbe:
  initialDelaySeconds: 15
  failureThreshold: 3
  periodSeconds: 10
Contributor Author:

These are already the defaults. I am spelling them out explicitly so that people can customize the values more easily.

  httpGet:
    path: /v2/health/live
    port: http
readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  httpGet:
    path: /v2/health/ready
    port: http
startupProbe:
  # allows Triton to load the models during 30*10 = 300 sec = 5 min
  # starts checking the other probes only after the success of this one
  # for details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
  periodSeconds: 10
  failureThreshold: 30
  httpGet:
    path: /v2/health/ready
    port: http