fix: Adjust frontEnd thresholds #2288
Conversation
Walkthrough

Liveness probe configurations for various backend deployment YAML files were adjusted to increase the frequency and decrease the delay, timeout, and failure threshold for health checks. These changes affect multiple services across both the vLLM and SGLang backends.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Kubelet
    participant Frontend
    participant VllmDecodeWorker
    Note over Kubelet: Liveness probe (more frequent, faster timeout)
    Kubelet->>Frontend: HTTP GET /healthz (every 5s, timeout 5s)
    Frontend-->>Kubelet: 200 OK or error
    Note over Kubelet: For VllmDecodeWorker (agg.yaml)
    Kubelet->>VllmDecodeWorker: HTTP GET /healthz (period 10s, timeout 5s)
    VllmDecodeWorker-->>Kubelet: 200 OK or error
    Note over Kubelet: Fewer failures tolerated before restart
```
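For readers less familiar with Kubernetes probe fields, a minimal sketch of the Frontend liveness settings the diagram describes might look like the following; the port and exact failureThreshold are assumptions for illustration, not values taken from this PR.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000          # assumed frontend port, not confirmed by this PR
  periodSeconds: 5      # probe every 5 s, as shown in the diagram
  timeoutSeconds: 5     # fail the attempt if no reply within 5 s
  failureThreshold: 3   # assumed value: "fewer failures tolerated before restart"
```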
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 4
♻️ Duplicate comments (4)
components/backends/vllm/deploy/disagg_planner.yaml (1)

25-28: Same liveness / readiness mismatch as noted in sglang-agg – please align initialDelaySeconds or use a dedicated live endpoint. See previous comment for details.

components/backends/vllm/deploy/disagg_router.yaml (1)

18-21: Same liveness / readiness mismatch as noted in sglang-agg – adjust delay or endpoint.

components/backends/sglang/deploy/agg_router.yaml (1)

15-18: Same liveness / readiness mismatch as noted in sglang-agg – adjust delay or endpoint.

components/backends/vllm/deploy/disagg.yaml (1)

18-21: Same liveness / readiness mismatch as noted in sglang-agg – adjust delay or endpoint.
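The alignment these duplicate comments ask for could look roughly like the sketch below; the endpoint, port, and 60 s delay are illustrative assumptions rather than values from the changed files.

```yaml
# Option 1: give both probes the same initialDelaySeconds so liveness
# cannot fire before the pod has ever reported ready.
livenessProbe:
  httpGet:
    path: /health            # assumed shared endpoint
    port: 8000               # assumed port
  initialDelaySeconds: 60
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 60
  periodSeconds: 10
# Option 2: point livenessProbe at a dedicated lightweight endpoint (e.g. /live)
# that only checks process health, so its delay can stay short.
```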
🧹 Nitpick comments (1)
components/backends/vllm/deploy/agg.yaml (1)
25-28: Readiness probe frequency equal to liveness – re-evaluate necessity

Running the expensive curl | jq command every 5 s per replica adds overhead and log noise without tangible benefit. Typical patterns keep readiness at ≥10 s once the pod is ready.

```diff
-  periodSeconds: 5
-  failureThreshold: 3
+  periodSeconds: 15
+  failureThreshold: 5
```
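Applied to the exec-style readiness check the comment mentions, the relaxed cadence might look like this; the URL and jq filter are placeholders, since the actual command in agg.yaml is not reproduced here.

```yaml
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # hypothetical health query; the real command in agg.yaml may differ
      - curl -s http://localhost:8000/health | jq -e '.status == "healthy"'
  periodSeconds: 15      # relaxed from 5 s per the suggestion above
  failureThreshold: 5
```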
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- components/backends/sglang/deploy/agg.yaml (1 hunks)
- components/backends/sglang/deploy/agg_router.yaml (1 hunks)
- components/backends/sglang/deploy/disagg.yaml (1 hunks)
- components/backends/vllm/deploy/agg.yaml (3 hunks)
- components/backends/vllm/deploy/agg_router.yaml (1 hunks)
- components/backends/vllm/deploy/disagg.yaml (1 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (1 hunks)
- components/backends/vllm/deploy/disagg_router.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
📚 Learning: in vllm worker deployments, startup probes (with longer periods and higher failure thresholds like p...
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
Applied to files:
- components/backends/sglang/deploy/agg_router.yaml
- components/backends/vllm/deploy/disagg_planner.yaml
- components/backends/vllm/deploy/disagg.yaml
- components/backends/sglang/deploy/agg.yaml
- components/backends/sglang/deploy/disagg.yaml
- components/backends/vllm/deploy/disagg_router.yaml
- components/backends/vllm/deploy/agg_router.yaml
- components/backends/vllm/deploy/agg.yaml
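As a concrete illustration of the pattern in this learning, a worker might pair the two probes as sketched below; the endpoint and port are assumptions, and the liveness probe only starts once the startup probe has succeeded.

```yaml
startupProbe:              # tolerates slow model loading (up to ~10 min here)
  httpGet:
    path: /health          # assumed endpoint
    port: 9090             # assumed port
  periodSeconds: 10
  failureThreshold: 60
livenessProbe:             # aggressive once the worker is operational
  httpGet:
    path: /health
    port: 9090
  periodSeconds: 5
  failureThreshold: 1
```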
📚 Learning: in components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally d...
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.
Applied to files:
- components/backends/sglang/deploy/agg_router.yaml
- components/backends/vllm/deploy/disagg_router.yaml
- components/backends/vllm/deploy/agg_router.yaml
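The blocking behaviour described in this learning comes from chaining the two steps with &&, roughly as sketched below; the module names are hypothetical stand-ins, not the actual commands in agg_router.yaml.

```yaml
command: ["/bin/sh", "-c"]
args:
  # hypothetical module names: if clear_namespace fails, the router never starts
  - "python3 -m utils.clear_namespace --namespace dynamo && python3 -m components.router"
```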
📚 Learning: in vllm worker deployments, grep-based log checks for "vllmworker.*has been initialized" are appropr...
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
Applied to files:
- components/backends/vllm/deploy/disagg_planner.yaml
- components/backends/vllm/deploy/disagg.yaml
- components/backends/vllm/deploy/disagg_router.yaml
- components/backends/vllm/deploy/agg_router.yaml
- components/backends/vllm/deploy/agg.yaml
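A readiness probe following this learning could be sketched as below; the log path is a hypothetical placeholder, and per the learning the same check should not be reused for liveness, since it only verifies one-time start-up.

```yaml
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # hypothetical log location; confirms the worker finished initializing
      - grep -q "VllmWorker.*has been initialized" /tmp/vllm_worker.log
  periodSeconds: 10
  failureThreshold: 60
```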
📚 Learning: the stopsignal field under lifecycle in dynamocomponentdeployment crds is autogenerated due to kuber...
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Applied to files:
- components/backends/sglang/deploy/disagg.yaml
- components/backends/vllm/deploy/agg.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (3)
components/backends/sglang/deploy/disagg.yaml (1)
15-18: Possible premature restarts of the Frontend container

The new probe kills the pod after three 5-second failures (≈15 s). In GPU-heavy clusters the container often competes for CPU at boot; we've previously left 60 s before the first probe to avoid churn. Please verify average start-up time in staging and, if necessary, protect with a startupProbe or raise failureThreshold back to ≥5 (a hedged startupProbe sketch follows the comments below).

components/backends/vllm/deploy/agg.yaml (2)
62-65: Timeout lowered to 5 s – ensure /health can answer under GPU load

The readiness endpoint occasionally runs model warm-up logic; confirm sub-5 s response under max-batch test or keep the former 30 s timeout.
91-94: Startup probe tweak LGTM

Adding initialDelaySeconds and reducing timeout is consistent with other deployments and keeps the probe lightweight.
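Following up on the sglang disagg comment above, one way to keep the aggressive liveness settings while protecting a slow boot is a startupProbe along these lines; the path, port, and thresholds are illustrative assumptions, not values from the changed files.

```yaml
startupProbe:              # absorbs slow, CPU-contended boots (up to ~2 min here)
  httpGet:
    path: /health          # assumed endpoint
    port: 8000             # assumed port
  periodSeconds: 10
  failureThreshold: 12
livenessProbe:             # aggressive settings apply only after startup passes
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  timeoutSeconds: 5
  failureThreshold: 3
```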
biswapanda
left a comment
Lgtm
Overview:
More reasonable values to avoid customer confusion (https://nvbugspro.nvidia.com/bug/5425651)
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit