fix: sglang -- queue requests until model registration completes #2701
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Walkthrough
Decouples endpoint startup from model registration: the endpoint is launched as a background task, a 2.5-second delay is introduced, registration proceeds while the endpoint initializes, and the serve task is awaited afterward. Error handling and cleanup remain unchanged. No public API signatures are modified.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Main as Main Orchestrator
participant Endpoint as Endpoint Service
participant Registry as Runtime Config / Registry
rect rgb(235, 245, 255)
Note over Main: Startup
Main->>Endpoint: Spawn serve_task (background)
Note over Main: Delay 2.5s to allow endpoint to initialize
Main->>Registry: register_llm_with_runtime_config()
end
alt Success
Main->>Endpoint: await serve_task
Note over Main,Endpoint: Normal lifecycle continues
else Error
Note over Main: Log and re-raise
Note over Main,Endpoint: Cleanup executed
end
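The startup order in the diagram can be sketched with plain asyncio. This is a minimal illustration, not the code in main.py: `serve_endpoint`, `register_model`, and `start_worker` are hypothetical stand-ins, and the delay is shortened so the sketch runs quickly (the walkthrough describes 2.5 seconds).

```python
import asyncio


async def serve_endpoint(log):
    # Hypothetical stand-in for the long-running endpoint serve loop.
    log.append("endpoint: serving")
    await asyncio.sleep(0.05)
    log.append("endpoint: stopped")


async def register_model(log):
    # Hypothetical stand-in for register_llm_with_runtime_config().
    log.append("registry: model registered")


async def start_worker(startup_delay=0.01):
    log = []
    # 1. Spawn the endpoint in the background so it starts right away.
    serve_task = asyncio.create_task(serve_endpoint(log))
    try:
        # 2. Give the endpoint a moment to initialize.
        await asyncio.sleep(startup_delay)
        # 3. Register the model while the endpoint is already running.
        await register_model(log)
        # 4. Stay alive for as long as the endpoint serves.
        await serve_task
    except Exception:
        # Record the failure and re-raise, as in the diagram's error branch.
        log.append("error: startup failed")
        raise
    finally:
        # Cleanup runs on both paths; stop the endpoint if it is still up.
        if not serve_task.done():
            serve_task.cancel()
        log.append("cleanup: done")
    return log


if __name__ == "__main__":
    for entry in asyncio.run(start_worker()):
        print(entry)
```

A fixed delay only makes it likely that the endpoint is up before registration runs; it does not guarantee the ordering, which is one reason an explicit readiness signal can be preferable.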
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/backends/sglang/src/dynamo/sglang/main.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/backends/sglang/src/dynamo/sglang/main.py (4)
- components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py (1): generate(56-94)
- components/backends/sglang/src/dynamo/sglang/request_handlers/prefill_handler.py (1): generate(49-70)
- components/backends/sglang/src/dynamo/sglang/register.py (1): register_llm_with_runtime_config(14-34)
- lib/llm/src/local_model.rs (1): migration_limit(150-153)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2701/merge) by hhzhang16.
components/backends/sglang/src/dynamo/sglang/main.py
[error] 90-96: Black formatting reformatted the file during pre-commit. Run 'black --write' to fix code style issues in this file.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/backends/sglang/src/dynamo/sglang/main.py (1)
90-96: No Black formatting issues detected
The file components/backends/sglang/src/dynamo/sglang/main.py is already formatted according to the project’s pre-commit Black version (23.1.0). Running Black in check-and-diff mode reported no changes needed.
No further action required.
Likely an incorrect or invalid review comment.
/ok to test c473a66
Could you explain why in the description? Maybe because you need the health and/or readiness endpoints to be live immediately for Kubernetes.
grahamking left a comment
LGTM
…) (#2722) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
Bugfix: add readiness gating to prevent race condition during sglang startup
This PR adds readiness gating to fix a race condition in which the KV Scheduler selects a worker before that worker's model registration completes. The endpoint now starts immediately and becomes discoverable, but it queues incoming requests until model registration succeeds, which eliminates the "instance_id not found" errors seen during startup.
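One way to picture the readiness gate is an `asyncio.Event` that each request awaits before doing any work. The sketch below is an assumption about the general technique, not the handler code in this PR: `ReadinessGate`, `GatedHandler`, and `demo` are hypothetical names.

```python
import asyncio


class ReadinessGate:
    """Holds callers until open() is called (hypothetical helper)."""

    def __init__(self):
        self._ready = asyncio.Event()

    def open(self):
        # Call once model registration has succeeded.
        self._ready.set()

    async def wait(self):
        await self._ready.wait()


class GatedHandler:
    """Hypothetical request handler that waits on the gate first."""

    def __init__(self, gate):
        self._gate = gate

    async def generate(self, request):
        # Requests that arrive early park here instead of failing.
        await self._gate.wait()
        return f"handled {request}"


async def demo():
    gate = ReadinessGate()
    handler = GatedHandler(gate)

    # A request arrives before registration has finished.
    early = asyncio.create_task(handler.generate("req-1"))
    await asyncio.sleep(0.01)
    was_queued = not early.done()

    # Registration completes; the parked request is released.
    gate.open()
    result = await early
    return was_queued, result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a gate like this, the endpoint can be discoverable from the start, and correctness no longer depends on how long registration takes.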
Summary by CodeRabbit
Refactor
Bug Fixes