fix: sglang -- queue requests until model registration completes #2701

hhzhang16 · 2025-08-25T23:32:18Z

Overview:

Bugfix: add readiness gating to prevent race condition during sglang startup

This MR implements readiness gating to solve a race condition where the KV Scheduler selects workers before model registration completes. Now, the endpoint starts immediately and becomes discoverable but queues incoming requests until model registration succeeds, eliminating "instance_id not found" errors during startup.
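The gating described above can be illustrated with a minimal `asyncio` sketch. It is not the code from this PR: `ReadinessGate`, `mark_ready`, and `handle` are hypothetical names, and an `asyncio.Event` is only one assumed way to hold requests until registration has been signalled.

```python
import asyncio


class ReadinessGate:
    """Hypothetical helper: holds incoming requests until registration is signalled."""

    def __init__(self) -> None:
        self._ready = asyncio.Event()

    def mark_ready(self) -> None:
        # Called once model registration has succeeded.
        self._ready.set()

    async def handle(self, request: str) -> str:
        # Requests that arrive early wait here instead of failing.
        await self._ready.wait()
        return f"handled:{request}"


async def demo() -> list[str]:
    gate = ReadinessGate()
    # Two requests arrive before registration has finished.
    pending = [asyncio.create_task(gate.handle(r)) for r in ("a", "b")]
    await asyncio.sleep(0)  # let both handlers start and block on the gate
    assert not any(task.done() for task in pending)
    gate.mark_ready()  # stand-in for "model registration succeeded"
    return await asyncio.gather(*pending)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the endpoint accepts requests as soon as it is up, and nothing is processed until `mark_ready()` is called, which is the property that avoids the startup race.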

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

Refactor
- Decouples service startup from model registration, launching the endpoint in the background with a brief stabilization delay.
- Enables concurrent registration during startup, reducing overall initialization time.
- Preserves existing error handling and cleanup flows.
Bug Fixes
- Improves startup reliability by avoiding race conditions during registration and endpoint initialization.

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

copy-pr-bot · 2025-08-25T23:32:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-08-25T23:40:42Z

Walkthrough

Decouples endpoint startup from model registration: the endpoint is launched as a background task, a 2.5-second delay is introduced, registration proceeds while the endpoint initializes, and the serve task is awaited afterward. Error handling and cleanup remain unchanged. No public API signatures are modified.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Endpoint startup and registration sequencing `components/backends/sglang/src/dynamo/sglang/main.py` | Start endpoint as background `serve_task`, sleep 2.5s, call `register_llm_with_runtime_config`, then await `serve_task`. Preserves existing error handling and cleanup; retains TODO about native endpoints. No exported signature changes. |
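The sequencing summarized in this walkthrough (the first revision of the PR, before the switch to request queuing) can be sketched as follows. Only the order of operations comes from the summary; `serve_endpoint`, `register_model`, and the `stop` event are hypothetical stand-ins, not the functions in `main.py`.

```python
import asyncio


async def serve_endpoint(stop: asyncio.Event, log: list[str]) -> None:
    # Stand-in for the long-running endpoint; runs until asked to stop.
    log.append("serve started")
    await stop.wait()
    log.append("serve stopped")


async def register_model(log: list[str]) -> None:
    # Stand-in for the registration call made once the endpoint is up.
    log.append("registered")


async def start(stabilization_delay: float = 2.5) -> list[str]:
    log: list[str] = []
    stop = asyncio.Event()
    # 1. Launch the endpoint in the background.
    serve_task = asyncio.create_task(serve_endpoint(stop, log))
    # 2. Give the endpoint a fixed delay to initialize.
    await asyncio.sleep(stabilization_delay)
    # 3. Register the model while the endpoint is already serving.
    await register_model(log)
    # Demo only: a real endpoint would keep running until shutdown.
    stop.set()
    # 4. Await the serve task.
    await serve_task
    return log


if __name__ == "__main__":
    print(asyncio.run(start(stabilization_delay=0.01)))
```

A fixed sleep only makes the race less likely rather than removing it, which is consistent with the later commits in this PR replacing the delay with request queuing.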

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Main as Main Orchestrator
    participant Endpoint as Endpoint Service
    participant Registry as Runtime Config / Registry

    rect rgb(235, 245, 255)
    Note over Main: Startup
    Main->>Endpoint: Spawn serve_task (background)
    Note over Main: Delay 2.5s to allow endpoint to initialize
    Main->>Registry: register_llm_with_runtime_config()
    end

    alt Success
        Main->>Endpoint: await serve_task
        Note over Main,Endpoint: Normal lifecycle continues
    else Error
        Note over Main: Log and re-raise
        Note over Main,Endpoint: Cleanup executed
    end
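The error branch of the diagram (log, re-raise, run cleanup) could look roughly like the sketch below. The diagram does not say what the cleanup does, so cancelling the background serve task is an assumption, and `serve_forever`, `failing_registration`, and `start_with_cleanup` are hypothetical names.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger("startup_sketch")


async def serve_forever() -> None:
    # Stand-in for the endpoint: blocks until cancelled.
    await asyncio.Event().wait()


async def failing_registration() -> None:
    # Stand-in for a registration call that fails.
    raise RuntimeError("registration failed")


async def start_with_cleanup(register: Callable[[], Awaitable[None]]) -> None:
    serve_task = asyncio.create_task(serve_forever())
    try:
        await register()
        await serve_task
    except Exception:
        # Log and re-raise, as in the "Error" branch of the diagram.
        logger.exception("startup failed")
        raise
    finally:
        # Assumed cleanup: stop the background task so it does not leak.
        if not serve_task.done():
            serve_task.cancel()
        await asyncio.gather(serve_task, return_exceptions=True)


async def demo() -> str:
    try:
        await start_with_cleanup(failing_registration)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Putting the cancellation in `finally` means the background task is stopped on both the error path and normal shutdown.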

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Possibly related PRs

fix: Move register_llm_block down #2316 — Adjusts LLM registration timing to occur after service startup, similar sequencing change to this PR.

Poem

I hopped to the port, ears high with cheer,
Spun up the server, let it hum in the clear.
A pause—two beats and a whisker of time,
Then register the mind, in orderly rhyme.
Tasks await, logs neat—thump-thump, all set,
A rabbit’s rollout: best one yet. 🐇✨


coderabbitai

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5a4100 and 601c014.

📒 Files selected for processing (1)

components/backends/sglang/src/dynamo/sglang/main.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/backends/sglang/src/dynamo/sglang/main.py (4)

components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py (1)

generate (56-94)

components/backends/sglang/src/dynamo/sglang/request_handlers/prefill_handler.py (1)

generate (49-70)

components/backends/sglang/src/dynamo/sglang/register.py (1)

register_llm_with_runtime_config (14-34)

lib/llm/src/local_model.rs (1)

migration_limit (150-153)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2701/merge) by hhzhang16.

components/backends/sglang/src/dynamo/sglang/main.py

[error] 90-96: Black formatting reformatted the file during pre-commit. Run 'black --write' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (.)
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (1)

components/backends/sglang/src/dynamo/sglang/main.py (1)

90-96: No Black formatting issues detected

The file components/backends/sglang/src/dynamo/sglang/main.py is already formatted according to the project’s pre-commit Black version (23.1.0). Running Black in check-and-diff mode reported no changes needed.

No further action required.

Likely an incorrect or invalid review comment.

dmitry-tokarev-nv · 2025-08-26T18:00:33Z

/ok to test c473a66

components/backends/sglang/src/dynamo/sglang/main.py

grahamking · 2025-08-26T18:14:23Z

Could you explain why in the description? Maybe because you need the health and/or readiness endpoints to be live immediately for Kubernetes.

grahamking

LGTM

fix: serve endpoint before model registration

601c014

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

pull-request-size bot added the size/S label Aug 25, 2025

hhzhang16 marked this pull request as ready for review August 25, 2025 23:32

hhzhang16 requested review from a team, GuanLuo, PeaBrane, alec-flowers, biswapanda, grahamking, ishandhanani, jthomson04, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25, tedzhouhk and tmonty12 as code owners August 25, 2025 23:32

github-actions bot added the fix label Aug 25, 2025

coderabbitai bot reviewed Aug 25, 2025

3 review comments on components/backends/sglang/src/dynamo/sglang/main.py (outdated, resolved)

feat: addressing CodeRabbit MR comments

fd1db00

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

pull-request-size bot added size/M and removed size/S labels Aug 26, 2025

feat: addressing CodeRabbit MR comments

c473a66

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB August 26, 2025 18:00 Inactive

copy-pr-bot bot temporarily deployed to GITLAB August 26, 2025 18:01 Inactive

grahamking reviewed Aug 26, 2025

1 review comment on components/backends/sglang/src/dynamo/sglang/main.py (outdated, resolved)

feat: asyncio bug fix, addressing hardcode concerns

38f99a6

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

Elnifio reviewed Aug 26, 2025

2 review comments on components/backends/sglang/src/dynamo/sglang/main.py (outdated, resolved)

hhzhang16 marked this pull request as draft August 26, 2025 18:54

feat: queue requests until model registration completes

c4c81c2

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

hhzhang16 marked this pull request as ready for review August 26, 2025 20:12

hhzhang16 changed the title ~~fix: serve endpoint before model registration~~ fix: sglang -- queue requests until model registration completes Aug 26, 2025

grahamking approved these changes Aug 26, 2025

nv-nmailhot merged commit dfda620 into main Aug 26, 2025
13 checks passed

nv-nmailhot deleted the sglang-runtime-race-condition branch August 26, 2025 20:54

hhzhang16 added a commit that referenced this pull request Aug 26, 2025

fix: sglang -- queue requests until model registration completes (#2701)

3eb2f23

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

nv-nmailhot pushed a commit that referenced this pull request Aug 26, 2025

fix: sglang -- queue requests until model registration completes (#2701…

4973e95

…) (#2722) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

hhzhang16 added a commit that referenced this pull request Aug 27, 2025

fix: sglang -- queue requests until model registration completes (#2701)

06f9cbe

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

ayushag-nv pushed a commit that referenced this pull request Aug 27, 2025

fix: sglang -- queue requests until model registration completes (#2701)

6f8c9b0

Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: ayushag <ayushag@nvidia.com>

jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025

fix: sglang -- queue requests until model registration completes (#2701)

bfd1fc7

Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>

KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025

fix: sglang -- queue requests until model registration completes (#2701)

10ed720

Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

fix: sglang -- queue requests until model registration completes (#2701)

dad51f6

Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>


6 participants
