-
Notifications
You must be signed in to change notification settings - Fork 690
test: Request Migration Docs and E2E vLLM Tests #2177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
66bf853 to
3bd3eb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
116-123: PossibleKeyErrorwhen optional fields are missing
request["sampling_options"]andrequest["stop_conditions"]are accessed with
[], which will raise if the caller omits either key.
If the contract allows them to be absent, switch to.get()with a default
empty dict.-for key, value in request["sampling_options"].items(): +for key, value in request.get("sampling_options", {}).items(): @@ -for key, value in request["stop_conditions"].items(): +for key, value in request.get("stop_conditions", {}).items():
🧹 Nitpick comments (4)
components/backends/vllm/src/dynamo/vllm/handlers.py (2)
108-110: Make the log line self-describingWhen scanning large logs it helps to identify the emitting handler immediately.
Recommend prefixing the message with the class name (ordecode) to avoid
confusion with the prefill handler.-logger.debug(f"New Request ID: {request_id}") +logger.debug(f"DecodeWorkerHandler – new request_id=%s", request_id)
167-169: Consistent log prefix for Prefill handlerMatch the decode handler style to keep grep-ability consistent.
-logger.debug(f"New Prefill Request ID: {request_id}") +logger.debug(f"PrefillWorkerHandler – request_id=%s", request_id)docs/architecture/request_migration.md (1)
21-29: Clarify default behaviour & flag precedenceThe bullet “Default behavior: no migration allowed” can confuse readers because
dynamo-run --migration-limitdefaults to 0, which already disables
migration. Consider re-phrasing to emphasise that omitting the flag or setting
it to0are equivalent, and that engine-specific overrides (if any) take
precedence over the CLI flag.tests/fault_tolerance/test_request_migration.py (1)
403-408: Consider making the wait time configurable.The 0.5-second wait before killing the primary worker is hardcoded. Consider making this configurable or adding a comment explaining why this specific timing was chosen.
- # Step 6: Wait 0.5 seconds after sending the formal request, then kill the primary worker + # Step 6: Wait briefly to ensure request processing has started, then kill the primary worker + # 0.5 seconds is sufficient for the request to be received and processing to begin logger.info( f"Killing {primary_worker[1]} with PID {primary_worker[0].get_pid()}" ) time.sleep(0.5)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
components/backends/llama_cpp/README.md(1 hunks)components/backends/sglang/README.md(1 hunks)components/backends/trtllm/README.md(1 hunks)components/backends/vllm/README.md(1 hunks)components/backends/vllm/src/dynamo/vllm/handlers.py(2 hunks)docs/architecture/request_migration.md(1 hunks)docs/guides/backend.md(1 hunks)docs/guides/dynamo_run.md(1 hunks)tests/fault_tolerance/README.md(1 hunks)tests/fault_tolerance/test_request_migration.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
Learnt from: ishandhanani
PR: #1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the sampling_options call in the preprocess_request method is placed in the common section after the match statement on request.prompt_input_type(), meaning it applies to both PromptInput::Tokens and PromptInput::Text request types.
tests/fault_tolerance/README.md (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.
🧬 Code Graph Analysis (1)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
tests/conftest.py (1)
logger(98-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (8)
docs/guides/backend.md (1)
70-75: Nice addition – option documented clearlyThe new
migration_limitkwarg is documented concisely and links to the
architecture doc. Looks good.components/backends/trtllm/README.md (1)
208-217: Documentation update reads wellThe slimmer explanation plus link to the central doc keeps this README tidy.
No further comments.components/backends/sglang/README.md (1)
142-151: README streamlined correctlySame remark as for trtllm – reduction in duplicated text and a clear link to
the architecture doc is an improvement.components/backends/llama_cpp/README.md (1)
9-15: LGTM! Good documentation refactoring.The changes appropriately centralize the request migration documentation by linking to the architecture document, reducing duplication across backend READMEs.
components/backends/vllm/README.md (1)
192-198: LGTM! Consistent documentation structure.The changes align with the documentation refactoring pattern, maintaining consistency across backend READMEs.
tests/fault_tolerance/README.md (1)
1-81: Excellent documentation for the fault tolerance test suite!The README provides comprehensive coverage of test workflow, prerequisites, execution methods, and troubleshooting steps. The use of DeepSeek-R1-Distill-Llama-8B (which doesn't require an API key) is a good choice for automated testing.
docs/guides/dynamo_run.md (1)
214-220: LGTM! Documentation consistency maintained.The Request Migration section appropriately references the architecture documentation, maintaining consistency with the backend README updates.
tests/fault_tolerance/test_request_migration.py (1)
346-421: Well-structured end-to-end test for request migration!The test comprehensively validates the request migration functionality with proper setup, execution, and validation steps. The sequential worker startup ensures predictable round-robin behavior, and the use of context managers ensures proper cleanup.
nnshah1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few nits - we can add follow up stories to reuse utilities for process termination / augment the managed process to handle logs / move to using the new ready endpoints for the workers (will require setting them up via environment variables)
Co-authored-by: Neelay Shah <neelays@nvidia.com> Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Overview:
Add end-to-end testing for Request Migration, and refactor documentations on Request Migration.
Details:
Where should the reviewer start?
Start with the documentation changes first, and then look into the e2d testing.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A
Summary by CodeRabbit
Documentation
New Features
Tests
Style