test: Request Migration Docs and E2E vLLM Tests #2177

kthui · 2025-07-29T22:46:17Z

Overview:

Add end-to-end tests for Request Migration and refactor the Request Migration documentation.

Details:

Add vLLM end-to-end tests.
Add architecture documentation on Request Migration.
Update components/backends docs to focus only on setting flags and point to the new Request Migration docs.
Update dynamo-run docs to focus only on setting flags and point to the new Request Migration docs.

Where should the reviewer start?

Start with the documentation changes, then look at the e2e tests.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

Documentation
- Simplified and clarified request migration documentation across multiple backend and guide files.
- Added a comprehensive architecture document detailing the request migration system, configuration, and failure handling.
- Updated backend and guide documentation to reference the new architecture document and describe migration-related options more concisely.
- Added a README for fault tolerance end-to-end tests, including setup and troubleshooting instructions.
New Features
- Introduced an end-to-end fault tolerance test verifying request migration and recovery from worker failures.
Tests
- Added a new test module for automated validation of request migration, including process management, model download, and migration verification.
Style
- Added debug logging for improved traceability of request IDs in backend handlers.

copy-pr-bot · 2025-07-29T22:46:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ishandhanani · 2025-07-29T23:25:37Z

@kthui mentioned that we already have this functionality in SGL as well. @yinggeh to do the same for SGL.

tests/fault_tolerance/README.md

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

components/backends/vllm/src/dynamo/vllm/handlers.py (1)
116-123: Possible KeyError when optional fields are missing

request["sampling_options"] and request["stop_conditions"] are accessed with
[], which will raise if the caller omits either key.
If the contract allows them to be absent, switch to .get() with a default
empty dict.
-for key, value in request["sampling_options"].items():
+for key, value in request.get("sampling_options", {}).items():
@@
-for key, value in request["stop_conditions"].items():
+for key, value in request.get("stop_conditions", {}).items():
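
A minimal sketch of the difference, assuming the request is a plain dict. `collect_options` is a hypothetical helper written for illustration; the handler's actual code is not shown in this thread.

```python
def collect_options(request: dict) -> dict:
    """Merge the optional sections of a request into one flat dict.

    Hypothetical helper for illustration only; not the handler's code.
    """
    merged = {}
    for section in ("sampling_options", "stop_conditions"):
        # .get() with an empty-dict default tolerates callers that omit
        # the section, where request[section] would raise KeyError.
        for key, value in request.get(section, {}).items():
            merged[key] = value
    return merged


full = {
    "sampling_options": {"temperature": 0.7},
    "stop_conditions": {"max_tokens": 16},
}
partial = {"sampling_options": {"temperature": 0.7}}  # stop_conditions omitted

full_result = collect_options(full)
partial_result = collect_options(partial)
```

If a caller may send an explicit `None` for a section, `request.get(section) or {}` covers that case as well, since `.get()` only falls back to the default when the key is absent.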

🧹 Nitpick comments (4)

components/backends/vllm/src/dynamo/vllm/handlers.py (2)
108-110: Make the log line self-describing

When scanning large logs it helps to identify the emitting handler immediately.
Recommend prefixing the message with the class name (or decode) to avoid
confusion with the prefill handler.
-logger.debug(f"New Request ID: {request_id}")
+logger.debug("DecodeWorkerHandler – new request_id=%s", request_id)
167-169: Consistent log prefix for Prefill handler

Match the decode handler style to keep grep-ability consistent.
-logger.debug(f"New Prefill Request ID: {request_id}")
+logger.debug("PrefillWorkerHandler – request_id=%s", request_id)
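
A small self-contained sketch of handler-prefixed debug lines using the standard `logging` module's lazy `%s` formatting (no `f` prefix is needed, because the logger interpolates the arguments itself). The logger name, the request ID, and the colon separator are assumptions made for this example.

```python
import io
import logging

# Capture output in memory so the example is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("handler_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo isolated from the root logger

request_id = "req-123"  # made-up ID for illustration

# Arguments are passed separately; formatting happens only if the
# record is actually emitted at the configured level.
logger.debug("DecodeWorkerHandler: new request_id=%s", request_id)
logger.debug("PrefillWorkerHandler: request_id=%s", request_id)

lines = stream.getvalue().splitlines()
```

With a shared `request_id=` token, a single `grep request_id=req-123` finds the lines from both handlers, and the class-name prefix tells them apart.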
docs/architecture/request_migration.md (1)

21-29: Clarify default behaviour & flag precedence

The bullet “Default behavior: no migration allowed” can confuse readers because
dynamo-run --migration-limit defaults to 0, which already disables
migration. Consider re-phrasing to emphasise that omitting the flag or setting
it to 0 are equivalent, and that engine-specific overrides (if any) take
precedence over the CLI flag.
tests/fault_tolerance/test_request_migration.py (1)
403-408: Consider making the wait time configurable.

The 0.5-second wait before killing the primary worker is hardcoded. Consider making this configurable or adding a comment explaining why this specific timing was chosen.
-                # Step 6: Wait 0.5 seconds after sending the formal request, then kill the primary worker
+                # Step 6: Wait briefly to ensure request processing has started, then kill the primary worker
+                # 0.5 seconds is sufficient for the request to be received and processing to begin
                 logger.info(
                     f"Killing {primary_worker[1]} with PID {primary_worker[0].get_pid()}"
                 )
                 time.sleep(0.5)
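
One way to make the delay configurable, sketched under assumptions: the environment variable name `MIGRATION_TEST_KILL_DELAY_S` and the helper `kill_delay_seconds` are hypothetical and do not exist in the test module. Only the 0.5-second default comes from the code under review.

```python
import os

DEFAULT_KILL_DELAY_S = 0.5  # value used by the test under review


def kill_delay_seconds(env=None) -> float:
    """Return the delay (seconds) to wait before killing the primary worker.

    Reads the hypothetical MIGRATION_TEST_KILL_DELAY_S variable and falls
    back to the default when it is unset, unparsable, or negative.
    """
    env = os.environ if env is None else env
    raw = env.get("MIGRATION_TEST_KILL_DELAY_S")
    if raw is None:
        return DEFAULT_KILL_DELAY_S
    try:
        delay = float(raw)
    except ValueError:
        return DEFAULT_KILL_DELAY_S
    return delay if delay >= 0 else DEFAULT_KILL_DELAY_S
```

The test would then call `time.sleep(kill_delay_seconds())` in place of the literal, which lets slower CI machines lengthen the window without a code change.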

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee09de0 and 3bd3eb3.

📒 Files selected for processing (10)

components/backends/llama_cpp/README.md (1 hunks)
components/backends/sglang/README.md (1 hunks)
components/backends/trtllm/README.md (1 hunks)
components/backends/vllm/README.md (1 hunks)
components/backends/vllm/src/dynamo/vllm/handlers.py (2 hunks)
docs/architecture/request_migration.md (1 hunks)
docs/guides/backend.md (1 hunks)
docs/guides/dynamo_run.md (1 hunks)
tests/fault_tolerance/README.md (1 hunks)
tests/fault_tolerance/test_request_migration.py (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

components/backends/vllm/src/dynamo/vllm/handlers.py (1)

Learnt from: ishandhanani
PR: #1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the sampling_options call in the preprocess_request method is placed in the common section after the match statement on request.prompt_input_type(), meaning it applies to both PromptInput::Tokens and PromptInput::Text request types.

tests/fault_tolerance/README.md (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.

🧬 Code Graph Analysis (1)

components/backends/vllm/src/dynamo/vllm/handlers.py (1)

tests/conftest.py (1)

logger (98-109)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Build and Test - vllm
GitHub Check: pre-merge-rust (.)

🔇 Additional comments (8)

docs/guides/backend.md (1)

70-75: Nice addition – option documented clearly

The new migration_limit kwarg is documented concisely and links to the
architecture doc. Looks good.

components/backends/trtllm/README.md (1)

208-217: Documentation update reads well

The slimmer explanation plus link to the central doc keeps this README tidy.
No further comments.

components/backends/sglang/README.md (1)

142-151: README streamlined correctly

Same remark as for trtllm – reduction in duplicated text and a clear link to
the architecture doc is an improvement.

components/backends/llama_cpp/README.md (1)

9-15: LGTM! Good documentation refactoring.

The changes appropriately centralize the request migration documentation by linking to the architecture document, reducing duplication across backend READMEs.

components/backends/vllm/README.md (1)

192-198: LGTM! Consistent documentation structure.

The changes align with the documentation refactoring pattern, maintaining consistency across backend READMEs.

tests/fault_tolerance/README.md (1)

1-81: Excellent documentation for the fault tolerance test suite!

The README provides comprehensive coverage of test workflow, prerequisites, execution methods, and troubleshooting steps. The use of DeepSeek-R1-Distill-Llama-8B (which doesn't require an API key) is a good choice for automated testing.

docs/guides/dynamo_run.md (1)

214-220: LGTM! Documentation consistency maintained.

The Request Migration section appropriately references the architecture documentation, maintaining consistency with the backend README updates.

tests/fault_tolerance/test_request_migration.py (1)

346-421: Well-structured end-to-end test for request migration!

The test comprehensively validates the request migration functionality with proper setup, execution, and validation steps. The sequential worker startup ensures predictable round-robin behavior, and the use of context managers ensures proper cleanup.

docs/architecture/request_migration.md

tests/fault_tolerance/test_request_migration.py

nnshah1

a few nits - we can add follow up stories to reuse utilities for process termination / augment the managed process to handle logs / move to using the new ready endpoints for the workers (will require setting them up via environment variables)

Co-authored-by: Neelay Shah <neelays@nvidia.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

kthui added 2 commits July 29, 2025 15:01

Add debug log to vLLM generate

31d3b18

test: Add e2e migration test for vllm

763647f

pull-request-size bot added the size/L label Jul 29, 2025

kthui self-assigned this Jul 29, 2025

github-actions bot added the test label Jul 29, 2025

docs: Add Request Migration architecture docs

9022010

pull-request-size bot added size/XL and removed size/L labels Jul 30, 2025

ishandhanani reviewed Jul 30, 2025

kthui added 2 commits July 29, 2025 21:18

test: Separate model download from backend start

610b8a4

docs: Update e2e test readme

3bd3eb3

kthui force-pushed the jacky-ft-migrate-test branch from 66bf853 to 3bd3eb3 Compare July 30, 2025 16:19

kthui changed the title ~~test: Add e2e testing for Request Migration~~ test: Request Migration E2E vLLM Tests and Docs Jul 30, 2025

kthui marked this pull request as ready for review July 30, 2025 16:25

kthui requested review from GuanLuo, alec-flowers, biswapanda, grahamking, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25, tedzhouhk and tmonty12 as code owners July 30, 2025 16:25

coderabbitai bot reviewed Jul 30, 2025

kthui changed the title ~~test: Request Migration E2E vLLM Tests and Docs~~ test: Request Migration Docs and E2E vLLM Tests Jul 30, 2025

kthui enabled auto-merge (squash) July 30, 2025 18:04

kthui requested a review from ishandhanani July 30, 2025 18:05