
@kthui (Contributor) commented Jul 29, 2025

Overview:

Add end-to-end testing for Request Migration and refactor the Request Migration documentation.

Details:

  • Add vLLM end-to-end tests.
  • Add architecture documentation on Request Migration.
  • Update components/backends docs to focus only on setting flags, and point to the new Request Migration docs.
  • Update dynamo-run docs to focus only on setting flags, and point to the new Request Migration docs.

Where should the reviewer start?

Start with the documentation changes, then look at the e2e tests.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

  • Documentation

    • Simplified and clarified request migration documentation across multiple backend and guide files.
    • Added a comprehensive architecture document detailing the request migration system, configuration, and failure handling.
    • Updated backend and guide documentation to reference the new architecture document and describe migration-related options more concisely.
    • Added a README for fault tolerance end-to-end tests, including setup and troubleshooting instructions.
  • New Features

    • Introduced an end-to-end fault tolerance test verifying request migration and recovery from worker failures.
  • Tests

    • Added a new test module for automated validation of request migration, including process management, model download, and migration verification.
  • Style

    • Added debug logging for improved traceability of request IDs in backend handlers.

@copy-pr-bot (bot) commented Jul 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@kthui kthui self-assigned this Jul 29, 2025
@github-actions github-actions bot added the test label Jul 29, 2025
@ishandhanani (Contributor) commented Jul 29, 2025

@kthui mentioned that we already have this functionality in SGL as well. @yinggeh, this is a to-do for SGL as well.

@kthui kthui force-pushed the jacky-ft-migrate-test branch from 66bf853 to 3bd3eb3 July 30, 2025 16:19
@kthui kthui changed the title test: Add e2e testing for Request Migration test: Request Migration E2E vLLM Tests and Docs Jul 30, 2025
@kthui kthui marked this pull request as ready for review July 30, 2025 16:25
@coderabbitai (bot) left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)

116-123: Possible KeyError when optional fields are missing

request["sampling_options"] and request["stop_conditions"] are accessed with
[], which will raise if the caller omits either key.
If the contract allows them to be absent, switch to .get() with a default
empty dict.

-for key, value in request["sampling_options"].items():
+for key, value in request.get("sampling_options", {}).items():
@@
-for key, value in request["stop_conditions"].items():
+for key, value in request.get("stop_conditions", {}).items():
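A minimal sketch of the suggested pattern, assuming a missing field should simply mean "no overrides". `collect_overrides` is a hypothetical helper, not the actual `handlers.py` code. It goes one step past the diff: `request.get(field, {})` still returns `None` when the key is present but explicitly null, while `request.get(field) or {}` handles both cases.

```python
from typing import Any, Dict


def collect_overrides(request: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the optional sampling and stop fields of a request (hypothetical helper)."""
    overrides: Dict[str, Any] = {}
    for field in ("sampling_options", "stop_conditions"):
        # `.get(field) or {}` also covers a key that is present but set to None,
        # which `.get(field, {})` alone would return as None.
        for key, value in (request.get(field) or {}).items():
            overrides[key] = value
    return overrides


print(collect_overrides({"token_ids": [1, 2, 3]}))  # {}
print(collect_overrides({"sampling_options": {"temperature": 0.7}}))  # {'temperature': 0.7}
```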
🧹 Nitpick comments (4)
components/backends/vllm/src/dynamo/vllm/handlers.py (2)

108-110: Make the log line self-describing

When scanning large logs it helps to identify the emitting handler immediately.
Recommend prefixing the message with the class name (or decode) to avoid
confusion with the prefill handler.

-logger.debug(f"New Request ID: {request_id}")
+logger.debug("DecodeWorkerHandler: new request_id=%s", request_id)

167-169: Consistent log prefix for Prefill handler

Match the decode handler style to keep grep-ability consistent.

-logger.debug(f"New Prefill Request ID: {request_id}")
+logger.debug("PrefillWorkerHandler: new request_id=%s", request_id)
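A small sketch of the two suggested log lines. The handler classes are stand-ins, and the logger name and method signatures are assumptions. With `%s`-style arguments the logging module formats the message only if the record is actually emitted, so no f-string prefix is needed. An `f` prefix with no `{}` placeholders has no effect and is flagged by linters.

```python
import logging

logger = logging.getLogger("dynamo.vllm.handlers")  # logger name is an assumption


class DecodeWorkerHandler:
    """Stand-in for the real handler; only the logging call is illustrated."""

    def generate(self, request_id: str) -> None:
        # Lazy %-formatting: the string is built only if DEBUG is enabled.
        logger.debug("DecodeWorkerHandler: new request_id=%s", request_id)


class PrefillWorkerHandler:
    """Stand-in for the real prefill handler."""

    def generate(self, request_id: str) -> None:
        logger.debug("PrefillWorkerHandler: new request_id=%s", request_id)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
    DecodeWorkerHandler().generate("req-123")
    PrefillWorkerHandler().generate("req-123")
```

Both lines now start with the handler class name, so `grep DecodeWorkerHandler` and `grep PrefillWorkerHandler` separate the two streams.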
docs/architecture/request_migration.md (1)

21-29: Clarify default behaviour & flag precedence

The bullet “Default behavior: no migration allowed” can confuse readers because
dynamo-run --migration-limit defaults to 0, which already disables
migration. Consider re-phrasing to emphasise that omitting the flag or setting
it to 0 are equivalent, and that engine-specific overrides (if any) take
precedence over the CLI flag.
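A toy model of the behavior this comment asks the doc to spell out. It assumes, as the comment states, that the limit defaults to 0. It also assumes that each migration resends the request, plus the tokens already streamed, to another worker. `resolve_migration_limit`, `generate_with_migration`, `WorkerDied`, and the worker callables are hypothetical names, not Dynamo APIs, and engine-specific overrides are left out.

```python
from typing import Callable, Iterator, List, Optional, Sequence


class WorkerDied(Exception):
    """Hypothetical error: the worker's token stream broke off mid-request."""


# Hypothetical worker interface: takes prompt tokens, yields generated tokens.
Worker = Callable[[List[int]], Iterator[int]]


def resolve_migration_limit(cli_value: Optional[int]) -> int:
    """Omitting the flag and passing 0 mean the same thing: no migration."""
    return 0 if cli_value is None else cli_value


def generate_with_migration(
    prompt: List[int], workers: Sequence[Worker], migration_limit: int
) -> List[int]:
    """Stream a response, moving the request to the next worker on failure.

    Each retry resends the prompt plus the tokens already produced, so the
    next worker continues where the failed one stopped (assumed behavior).
    """
    produced: List[int] = []
    migrations = 0
    index = 0
    while True:
        worker = workers[index % len(workers)]
        try:
            for token in worker(prompt + produced):
                produced.append(token)
            return produced
        except WorkerDied:
            if migrations >= migration_limit:
                raise  # limit reached (always, when it is 0): fail the request
            migrations += 1
            index += 1
```

With a limit of 1, a worker that dies after two tokens hands the request to the next worker. With the default of 0, the same failure reaches the caller.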

tests/fault_tolerance/test_request_migration.py (1)

403-408: Consider making the wait time configurable.

The 0.5-second wait before killing the primary worker is hardcoded. Consider making this configurable or adding a comment explaining why this specific timing was chosen.

-                # Step 6: Wait 0.5 seconds after sending the formal request, then kill the primary worker
+                # Step 6: Wait briefly to ensure request processing has started, then kill the primary worker
+                # 0.5 seconds is sufficient for the request to be received and processing to begin
                 logger.info(
                     f"Killing {primary_worker[1]} with PID {primary_worker[0].get_pid()}"
                 )
                 time.sleep(0.5)
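One way to make the delay configurable, sketched under the assumption that an environment variable is acceptable for this test suite. `MIGRATION_KILL_DELAY_S` and `kill_delay_seconds` are hypothetical names. The default keeps the current hardcoded 0.5 s.

```python
import os

# The test in this PR hardcodes 0.5 s; this default keeps that behavior.
DEFAULT_KILL_DELAY_S = 0.5


def kill_delay_seconds(env_var: str = "MIGRATION_KILL_DELAY_S") -> float:
    """Delay between sending the request and killing the primary worker.

    `MIGRATION_KILL_DELAY_S` is a hypothetical variable name. The delay has to
    be long enough for the worker to start streaming, but shorter than the
    whole generation, or there is nothing left to migrate.
    """
    raw = os.environ.get(env_var)
    if raw is None or raw.strip() == "":
        return DEFAULT_KILL_DELAY_S
    value = float(raw)
    if value < 0:
        raise ValueError(f"{env_var} must be non-negative, got {value}")
    return value
```

In the test, `time.sleep(0.5)` would then become `time.sleep(kill_delay_seconds())`. A slower CI machine could then raise the delay without a code change.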
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee09de0 and 3bd3eb3.

📒 Files selected for processing (10)
  • components/backends/llama_cpp/README.md (1 hunks)
  • components/backends/sglang/README.md (1 hunks)
  • components/backends/trtllm/README.md (1 hunks)
  • components/backends/vllm/README.md (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/handlers.py (2 hunks)
  • docs/architecture/request_migration.md (1 hunks)
  • docs/guides/backend.md (1 hunks)
  • docs/guides/dynamo_run.md (1 hunks)
  • tests/fault_tolerance/README.md (1 hunks)
  • tests/fault_tolerance/test_request_migration.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)

Learnt from: ishandhanani
PR: #1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the sampling_options call in the preprocess_request method is placed in the common section after the match statement on request.prompt_input_type(), meaning it applies to both PromptInput::Tokens and PromptInput::Text request types.

tests/fault_tolerance/README.md (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.

🧬 Code Graph Analysis (1)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
tests/conftest.py (1)
  • logger (98-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (8)
docs/guides/backend.md (1)

70-75: Nice addition – option documented clearly

The new migration_limit kwarg is documented concisely and links to the
architecture doc. Looks good.

components/backends/trtllm/README.md (1)

208-217: Documentation update reads well

The slimmer explanation plus link to the central doc keeps this README tidy.
No further comments.

components/backends/sglang/README.md (1)

142-151: README streamlined correctly

Same remark as for trtllm – reduction in duplicated text and a clear link to
the architecture doc is an improvement.

components/backends/llama_cpp/README.md (1)

9-15: LGTM! Good documentation refactoring.

The changes appropriately centralize the request migration documentation by linking to the architecture document, reducing duplication across backend READMEs.

components/backends/vllm/README.md (1)

192-198: LGTM! Consistent documentation structure.

The changes align with the documentation refactoring pattern, maintaining consistency across backend READMEs.

tests/fault_tolerance/README.md (1)

1-81: Excellent documentation for the fault tolerance test suite!

The README provides comprehensive coverage of test workflow, prerequisites, execution methods, and troubleshooting steps. The use of DeepSeek-R1-Distill-Llama-8B (which doesn't require an API key) is a good choice for automated testing.

docs/guides/dynamo_run.md (1)

214-220: LGTM! Documentation consistency maintained.

The Request Migration section appropriately references the architecture documentation, maintaining consistency with the backend README updates.

tests/fault_tolerance/test_request_migration.py (1)

346-421: Well-structured end-to-end test for request migration!

The test comprehensively validates the request migration functionality with proper setup, execution, and validation steps. The sequential worker startup ensures predictable round-robin behavior, and the use of context managers ensures proper cleanup.
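A toy model of why sequential startup helps. It assumes the router cycles through workers in registration order, which is a simplification not confirmed in this thread. It also assumes that one request (for example, a warm-up) is sent before the formal request. Under those assumptions, the worker serving the formal request, and therefore the one to kill, is known in advance. `RoundRobinRouter` and the worker names are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinRouter:
    """Toy router: hands out workers in the order they registered."""

    def __init__(self, workers: List[str]) -> None:
        self._order: Iterator[str] = cycle(workers)

    def pick(self) -> str:
        return next(self._order)


# Workers registered one after another, so the order is known in advance.
router = RoundRobinRouter(["worker_a", "worker_b"])
warmup_target = router.pick()  # "worker_a"
formal_target = router.pick()  # "worker_b": the worker the test would kill
```

If both workers registered concurrently, their order, and thus `formal_target`, could differ between runs.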

@kthui kthui changed the title test: Request Migration E2E vLLM Tests and Docs test: Request Migration Docs and E2E vLLM Tests Jul 30, 2025
@kthui kthui enabled auto-merge (squash) July 30, 2025 18:04
@kthui kthui requested a review from ishandhanani July 30, 2025 18:05
@nnshah1 (Contributor) left a comment


A few nits. We can add follow-up stories to reuse utilities for process termination, augment the managed process to handle logs, and move to using the new ready endpoints for the workers (this will require setting them up via environment variables).
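A sketch of the kind of shared process-termination utility this follow-up could produce. `terminate_process` is a hypothetical helper, not an existing Dynamo utility. It uses only standard `subprocess` calls. It escalates from a graceful stop to a hard kill: `terminate()` sends SIGTERM on POSIX, and `kill()` sends SIGKILL.

```python
import subprocess
import sys


def terminate_process(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Stop a child process gracefully, escalating to a hard kill.

    Hypothetical shared helper: sends SIGTERM (terminate) on POSIX, waits up
    to `timeout` seconds, then sends SIGKILL (kill). Returns the exit code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to do
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Usage: start a long-running child and stop it.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print("exit code:", terminate_process(child, timeout=5.0))
```

Some workers spawn children of their own. Stopping those may require the whole process group, for example starting the child with `start_new_session=True` and signalling it with `os.killpg`. That case is outside this sketch.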

@kthui kthui disabled auto-merge July 31, 2025 22:49
@kthui kthui enabled auto-merge (squash) August 1, 2025 19:56
@kthui kthui merged commit ae51b3f into main Aug 1, 2025
13 checks passed
@kthui kthui deleted the jacky-ft-migrate-test branch August 1, 2025 21:52
