-
Notifications
You must be signed in to change notification settings - Fork 676
chore: pass in mocker engine args directly in python cli + default frontend port to 8000 #2853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: PeaBrane <yanrpei@gmail.com>
WalkthroughUpdates introduce a vLLM-style CLI for the mocker backend, generating a temporary JSON of engine args when not provided. The worker selects between user-supplied JSON and generated args, with cleanup. Tests switch from file-based configs to in-memory dicts driving CLI flags. Documentation reflects the new interface. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User as User/Tests
participant CLI as mocker.main.py (CLI)
participant FS as Temp JSON (MockEngineArgs)
participant Eng as Engine Entrypoint
User->>CLI: Invoke mocker with CLI flags and/or --extra-engine-args
alt --extra-engine-args provided
CLI->>Eng: Build args using provided JSON path
else Generate from CLI flags
CLI->>FS: Serialize non-None MockEngineArgs to temp JSON
CLI->>Eng: Build args using temp JSON path
end
Eng->>Eng: Start engine with MockEngineArgs
note over CLI,FS: Finally: delete temp JSON if created
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/backends/mocker/src/dynamo/mocker/main.py (1)
1-225: Install Black, add it to dev dependencies, and format main.py
- Add
blackto the project’s development dependencies (e.g. inrequirements-dev.txtorpyproject.toml) so it’s available in CI.- Run
and commit the resulting changes.black components/backends/mocker/src/dynamo/mocker/main.py- Ruff fixes have been applied; verify no lint errors remain.
tests/router/test_router_e2e_with_mockers.py (1)
1-1230: Require Black installation or alternative invocation
- The standalone
blackcommand wasn’t found (error:black: command not found), so formatting didn’t run. Add Black to the project’s dev dependencies or update the CI/dev instructions to invoke it viapython -m black, then formattests/router/test_router_e2e_with_mockers.py.- Ruff fixes have been applied successfully (no remaining lint issues).
🧹 Nitpick comments (2)
components/backends/mocker/README.md (1)
14-14: Consider removing trailing colons from headingsThe trailing colons in headings violate Markdown best practices (MD026). While this is a minor issue, removing them would improve consistency with standard Markdown formatting.
-### Required arguments: +### Required arguments-### MockEngineArgs parameters (vLLM-style): +### MockEngineArgs parameters (vLLM-style)-### Example with individual arguments (vLLM-style): +### Example with individual arguments (vLLM-style)-### Legacy JSON file support: +### Legacy JSON file supportAlso applies to: 17-17, 28-28, 43-43
components/backends/mocker/src/dynamo/mocker/main.py (1)
29-66: Consider using dataclasses for MockEngineArgs serializationThe current implementation manually checks each attribute and builds the dictionary. Consider using a dataclass or Pydantic model for MockEngineArgs to make the serialization more maintainable and type-safe.
Would you like me to generate a dataclass-based implementation that would make this code more maintainable and reduce the repetitive hasattr checks?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
components/backends/mocker/README.md(1 hunks)components/backends/mocker/src/dynamo/mocker/main.py(3 hunks)tests/router/test_router_e2e_with_mockers.py(11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.330Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.
🧬 Code graph analysis (2)
tests/router/test_router_e2e_with_mockers.py (1)
tests/conftest.py (1)
logger(124-135)
components/backends/mocker/src/dynamo/mocker/main.py (1)
lib/llm/src/local_model.rs (6)
extra_engine_args(174-177)model_path(93-96)model_name(98-101)endpoint_id(108-111)endpoint_id(390-392)default(67-89)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2853/merge) by PeaBrane.
tests/router/test_router_e2e_with_mockers.py
[error] 1-1: Black formatting changes detected. Run 'black' to reformat this file.
[error] 1-1: Ruff linting issues auto-fixed in this file during pre-commit.
components/backends/mocker/src/dynamo/mocker/main.py
[error] 1-1: Black formatting changes detected. Run 'black' to reformat this file.
[error] 1-1: Ruff linting issues auto-fixed in this file during pre-commit.
🪛 LanguageTool
components/backends/mocker/README.md
[style] ~12-~12: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...mocker engine now supports a vLLM-style CLI interface with individual arguments for all confi...
(ACRONYM_TAUTOLOGY)
[grammar] ~14-~14: There might be a mistake here.
Context: ...ration options. ### Required arguments: - --model-path: Path to model directory or HuggingFace...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ... MockEngineArgs parameters (vLLM-style): - --num-gpu-blocks-override: Number of GPU blocks for KV cache (def...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...GPU blocks for KV cache (default: 16384) - --block-size: Token block size for KV cache blocks (...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...k size for KV cache blocks (default: 64) - --max-num-seqs: Maximum number of sequences per iterat...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...f sequences per iteration (default: 256) - --max-num-batched-tokens: Maximum number of batched tokens per i...
(QB_NEW_EN)
[grammar] ~23-~23: There might be a mistake here.
Context: .../disable chunked prefill (default: True) - --watermark: KV cache watermark threshold as a frac...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ... threshold as a fraction (default: 0.01) - --speedup-ratio: Speed multiplier for token generation ...
(QB_NEW_EN)
[grammar] ~43-~43: There might be a mistake here.
Context: ... 8080 ``` ### Legacy JSON file support: For backward compatibility, you can stil...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
components/backends/mocker/README.md
14-14: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
17-17: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
28-28: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
43-43: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
🪛 Ruff (0.12.2)
components/backends/mocker/src/dynamo/mocker/main.py
80-80: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
components/backends/mocker/README.md (1)
10-12: Documentation is clear and well-structuredThe transition to a vLLM-style CLI interface is well documented with clear examples showing both the new individual argument approach and legacy JSON file support for backward compatibility.
components/backends/mocker/src/dynamo/mocker/main.py (2)
161-173: Boolean flag handling looks correctThe implementation properly handles the vLLM-style boolean flags with both positive and negative forms (--enable-prefix-caching/--no-enable-prefix-caching), correctly setting the destination attribute.
Also applies to: 175-187
82-104: Good resource cleanup implementationThe try/finally block ensures proper cleanup of the temporary file, with appropriate error handling and logging. This prevents resource leaks.
tests/router/test_router_e2e_with_mockers.py (2)
52-101: Clean refactoring to dictionary-based configurationThe transition from file-based to in-memory dictionary configuration is well-implemented. The mapping from dictionary keys to CLI arguments is correct and maintains backward compatibility by defaulting to an empty dictionary when mocker_args is None.
322-324: Good test coverage for new CLI interfaceAll test methods have been properly updated to use the new dictionary-based mocker_args configuration. The tests appropriately cover different scenarios including:
- Standard configuration with speedup_ratio and block_size
- Resource-limited configuration for overload testing
- Multiple mocker instances
Also applies to: 377-379, 444-449, 609-611, 833-835
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
alec-flowers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Signed-off-by: PeaBrane <yanrpei@gmail.com>
…ontend port to 8000 (#2853) Signed-off-by: PeaBrane <yanrpei@gmail.com>
…ontend port to 8000 (#2853) Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
to mirror vllm cli closer, and additionally improve ergonomics
Usage for example is now, exactly as we would when launching
python -m dynamo.vllmTangential cli change: make default frontend port 8000 (k8 default)
Summary by CodeRabbit
New Features
Documentation