@PeaBrane
Contributor

@PeaBrane PeaBrane commented Sep 3, 2025

Overview:

Mirrors the vLLM CLI more closely and improves ergonomics.

Example usage now looks exactly like launching python -m dynamo.vllm:

python -m dynamo.mocker \
  --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --num-gpu-blocks-override 8192 \
  --block-size 16 \
  --speedup-ratio 10.0 \
  --max-num-seqs 512 \
  --enable-prefix-caching

Tangential CLI change: make the default frontend port 8000 (the k8s default).

Summary by CodeRabbit

  • New Features

    • Introduced a vLLM-style CLI for the mocker backend with per-argument flags (e.g., --num-gpu-blocks-override, --block-size, --max-num-seqs, --max-num-batched-tokens, --enable-prefix-caching, --enable-chunked-prefill, --watermark, --speedup-ratio, --data-parallel-size).
    • Supports passing options directly via CLI or via legacy JSON using --extra-engine-args (CLI flags are overridden by the JSON file).
    • Improved logging and automatic cleanup of temporary config files.
  • Documentation

    • Reorganized and expanded mocker README with required arguments, full option reference, usage examples, and legacy JSON guidance.

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
@coderabbitai
Contributor

coderabbitai bot commented Sep 4, 2025

Walkthrough

Updates introduce a vLLM-style CLI for the mocker backend, generating a temporary JSON of engine args when not provided. The worker selects between user-supplied JSON and generated args, with cleanup. Tests switch from file-based configs to in-memory dicts driving CLI flags. Documentation reflects the new interface.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docs: mocker CLI (components/backends/mocker/README.md) | Rewrites mocker usage to a vLLM-style CLI; adds required args, detailed MockEngineArgs options, examples, legacy JSON support, and formatting updates. |
| Backend: mocker worker + CLI (components/backends/mocker/src/dynamo/mocker/main.py) | Adds create_temp_engine_args_file(args) to serialize non-None CLI options to JSON; worker prefers user --extra-engine-args otherwise generates temp; ensures cleanup; extends CLI with vLLM-like flags; adds logging and updated descriptions. |
| Tests: router e2e with mockers (tests/router/test_router_e2e_with_mockers.py) | Replaces file-based mocker_args.json with in-memory dict passed to MockerProcess; builds CLI flags per key; updates constructor signature; refactors test setups accordingly. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User as User/Tests
  participant CLI as mocker.main.py (CLI)
  participant FS as Temp JSON (MockEngineArgs)
  participant Eng as Engine Entrypoint

  User->>CLI: Invoke mocker with CLI flags and/or --extra-engine-args
  alt --extra-engine-args provided
    CLI->>Eng: Build args using provided JSON path
  else Generate from CLI flags
    CLI->>FS: Serialize non-None MockEngineArgs to temp JSON
    CLI->>Eng: Build args using temp JSON path
  end
  Eng->>Eng: Start engine with MockEngineArgs
  note over CLI,FS: Finally: delete temp JSON if created
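
The selection step in the diagram can be sketched roughly as follows. This is a minimal illustration and not the code in main.py: the helper name `resolve_engine_args_path`, the `ENGINE_ARG_NAMES` tuple, and the attribute names on the namespace are assumptions chosen to mirror the flags named in this PR.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import Tuple

# Hypothetical subset of option names, mirroring flags mentioned in this PR.
ENGINE_ARG_NAMES = (
    "num_gpu_blocks_override",
    "block_size",
    "max_num_seqs",
    "speedup_ratio",
)


def resolve_engine_args_path(args: argparse.Namespace) -> Tuple[Path, bool]:
    """Return (path, is_temp).

    Prefer a user-supplied JSON file; otherwise write the non-None
    CLI options to a temporary JSON file that the caller must delete.
    """
    if getattr(args, "extra_engine_args", None):
        return Path(args.extra_engine_args), False

    engine_args = {
        name: getattr(args, name)
        for name in ENGINE_ARG_NAMES
        if getattr(args, name, None) is not None
    }
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(engine_args, f)
    return Path(f.name), True


# Simulated parsed arguments: only two options were set on the command line.
args = argparse.Namespace(
    extra_engine_args=None,
    num_gpu_blocks_override=None,
    block_size=16,
    max_num_seqs=None,
    speedup_ratio=10.0,
)
path, is_temp = resolve_engine_args_path(args)
try:
    generated = json.loads(path.read_text())
finally:
    # Only delete files this process created, never the user's own JSON.
    if is_temp:
        path.unlink(missing_ok=True)
```

Returning an `is_temp` flag alongside the path keeps the cleanup decision with the caller, so a user-provided file is never removed by accident.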

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

A rabbit twitches whiskers, bright,
New flags bloom softly in the night.
JSON dreams now temp and clean,
Engines hum with vLLM sheen.
Tests hop light—no files to tend,
CLI crumbs lead to the end.
Thump-thump: shipped, my furry friend! 🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/backends/mocker/src/dynamo/mocker/main.py (1)

1-225: Install Black, add it to dev dependencies, and format main.py

  • Add black to the project’s development dependencies (e.g. in requirements-dev.txt or pyproject.toml) so it’s available in CI.
  • Run
    black components/backends/mocker/src/dynamo/mocker/main.py
    and commit the resulting changes.
  • Ruff fixes have been applied; verify no lint errors remain.
tests/router/test_router_e2e_with_mockers.py (1)

1-1230: Require Black installation or alternative invocation

  • The standalone black command wasn’t found (error: black: command not found), so formatting didn’t run. Add Black to the project’s dev dependencies or update the CI/dev instructions to invoke it via python -m black, then format tests/router/test_router_e2e_with_mockers.py.
  • Ruff fixes have been applied successfully (no remaining lint issues).
🧹 Nitpick comments (2)
components/backends/mocker/README.md (1)

14-14: Consider removing trailing colons from headings

The trailing colons in headings violate Markdown best practices (MD026). While this is a minor issue, removing them would improve consistency with standard Markdown formatting.

-### Required arguments:
+### Required arguments
-### MockEngineArgs parameters (vLLM-style):
+### MockEngineArgs parameters (vLLM-style)
-### Example with individual arguments (vLLM-style):
+### Example with individual arguments (vLLM-style)
-### Legacy JSON file support:
+### Legacy JSON file support

Also applies to: 17-17, 28-28, 43-43

components/backends/mocker/src/dynamo/mocker/main.py (1)

29-66: Consider using dataclasses for MockEngineArgs serialization

The current implementation manually checks each attribute and builds the dictionary. Consider using a dataclass or Pydantic model for MockEngineArgs to make the serialization more maintainable and type-safe.

Would you like me to generate a dataclass-based implementation that would make this code more maintainable and reduce the repetitive hasattr checks?
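
As a rough idea of what such a dataclass-based approach could look like (a sketch only; the class name `MockEngineArgsSketch`, its field list, and the `to_engine_dict` method are hypothetical and not taken from the PR):

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MockEngineArgsSketch:
    # Hypothetical fields; None means "not set on the command line".
    num_gpu_blocks_override: Optional[int] = None
    block_size: Optional[int] = None
    max_num_seqs: Optional[int] = None
    speedup_ratio: Optional[float] = None
    enable_prefix_caching: Optional[bool] = None

    def to_engine_dict(self) -> dict:
        """Keep only the options that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


engine_dict = MockEngineArgsSketch(
    block_size=16, enable_prefix_caching=False
).to_engine_dict()
```

Filtering on `is not None` rather than on truthiness matters here: an explicit `False` or `0` is a real setting and should still be serialized.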

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between eb317cd and 20f4ddc.

📒 Files selected for processing (3)
  • components/backends/mocker/README.md (1 hunks)
  • components/backends/mocker/src/dynamo/mocker/main.py (3 hunks)
  • tests/router/test_router_e2e_with_mockers.py (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.330Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.
🧬 Code graph analysis (2)
tests/router/test_router_e2e_with_mockers.py (1)
tests/conftest.py (1)
  • logger (124-135)
components/backends/mocker/src/dynamo/mocker/main.py (1)
lib/llm/src/local_model.rs (6)
  • extra_engine_args (174-177)
  • model_path (93-96)
  • model_name (98-101)
  • endpoint_id (108-111)
  • endpoint_id (390-392)
  • default (67-89)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2853/merge) by PeaBrane.
tests/router/test_router_e2e_with_mockers.py

[error] 1-1: Black formatting changes detected. Run 'black' to reformat this file.


[error] 1-1: Ruff linting issues auto-fixed in this file during pre-commit.

components/backends/mocker/src/dynamo/mocker/main.py

[error] 1-1: Black formatting changes detected. Run 'black' to reformat this file.


[error] 1-1: Ruff linting issues auto-fixed in this file during pre-commit.

🪛 LanguageTool
components/backends/mocker/README.md

[style] ~12-~12: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...mocker engine now supports a vLLM-style CLI interface with individual arguments for all confi...

(ACRONYM_TAUTOLOGY)


[grammar] ~14-~14: There might be a mistake here.
Context: ...ration options. ### Required arguments: - --model-path: Path to model directory or HuggingFace...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ... MockEngineArgs parameters (vLLM-style): - --num-gpu-blocks-override: Number of GPU blocks for KV cache (def...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...GPU blocks for KV cache (default: 16384) - --block-size: Token block size for KV cache blocks (...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...k size for KV cache blocks (default: 64) - --max-num-seqs: Maximum number of sequences per iterat...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...f sequences per iteration (default: 256) - --max-num-batched-tokens: Maximum number of batched tokens per i...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: .../disable chunked prefill (default: True) - --watermark: KV cache watermark threshold as a frac...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ... threshold as a fraction (default: 0.01) - --speedup-ratio: Speed multiplier for token generation ...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ... 8080 ``` ### Legacy JSON file support: For backward compatibility, you can stil...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
components/backends/mocker/README.md

14-14: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


17-17: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


28-28: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


43-43: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)

🪛 Ruff (0.12.2)
components/backends/mocker/src/dynamo/mocker/main.py

80-80: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
components/backends/mocker/README.md (1)

10-12: Documentation is clear and well-structured

The transition to a vLLM-style CLI interface is well documented with clear examples showing both the new individual argument approach and legacy JSON file support for backward compatibility.

components/backends/mocker/src/dynamo/mocker/main.py (2)

161-173: Boolean flag handling looks correct

The implementation properly handles the vLLM-style boolean flags with both positive and negative forms (--enable-prefix-caching/--no-enable-prefix-caching), correctly setting the destination attribute.

Also applies to: 175-187
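
For readers unfamiliar with paired boolean flags, one standard-library way to get both `--enable-prefix-caching` and `--no-enable-prefix-caching` is `argparse.BooleanOptionalAction` (Python 3.9+). This is an illustrative sketch; the PR may implement the pair differently, for example with separate `store_true`/`store_false` arguments sharing a `dest`.

```python
import argparse

parser = argparse.ArgumentParser(prog="mocker-sketch")
parser.add_argument(
    "--enable-prefix-caching",
    action=argparse.BooleanOptionalAction,  # also registers --no-enable-prefix-caching
    default=None,  # None lets the caller tell "unset" apart from an explicit False
    help="Enable or disable prefix caching (illustrative).",
)

on = parser.parse_args(["--enable-prefix-caching"])
off = parser.parse_args(["--no-enable-prefix-caching"])
unset = parser.parse_args([])
```

A `None` default pairs well with serializing only non-None options, since an unset flag then falls through to the engine's own default.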


82-104: Good resource cleanup implementation

The try/finally block ensures proper cleanup of the temporary file, with appropriate error handling and logging. This prevents resource leaks.
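
The same guarantee can be packaged as a context manager. The sketch below is an assumption about how one might structure it, not the code under review; `temp_json_file` is a hypothetical name.

```python
import contextlib
import json
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def temp_json_file(payload: dict):
    """Write payload to a temp JSON file, yield its path, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        yield path
    finally:
        try:
            os.unlink(path)
        except OSError as exc:
            # Cleanup failure is logged, not raised, so it cannot mask
            # an error coming from the body of the with-block.
            logger.warning("Failed to remove temp file %s: %s", path, exc)


seen_path = None
try:
    with temp_json_file({"block_size": 16}) as p:
        seen_path = p
        raise RuntimeError("simulated engine failure")
except RuntimeError:
    pass
```

Even though the body raises, the `finally` clause runs and the file is gone by the time the exception propagates.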

tests/router/test_router_e2e_with_mockers.py (2)

52-101: Clean refactoring to dictionary-based configuration

The transition from file-based to in-memory dictionary configuration is well-implemented. The mapping from dictionary keys to CLI arguments is correct and maintains backward compatibility by defaulting to an empty dictionary when mocker_args is None.
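
A key-to-flag mapping of the kind described here might look like the following. The function name `build_mocker_flags` and the handling of boolean values are assumptions for illustration; the actual test helper may differ.

```python
from typing import Any, Dict, List, Optional


def build_mocker_flags(mocker_args: Optional[Dict[str, Any]] = None) -> List[str]:
    """Translate {"block_size": 16} into ["--block-size", "16"]."""
    flags: List[str] = []
    for key, value in (mocker_args or {}).items():
        name = key.replace("_", "-")
        if isinstance(value, bool):
            # Assumes the CLI exposes --flag / --no-flag pairs for booleans.
            flags.append(f"--{name}" if value else f"--no-{name}")
        elif value is not None:
            flags.extend([f"--{name}", str(value)])
    return flags


example_flags = build_mocker_flags(
    {"speedup_ratio": 10.0, "block_size": 16, "enable_prefix_caching": True}
)
```

The `mocker_args or {}` idiom gives the empty-dictionary default mentioned above without using a mutable default argument.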


322-324: Good test coverage for new CLI interface

All test methods have been properly updated to use the new dictionary-based mocker_args configuration. The tests appropriately cover different scenarios including:

  • Standard configuration with speedup_ratio and block_size
  • Resource-limited configuration for overload testing
  • Multiple mocker instances

Also applies to: 377-379, 444-449, 609-611, 833-835

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane enabled auto-merge (squash) September 4, 2025 00:46
@PeaBrane PeaBrane disabled auto-merge September 4, 2025 00:51
@PeaBrane PeaBrane changed the title from "chore: pass in mocker engine args directly in python cli" to "chore: pass in mocker engine args directly in python cli + default frontend port to 8000" Sep 4, 2025
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Contributor

@alec-flowers alec-flowers left a comment

Awesome!

@PeaBrane PeaBrane disabled auto-merge September 4, 2025 03:07
Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane requested a review from a team as a code owner September 4, 2025 03:52
@PeaBrane PeaBrane enabled auto-merge (squash) September 4, 2025 03:52
@PeaBrane PeaBrane merged commit 3c7c1d6 into main Sep 4, 2025
14 of 15 checks passed
@PeaBrane PeaBrane deleted the rupei/clean-mocker-cli branch September 4, 2025 04:26
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
…ontend port to 8000 (#2853)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
…ontend port to 8000 (#2853)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>

4 participants