components/backends/mocker/README.md
The mocker engine is a mock vLLM implementation designed for testing and development. It is useful for:

- Developing and debugging Dynamo components
- Load testing and performance analysis

## Basic usage

The mocker engine now supports a vLLM-style CLI interface with individual arguments for all configuration options. The `--model-path` argument is required but can point to any valid model path; the mocker doesn't actually load the model weights, although the pre-processor still needs the tokenizer. The arguments `block_size`, `num_gpu_blocks`, `max_num_seqs`, `max_num_batched_tokens`, `enable_prefix_caching`, and `enable_chunked_prefill` are shared with the real vLLM engine.
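As a sketch of what an invocation might look like (the `python -m dynamo.mocker` entry point and the model ID are placeholders, not taken from this README; the flags are the ones documented here):

```shell
# Hypothetical launch command: the module path and model ID below are
# illustrative placeholders. Any valid HuggingFace model ID works for
# --model-path, since only the tokenizer is loaded.
python -m dynamo.mocker \
  --model-path Qwen/Qwen2.5-0.5B-Instruct \
  --block-size 64 \
  --num-gpu-blocks-override 16384 \
  --max-num-seqs 256 \
  --max-num-batched-tokens 8192
```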

### Mocker-specific arguments

- `speedup_ratio`: Speed multiplier for token generation (default: 1.0). Higher values make the simulated engines run faster.
- `dp_size`: Number of data parallel workers to simulate (default: 1).
- `watermark`: KV cache watermark threshold as a fraction (default: 0.01). This argument also exists for the real vLLM engine but cannot be passed as an engine arg.
### Required arguments

- `--model-path`: Path to a model directory or HuggingFace model ID (required for the tokenizer)
### MockEngineArgs parameters (vLLM-style)

- `--num-gpu-blocks-override`: Number of GPU blocks for the KV cache (default: 16384)
- `--block-size`: Token block size for KV cache blocks (default: 64)
- `--max-num-seqs`: Maximum number of sequences per iteration (default: 256)
- `--max-num-batched-tokens`: Maximum number of batched tokens per iteration (default: 8192)
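As a quick sanity check of how the cache-sizing defaults interact (a sketch: the arithmetic below is the standard paged-KV sizing rule, blocks times tokens per block, not code from this repository):

```python
# Rough KV-cache capacity implied by the mocker defaults documented above.
# num_gpu_blocks * block_size gives the number of tokens that can be
# resident in the simulated cache at once.
num_gpu_blocks = 16384  # --num-gpu-blocks-override default
block_size = 64         # --block-size default

total_cache_tokens = num_gpu_blocks * block_size
print(total_cache_tokens)  # -> 1048576
```

So with the defaults, the simulated cache holds just over one million tokens; lowering `--num-gpu-blocks-override` is an easy way to force cache-pressure scenarios in load tests.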