# Running Dynamo (`dynamo run`)

- [Running Dynamo (`dynamo run`)](#running-dynamo-dynamo-run)
  - [Quickstart with pip and vllm](#quickstart-with-pip-and-vllm)
  - [Use model from Hugging Face](#use-model-from-hugging-face)
  - [Run a model from local file](#run-a-model-from-local-file)
    - [Download model from Hugging Face](#download-model-from-hugging-face)
    - [Run model from local file](#run-model-from-local-file)
  - [Distributed System](#distributed-system)
    - [Network names](#network-names)
    - [KV-aware routing](#kv-aware-routing)
  - [Full usage details](#full-usage-details)
    - [Getting Started](#getting-started)
      - [Setup](#setup)
        - [Step 1: Install libraries](#step-1-install-libraries)
        - [Step 2: Install Rust](#step-2-install-rust)
        - [Step 3: Build](#step-3-build)
      - [Defaults](#defaults)
    - [Running Inference with Pre-built Engines](#running-inference-with-pre-built-engines)
      - [mistralrs](#mistralrs)
      - [llamacpp](#llamacpp)
      - [sglang](#sglang)
      - [vllm](#vllm)
      - [trtllm](#trtllm)
        - [Step 1: Build the environment](#step-1-build-the-environment)
        - [Step 2: Run the environment](#step-2-run-the-environment)
        - [Step 3: Execute `dynamo run` command](#step-3-execute-dynamo-run-command)
      - [Echo Engines](#echo-engines)
        - [echo\_core](#echo_core)
        - [echo\_full](#echo_full)
        - [Configuration](#configuration)
      - [Batch mode](#batch-mode)
      - [Extra engine arguments](#extra-engine-arguments)
      - [Writing your own engine in Python](#writing-your-own-engine-in-python)

This guide explains the `dynamo run` command.

It supports these engines: mistralrs, llamacpp, sglang, vllm, and tensorrt-llm.

Usage:
```
dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=echo_core|echo_full|mistralrs|llamacpp|sglang|vllm|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--tensor-parallel-size=1] [--context-length=N] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0]
```

Example: `dynamo run Qwen/Qwen3-0.6B`
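
As a fuller illustration, here is a sketch that combines several flags from the usage line above into a single invocation. The model repository, port, and weight values are placeholders echoing the synopsis defaults, not recommendations from this guide, and whether a given flag combination applies depends on the `in=`/`out=` pair you choose:

```
# Illustrative only: serve a Hugging Face model over HTTP with the vllm
# engine, selecting KV-aware routing and its tuning weights.
dynamo-run in=http out=vllm \
  --http-port 8080 \
  --model-path Qwen/Qwen3-0.6B \
  --tensor-parallel-size=1 \
  --router-mode kv \
  --kv-overlap-score-weight=2.0 \
  --kv-gpu-cache-usage-weight=1.0 \
  --kv-waiting-requests-weight=1.0
```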