
Commit 3094278

docs: Create a guide for writing dynamo deployments CR (#1999)

1 parent: f0e382a

File tree: 5 files changed, +143 −13 lines

components/backends/vllm/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -115,7 +115,7 @@ For Kubernetes deployment, YAML manifests are provided in the `deploy/` director
 #### Prerequisites

-- **Dynamo Cloud**: Follow the [Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to deploy Dynamo Cloud first.
+- **Dynamo Cloud**: Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to deploy Dynamo Cloud first.

 - **Container Images**: The deployment files currently require access to `nvcr.io/nvidian/nim-llm-dev/vllm-runtime`. If you don't have access, build and push your own image:
 ```bash
````

deploy/cloud/helm/README.md

Lines changed: 3 additions & 2 deletions

````diff
@@ -36,13 +36,14 @@ docker login <CONTAINER_REGISTRY>
 #### 🛠️ Build and push images for the Dynamo Cloud platform components

 [One-time Action]
-You should build the images for the Dynamo Cloud Platform.
+You should build the image(s) for the Dynamo Cloud Platform.
 If you are a **👤 Dynamo User** you would do this step once.

 ```bash
 export DOCKER_SERVER=<your-docker-server>
 export IMAGE_TAG=<TAG>
-earthly --push +all-docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
+cd deploy/cloud/operator
+earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
 ```

 If you are a **🧑‍💻 Dynamo Contributor** you would have to rebuild the dynamo platform images as the code evolves. To do so please look at the [Cloud Guide](../../../docs/guides/dynamo_deploy/dynamo_cloud.md).
````

docs/examples/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -36,7 +36,7 @@ export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo clou
 Deploying an example consists of the simple `kubectl apply -f ... -n ${NAMESPACE}` command. For example:

 ```bash
-   kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
+kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
 ```

 You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
````
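The `kubectl` commands above operate on a `DynamoGraphDeployment` custom resource. For orientation, such a manifest has roughly the following shape; this is a sketch only, and the `apiVersion` value and service names are assumptions for illustration, not values taken from the repository:

```yaml
# Hypothetical sketch of a DynamoGraphDeployment CR.
apiVersion: nvidia.com/v1alpha1   # assumption: check the CRD installed in your cluster
kind: DynamoGraphDeployment
metadata:
  name: vllm-agg                  # illustrative name
spec:
  services:
    Frontend:
      replicas: 1
    VllmWorker:
      replicas: 1
```

The examples shipped in `components/backends/<backend>/deploy/` are the authoritative versions; compare against one of those before applying anything.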
Lines changed: 133 additions & 0 deletions (new file)

# Creating Kubernetes Deployments

The scripts in the `components/<backend>/launch` folder, such as [agg.sh](../../../components/backends/vllm/launch/agg.sh), demonstrate how to serve your models locally.
The corresponding YAML files, such as [agg.yaml](../../../components/backends/vllm/deploy/agg.yaml), show how to create a Kubernetes deployment for your inference graph.

This guide explains how to create your own deployment files.

## Step 1: Choose Your Architecture Pattern

Select the architecture pattern that best fits your use case as your template.

For example, when using the `vLLM` inference backend:

- **Development / Testing**
  Use [`agg.yaml`](../../../components/backends/vllm/deploy/agg.yaml) as the base configuration.

- **Production with Load Balancing**
  Use [`agg_router.yaml`](../../../components/backends/vllm/deploy/agg_router.yaml) to enable scalable, load-balanced inference.

- **High Performance / Disaggregated Deployment**
  Use [`disagg_router.yaml`](../../../components/backends/vllm/deploy/disagg_router.yaml) for maximum throughput and modular scalability.

## Step 2: Customize the Template

You can run the Frontend on one machine (for example, a CPU node) and the workers on a different machine (a GPU node).
The Frontend serves as a framework-agnostic HTTP entry point and is unlikely to need many changes.

It serves the following roles:

1. OpenAI-Compatible HTTP Server
   * Provides the `/v1/chat/completions` endpoint
   * Handles HTTP request/response formatting
   * Supports streaming responses
   * Validates incoming requests

2. Service Discovery and Routing
   * Auto-discovers backend workers via etcd
   * Routes requests to the appropriate Processor/Worker components
   * Handles load balancing across multiple workers

3. Request Preprocessing
   * Initial request validation
   * Model name verification
   * Request format standardization
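Because the Frontend rarely needs customization, its deployment entry can stay close to the defaults. A minimal sketch, assuming it uses the same spec fields as the worker template below (the `componentType: frontend` value and the image name are illustrative assumptions):

```yaml
Frontend:
  dynamoNamespace: your-namespace
  componentType: frontend   # assumption: counterpart to the workers' componentType
  replicas: 1
  extraPodSpec:
    mainContainer:
      image: your-image     # placeholder
      args:
        - python3 -m dynamo.frontend --http-port 8000
```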
You should then pick a worker and specialize its config. For example:

```yaml
VllmWorker: # vLLM-specific config
  enforce-eager: true
  enable-prefix-caching: true

SglangWorker: # SGLang-specific config
  router-mode: kv
  disagg-mode: true

TrtllmWorker: # TensorRT-LLM-specific config
  engine-config: ./engine.yaml
  kv-cache-transfer: ucx
```
Here's a template structure based on the examples:

```yaml
YourWorker:
  dynamoNamespace: your-namespace
  componentType: worker
  replicas: N
  envFromSecret: your-secrets # e.g., hf-token-secret
  # Health checks for worker initialization
  readinessProbe:
    exec:
      command: ["/bin/sh", "-c", 'grep "Worker.*initialized" /tmp/worker.log']
  resources:
    requests:
      gpu: "1" # GPU allocation
  extraPodSpec:
    mainContainer:
      image: your-image
      command:
        - /bin/sh
        - -c
      args:
        - python -m dynamo.YOUR_INFERENCE_ENGINE --model YOUR_MODEL --your-flags
```
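As a filled-in, hypothetical instance of that template for a vLLM worker — the namespace, image, replica count, and flags below are placeholders chosen for illustration, not values taken from the repository:

```yaml
VllmWorker:
  dynamoNamespace: my-namespace            # placeholder namespace
  componentType: worker
  replicas: 2
  envFromSecret: hf-token-secret           # HF token secret, as in the examples
  resources:
    requests:
      gpu: "1"
  extraPodSpec:
    mainContainer:
      image: my-registry/vllm-runtime:v1   # placeholder image reference
      command:
        - /bin/sh
        - -c
      args:
        # module name and flags assumed from the pattern above; check your backend's docs
        - python -m dynamo.vllm --model YOUR_MODEL --enforce-eager
```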
Consult the corresponding `.sh` file. Each of the Python commands used to launch a component goes into your YAML spec under
`extraPodSpec: -> mainContainer: -> args:`

The frontend is launched with `python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]`.
Each worker launches a `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags` command.
If you are a Dynamo contributor, see the [dynamo run guide](../dynamo_run.md) for details on how to run this command.
## Step 3: Key Customization Points

### Model Configuration

```yaml
args:
  - "python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags"
```

### Resource Allocation

```yaml
resources:
  requests:
    cpu: "N"
    memory: "NGi"
    gpu: "N"
```

### Scaling

```yaml
replicas: N # Number of worker instances
```

### Routing Mode

```yaml
args:
  - --router-mode
  - kv # Enable KV-cache routing
```

### Worker Specialization

```yaml
args:
  - --is-prefill-worker # For disaggregated prefill workers
```
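Pulling these customization points together, a disaggregated prefill worker entry might combine them as in the following sketch; the component name, replica count, and resource sizes are illustrative assumptions, not values from the repository:

```yaml
PrefillWorker:                   # hypothetical component name
  componentType: worker
  replicas: 2                    # scaling
  resources:
    requests:
      cpu: "8"                   # illustrative sizes
      memory: "32Gi"
      gpu: "1"
  extraPodSpec:
    mainContainer:
      image: your-image
      command:
        - /bin/sh
        - -c
      args:
        - python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --is-prefill-worker
```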

docs/guides/dynamo_deploy/quickstart.md

Lines changed: 5 additions & 9 deletions
````diff
@@ -64,13 +64,10 @@ Use this approach when developing or customizing Dynamo as a contributor, or usi
 Ensure you have the source code checked out and are in the `dynamo` directory:

-```bash
-cd deploy/cloud/helm/
-```

 ### Set Environment Variables

-Our examples use the `nvcr.io` but you can setup your own values if you use another docker registry.
+Our examples use the [`nvcr.io`](nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.

 ```bash
 export NAMESPACE=dynamo-cloud # or whatever you prefer.
@@ -98,15 +95,13 @@ docker login <your-registry>
 docker push <your-registry>/dynamo-base:latest-vllm
 ```

-[More on image building](../../../../README.md)
-
 ### Install Dynamo Cloud

 You need to build and push the Dynamo Cloud Operator Image by running

 ```bash
-earthly --push +all-docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
+cd deploy/cloud/operator
+earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
 ```

 The Nvidia Cloud Operator image will be pulled from the `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
@@ -196,4 +191,5 @@ kubectl create secret generic hf-token-secret \
   -n ${NAMESPACE}
 ```

-Follow the [Examples](../../examples/README.md)
+Follow the [Examples](../../examples/README.md)
+For more details on how to create your own deployments follow [Create Deployment Guide](create_deployment.md)
````
