Commit e1ae0f1

feat: add multimodal k8s deployment manifest (#1836)
1 parent 39c8d12 commit e1ae0f1

8 files changed, +916 -1 lines changed

examples/multimodal/README.md

Lines changed: 57 additions & 0 deletions
@@ -428,3 +428,60 @@ You should see a response describing the video's content similar to
]
}
```

## Deploying Multimodal Examples on Kubernetes

This guide shows how to deploy the multimodal example services on Kubernetes and how to remove them afterwards.

### Prerequisites

- **Dynamo Cloud** is already deployed in your target Kubernetes namespace.
- You have `kubectl` access to your cluster, and `$NAMESPACE` is set to the target namespace.

### Create a secret with your Hugging Face token

```bash
export HF_TOKEN="<Hugging Face Hub token with read access to the models>"
kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN="$HF_TOKEN" -n "$NAMESPACE" || true
```
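
Note that `|| true` hides every failure, not only the "already exists" case. As a hedged alternative, the sketch below checks the required variables first, then renders the secret client-side and applies it, so re-running it updates the secret in place. The helper names `require_env` and `apply_hf_secret` are hypothetical and not part of Dynamo; the sketch assumes `kubectl` is on `PATH`.

```bash
# Sketch only: helper names are hypothetical, not part of Dynamo.

# Return non-zero with a message if the named variable is unset or empty.
require_env() {
  name="$1"
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "error: $name must be set" >&2
    return 1
  fi
}

# Create or update the secret. --dry-run=client renders the Secret locally
# without contacting the cluster; `kubectl apply` then creates or updates it.
apply_hf_secret() {
  require_env HF_TOKEN || return 1
  require_env NAMESPACE || return 1
  kubectl create secret generic hf-token-secret \
    --from-literal=HF_TOKEN="$HF_TOKEN" \
    -n "$NAMESPACE" --dry-run=client -o yaml |
    kubectl apply -n "$NAMESPACE" -f -
}
```

After exporting `HF_TOKEN` and `NAMESPACE`, calling `apply_hf_secret` creates the secret or updates an existing one. Unlike `|| true`, genuine errors such as a missing namespace still surface.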

---

Choose the example you want to deploy or delete. The YAML files are located in `examples/multimodal/deploy/k8s/`.

### Deploy the Multimodal Example

```bash
kubectl apply -f examples/multimodal/deploy/k8s/<Example yaml file> -n $NAMESPACE
```

### Uninstall the Multimodal Example

```bash
kubectl delete -f examples/multimodal/deploy/k8s/<Example yaml file> -n $NAMESPACE
```
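
The deploy and uninstall commands above can be wrapped so that a mistyped manifest path fails before `kubectl` is called. This is a minimal sketch, assuming `kubectl` is on `PATH` and `$NAMESPACE` is set. The function names are hypothetical, and listing `dynamographdeployment` assumes the custom resource installed by Dynamo Cloud is discoverable under its kind name.

```bash
# Sketch only: function names are hypothetical, not part of Dynamo.

# Fail early if the manifest path does not point to a regular file.
check_manifest() {
  if [ ! -f "$1" ]; then
    echo "error: manifest not found: $1" >&2
    return 1
  fi
}

# Apply the manifest, then list the graph deployments and pods in the namespace.
deploy_example() {
  check_manifest "$1" || return 1
  kubectl apply -f "$1" -n "$NAMESPACE" &&
    kubectl get dynamographdeployment -n "$NAMESPACE" &&
    kubectl get pods -n "$NAMESPACE"
}

# Delete the resources in the manifest; ignore ones that are already gone.
remove_example() {
  check_manifest "$1" || return 1
  kubectl delete -f "$1" -n "$NAMESPACE" --ignore-not-found
}
```

Call `deploy_example` with the path to the manifest you chose, and call `remove_example` with the same path to clean up.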

### Using a different Dynamo container image

To use a different container image in your deployment, update the manifest before applying it.

One option is [`yq`](https://github.com/mikefarah/yq?tab=readme-ov-file#install), a portable command-line YAML processor. If you do not already have `yq` installed, follow the [installation instructions](https://github.com/mikefarah/yq?tab=readme-ov-file#install) for your platform. You can then generate and apply your manifest as follows:

```bash
export DYNAMO_IMAGE=my-registry/my-image:tag
export EXAMPLE_FILE="examples/multimodal/deploy/k8s/<Example yaml file>"

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(DYNAMO_IMAGE)' "$EXAMPLE_FILE" > my_example_manifest.yaml

# install the dynamo example
kubectl apply -f my_example_manifest.yaml -n $NAMESPACE

# uninstall the dynamo example
kubectl delete -f my_example_manifest.yaml -n $NAMESPACE
```
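
Before applying the generated manifest, it can help to confirm that every service picked up the new image. The sketch below is a plain-text scan rather than a YAML parser: it assumes each image appears on its own unquoted `image:` line, as in the example manifests. The function names `list_images` and `images_match` are hypothetical.

```bash
# Sketch only: a line-based check, not a YAML parser.

# Print the distinct values of all `image:` lines in a manifest.
list_images() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" | sort -u
}

# Succeed only if every `image:` line in the manifest equals the expected image.
images_match() {
  [ "$(list_images "$1")" = "$2" ]
}
```

For example, `images_match my_example_manifest.yaml "$DYNAMO_IMAGE"` returns a non-zero status if any service still references another image, and `list_images my_example_manifest.yaml` shows which images are present.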
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: agg-llava
spec:
  envs:
  services:
    Frontend:
      dynamoNamespace: agg-llava
      componentType: main
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:Frontend
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - Frontend
            - -f
            - ./configs/agg-llava.yaml
    Processor:
      dynamoNamespace: agg-llava
      componentType: worker
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:Processor
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - Processor
            - -f
            - ./configs/agg-llava.yaml
    VllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: agg-llava
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:VllmDecodeWorker
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - VllmDecodeWorker
            - -f
            - ./configs/agg-llava.yaml
    VllmEncodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: agg-llava
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:VllmEncodeWorker
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - VllmEncodeWorker
            - -f
            - ./configs/agg-llava.yaml
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: agg-phi3v
spec:
  envs:
  services:
    Frontend:
      dynamoNamespace: agg-phi3v
      componentType: main
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:Frontend
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - Frontend
            - -f
            - ./configs/agg-phi3v.yaml
    Processor:
      dynamoNamespace: agg-phi3v
      componentType: worker
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:Processor
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - Processor
            - -f
            - ./configs/agg-phi3v.yaml
    VllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: agg-phi3v
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:VllmDecodeWorker
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - VllmDecodeWorker
            - -f
            - ./configs/agg-phi3v.yaml
    VllmEncodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: agg-phi3v
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
          workingDir: /workspace/examples/multimodal
          args:
            - dynamo
            - serve
            - graphs.agg:VllmEncodeWorker
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - VllmEncodeWorker
            - -f
            - ./configs/agg-phi3v.yaml
