Commit 8248a11

feat: gaie helm chart based example (#2168)
Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent 8b0a035 commit 8248a11

16 files changed

+512
-164
lines changed

deploy/inference-gateway/README.md

Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
1+
## Inference Gateway Setup with Dynamo
2+
3+
This setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
4+
Currently, this setup supports only the kgateway-based Inference Gateway.
5+
6+
## Table of Contents
7+
8+
- [Prerequisites](#prerequisites)
9+
- [Installation Steps](#installation-steps)
10+
- [Usage](#usage)
11+
12+
## Prerequisites
13+
14+
- Kubernetes cluster with kubectl configured
15+
- NVIDIA GPU drivers installed on worker nodes
16+
17+
## Installation Steps
18+
19+
1. **Install Dynamo Platform**
20+
21+
See the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
22+
23+
24+
2. **Deploy Inference Gateway**
25+
26+
First, deploy an inference gateway service. In this example, we'll install the `kgateway`-based gateway implementation.
27+
You can use the script below or follow the steps manually.
28+
29+
Script:
30+
```bash
./install_gaie_crd_kgateway.sh
```
33+
34+
Manual steps:
35+
36+
a. Deploy the Gateway API CRDs:
37+
```bash
GATEWAY_API_VERSION=v1.3.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml
```
41+
42+
b. Install the Inference Extension CRDs (InferenceModel and InferencePool):
43+
```bash
INFERENCE_EXTENSION_VERSION=v0.5.1
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n my-model
```
47+
48+
c. Install `kgateway` CRDs and kgateway.
49+
```bash
KGATEWAY_VERSION=v2.0.3

# Install the Kgateway CRDs
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGATEWAY_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds

# Install Kgateway
helm upgrade -i --namespace kgateway-system --version $KGATEWAY_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```
58+
59+
d. Deploy the Gateway Instance
60+
```bash
kubectl create namespace my-model
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml -n my-model
```
64+
65+
```bash
kubectl get gateway inference-gateway -n my-model

# Sample output
# NAME                CLASS      ADDRESS   PROGRAMMED   AGE
# inference-gateway   kgateway   x.x.x.x   True         1m
```
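Rather than polling `kubectl get gateway` by hand until `PROGRAMMED` shows `True`, the check can be scripted. The following is a minimal sketch, assuming the Gateway exposes a standard `Programmed` status condition; the function name `wait_for_gateway` and the default timeout of `120s` are illustrative choices, not part of this repository.

```bash
# Block until the Gateway reports Programmed=True, or fail after the timeout.
# Usage: wait_for_gateway [name] [namespace] [timeout]
# Defaults mirror the names used in this guide; the timeout is an assumption.
wait_for_gateway() {
  gw_name="${1:-inference-gateway}"
  gw_ns="${2:-my-model}"
  gw_timeout="${3:-120s}"
  kubectl wait "gateway/${gw_name}" -n "$gw_ns" \
    --for=condition=Programmed --timeout="$gw_timeout"
}
```

Calling `wait_for_gateway` with no arguments waits on `inference-gateway` in the `my-model` namespace and returns a non-zero status if the condition is not met in time.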
72+
73+
3. **Install the Dynamo model and the Dynamo GAIE Helm chart**
74+
75+
The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
76+
77+
Deploy the Inference Gateway resources to your Kubernetes cluster:
78+
79+
```bash
cd deploy/inference-gateway
helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml
```
83+
84+
Key configurations include:
85+
- An InferenceModel resource for the Qwen model
86+
- A service for the inference gateway
87+
- Required RBAC roles, bindings, and permissions
89+
90+
4. **Verify Installation**
91+
92+
Check that all resources are properly deployed:
93+
94+
```bash
kubectl get inferencepool
kubectl get inferencemodel
kubectl get httproute
kubectl get service
kubectl get gateway
```
101+
102+
Sample output:
103+
104+
```bash
# kubectl get inferencepool
NAME        AGE
qwen-pool   33m

# kubectl get inferencemodel
NAME         MODEL NAME        INFERENCE POOL   CRITICALITY   AGE
qwen-model   Qwen/Qwen3-0.6B   qwen-pool        Critical      33m

# kubectl get httproute
NAME         HOSTNAMES   AGE
qwen-route               33m
```
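The verification commands above rely on the current kubectl context pointing at the namespace where the chart was installed. As a rough sketch, the same check can be looped over the resource kinds with the namespace passed explicitly; `check_gaie_resources` is a hypothetical helper name, and `my-model` is assumed as the default because the `helm install` step used it.

```bash
# Report which of the expected resource kinds exist in the namespace.
# Returns non-zero if any kind has no objects.
# Usage: check_gaie_resources [namespace]
check_gaie_resources() {
  check_ns="${1:-my-model}"
  missing=0
  for kind in inferencepool inferencemodel httproute service gateway; do
    # `-o name` prints one line per object; empty output means none were found.
    found=$(kubectl get "$kind" -n "$check_ns" -o name 2>/dev/null)
    if [ -n "$found" ]; then
      echo "ok: $kind"
    else
      echo "missing: $kind"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_gaie_resources my-model` prints one `ok:` or `missing:` line per kind, which makes it easy to spot a resource the chart did not create.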
117+
118+
## Usage
119+
120+
The Inference Gateway provides HTTP endpoints for model inference.
121+
122+
### 1: Populate gateway URL for your k8s cluster
123+
```bash
export GATEWAY_URL=<Gateway-URL>
```
126+
127+
To test the gateway in minikube, use one of the following options:
128+
a. Use minikube tunnel to expose the gateway to the host
129+
This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host, as shown in alternative (b).
130+
```bash
# in first terminal
minikube tunnel

# in second terminal where you want to send inference requests
GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')
echo $GATEWAY_URL
```
138+
139+
b. Use port-forward to expose the gateway to the host
140+
```bash
# in first terminal
kubectl port-forward svc/inference-gateway 8000:80 -n my-model

# in second terminal where you want to send inference requests
GATEWAY_URL=http://localhost:8000
```
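Option (a) leaves a bare cluster IP in `GATEWAY_URL`, while option (b) stores a full URL with a scheme. `curl` assumes `http://` for a bare host, but other clients may not. One possible way to make the two options interchangeable is a small normalizer; `normalize_gateway_url` is a hypothetical helper, and defaulting to plain `http` is an assumption that matches the examples in this guide.

```bash
# Prepend http:// when no scheme is present and drop one trailing slash.
# Usage: normalize_gateway_url <host-or-url>
normalize_gateway_url() {
  url="${1%/}"
  case "$url" in
    http://*|https://*) ;;          # scheme already present, keep as-is
    *) url="http://${url}" ;;       # assumption: the gateway serves plain HTTP
  esac
  printf '%s\n' "$url"
}

# Example: GATEWAY_URL=$(normalize_gateway_url "$GATEWAY_URL")
```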
147+
148+
### 2: Check models deployed to inference gateway
149+
150+
151+
a. Query models:
152+
```bash
# in the second terminal where GATEWAY_URL is set

curl $GATEWAY_URL/v1/models | jq .
```
157+
Sample output:
158+
```json
{
  "data": [
    {
      "created": 1753768323,
      "id": "Qwen/Qwen3-0.6B",
      "object": "object",
      "owned_by": "nvidia"
    }
  ],
  "object": "list"
}
```
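To script the model check instead of reading the JSON by eye, the response can be filtered with `jq`, which this guide already uses. This is a sketch only: `has_model` is a hypothetical helper that assumes the response has the `data[].id` shape shown in the sample output.

```bash
# Read a /v1/models response on stdin; succeed only if the model id is listed.
# Usage: <response> | has_model <model-id>
has_model() {
  # -e makes jq exit non-zero when the filter result is false.
  jq -e --arg id "$1" 'any(.data[]; .id == $id)' >/dev/null
}

# Example: curl -s "$GATEWAY_URL/v1/models" | has_model "Qwen/Qwen3-0.6B"
```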
171+
172+
b. Send inference request to gateway:
173+
174+
```bash
MODEL_NAME="Qwen/Qwen3-0.6B"
curl $GATEWAY_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${MODEL_NAME}"'",
    "messages": [
      {
        "role": "user",
        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
      }
    ],
    "stream":false,
    "max_tokens": 30,
    "temperature": 0.0
  }'
```
191+
192+
Sample inference output:
193+
194+
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "audio": null,
        "content": "<think>\nOkay, I need to develop a character background for the user's query. Let me start by understanding the requirements. The character is an",
        "function_call": null,
        "refusal": null,
        "role": "assistant",
        "tool_calls": null
      }
    }
  ],
  "created": 1753768682,
  "id": "chatcmpl-772289b8-5998-4f6d-bd61-3659b684b347",
  "model": "Qwen/Qwen3-0.6B",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 29,
    "completion_tokens_details": null,
    "prompt_tokens": 196,
    "prompt_tokens_details": null,
    "total_tokens": 225
  }
}
```
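The inference request above splices `MODEL_NAME` into the JSON body by closing and reopening the single-quoted string, which breaks if the model name or prompt contains quotes. A possible alternative, sketched here under the assumption that `jq` is installed, is to let `jq` build and escape the body. `build_chat_payload` is a hypothetical helper; its fields mirror the request shown in this guide.

```bash
# Build a chat completion request body with proper JSON escaping.
# Usage: build_chat_payload <model> <prompt> [max_tokens]
build_chat_payload() {
  jq -n \
    --arg model "$1" \
    --arg prompt "$2" \
    --argjson max_tokens "${3:-30}" \
    '{
      model: $model,
      messages: [{role: "user", content: $prompt}],
      stream: false,
      max_tokens: $max_tokens,
      temperature: 0.0
    }'
}

# Example: pipe the body to curl, which reads it from stdin via "-d @-".
# build_chat_payload "$MODEL_NAME" "Hello" |
#   curl -s "$GATEWAY_URL/v1/chat/completions" \
#     -H "Content-Type: application/json" -d @-
```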

deploy/inference-gateway/example/README.md

Lines changed: 0 additions & 136 deletions
This file was deleted.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
