Commit 6297859

feat: Add epp-aware gateway integration (#2345)
1 parent 28546ba commit 6297859

7 files changed: +166 -22 lines changed

deploy/inference-gateway/README.md

Lines changed: 32 additions & 4 deletions
@@ -1,7 +1,9 @@
 ## Inference Gateway Setup with Dynamo

-This Setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
-Currently, this setup is only kgateway based Inference Gateway.
+This guide demonstrates two setups.
+The EPP-unaware setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
+The EPP-aware setup first uses Dynamo Router to pick the worker instance id for serving the model. Then traffic gets directed straight to the selected worker.
+Currently, these setups are only supported with the kGateway based Inference Gateway.

 ## Table of Contents

@@ -39,7 +41,7 @@ GATEWAY_API_VERSION=v1.3.0
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml
 ```

-b. Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs)
+b. Install the Inference Extension CRDs (Inference Model and Inference Pool CRDs)
 ```bash
 INFERENCE_EXTENSION_VERSION=v0.5.1
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n my-model
@@ -84,13 +86,39 @@ kubectl apply -f agg.yaml -n my-model

 The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.

-Deploy the Inference Gateway resources to your Kubernetes cluster:
+Deploy the Inference Gateway resources to your Kubernetes cluster by running one of the commands below.
+
+For the EPP-unaware black box integration run:

 ```bash
 cd deploy/inference-gateway
 helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml
 ```

+For the EPP-aware integration run:
+
+```bash
+cd deploy/inference-gateway
+
+helm install dynamo-gaie ./helm/dynamo-gaie \
+  -n my-model \
+  -f ./vllm_agg_qwen.yaml \
+  -f ./values-epp-aware.yaml
+```
+
+Or customize the EPP further using flags, i.e:
+
+```bash
+helm install dynamo-gaie ./helm/dynamo-gaie \
+  -n my-model \
+  -f ./vllm_agg_qwen.yaml \
+  --set eppAware.enabled=true \
+  --set eppAware.eppImage=docker.io/lambda108/epp-inference-extension-dynamo:1.0.0 \
+  --set imagePullSecrets='{docker-imagepullsecret}' \
+  --set-string epp.extraEnv[0].name=USE_STREAMING \
+  --set-string epp.extraEnv[0].value=true
+```
+
 Key configurations include:
 - An InferenceModel resource for the Qwen model
 - A service for the inference gateway

deploy/inference-gateway/helm/dynamo-gaie/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.1.0
+version: 0.2.0

 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to

deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml

Lines changed: 58 additions & 15 deletions
@@ -31,23 +31,44 @@ spec:
     spec:
       # Conservatively, this timeout should mirror the longest grace period of the pods within the pool
       terminationGracePeriodSeconds: 130
+
+      {{- if .Values.imagePullSecrets }}
+      imagePullSecrets:
+        {{- range .Values.imagePullSecrets }}
+        - name: {{ . | quote }}
+        {{- end }}
+      {{- end }}
+
       containers:
       - name: epp
-        image: {{ .Values.extension.image }}
-        imagePullPolicy: IfNotPresent
+        image: {{ if .Values.eppAware.enabled }}
+                 {{ default .Values.extension.image .Values.eppAware.eppImage }}
+               {{ else }}
+                 {{ .Values.extension.image }}
+               {{ end }}
+        imagePullPolicy: {{ .Values.epp.imagePullPolicy | default "IfNotPresent" }}
         args:
-        - -poolName
-        - "{{ .Values.model.shortName }}-pool"
-        - "-poolNamespace"
-        - "{{ .Release.Namespace }}"
-        - -v
-        - "4"
-        - --zap-encoder
-        - "json"
-        - -grpcPort
-        - "9002"
-        - -grpcHealthPort
-        - "9003"
+        {{- if .Values.epp.argsOverride }}
+        {{- toYaml .Values.epp.argsOverride | nindent 8 }}
+        {{- else }}
+        - -poolName
+        - "{{ .Values.model.shortName }}-pool"
+        - -poolNamespace
+        - "{{ .Release.Namespace }}"
+        - -v
+        - "4"
+        - --zap-encoder
+        - "json"
+        - -grpcPort
+        - "9002"
+        - -grpcHealthPort
+        - "9003"
+        {{- end }}
+        env:
+        {{- range .Values.epp.extraEnv }}
+          - name: {{ .name }}
+            value: {{ .value | quote }}
+        {{- end }}
         ports:
         - containerPort: 9002
         - containerPort: 9003
@@ -64,4 +85,26 @@ spec:
             port: 9003
             service: inference-extension
           initialDelaySeconds: 5
-          periodSeconds: 10
+          periodSeconds: 10
+
+      {{- if .Values.eppAware.enabled }}
+      - name: {{ .Values.eppAware.sidecar.name }}
+        image: {{ .Values.eppAware.sidecar.image }}
+        imagePullPolicy: {{ .Values.eppAware.sidecar.imagePullPolicy | default "IfNotPresent" }}
+        command: {{- toYaml .Values.eppAware.sidecar.command | nindent 8 }}
+        args: {{- toYaml .Values.eppAware.sidecar.args | nindent 8 }}
+        env:
+        {{- range .Values.eppAware.sidecar.env }}
+          {{- if .valueFromDynamoNamespace }}
+          - name: {{ .name }}
+            value: "{{ $.Values.dynamoNamespace }}"
+          {{- else }}
+          - name: {{ .name }}
+            value: {{ .value | quote }}
+          {{- end }}
+        {{- end }}
+        ports:
+          {{- toYaml .Values.eppAware.sidecar.ports | nindent 8 }}
+        resources:
+          {{- toYaml .Values.eppAware.sidecar.resources | nindent 10 }}
+      {{- end }}
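
Review note: the `image:` and `args:` changes in the `epp` container can be read as the decision logic below. This is a minimal Python sketch of how the template appears to behave, not the chart's implementation. The function names `select_epp_image` and `render_epp_args` and the sample `shortName` are hypothetical, plain dictionaries stand in for Helm's `.Values`, and the sketch assumes Helm's `default A B` yields `B` unless `B` is empty.

```python
from typing import Any


def select_epp_image(values: dict[str, Any]) -> str:
    """Pick the EPP container image the way the `image:` expression appears to."""
    base_image = values["extension"]["image"]
    epp_aware = values.get("eppAware", {})
    if epp_aware.get("enabled"):
        # Assumed `default` semantics: use eppImage unless it is empty.
        return epp_aware.get("eppImage") or base_image
    return base_image


def render_epp_args(values: dict[str, Any], release_namespace: str) -> list[str]:
    """Return argsOverride verbatim when non-empty, else the chart's default args."""
    override = values.get("epp", {}).get("argsOverride") or []
    if override:
        return list(override)
    return [
        "-poolName", f"{values['model']['shortName']}-pool",
        "-poolNamespace", release_namespace,
        "-v", "4",
        "--zap-encoder", "json",
        "-grpcPort", "9002",
        "-grpcHealthPort", "9003",
    ]


# Hypothetical values for illustration only.
example_values = {
    "model": {"shortName": "qwen"},
    "extension": {"image": "example.invalid/epp:base"},
    "epp": {"argsOverride": []},
    "eppAware": {"enabled": True, "eppImage": "example.invalid/epp:dynamo"},
}

print(select_epp_image(example_values))
print(render_epp_args(example_values, "my-model"))
```

One consequence worth checking in review: a non-empty `epp.argsOverride` replaces the default list entirely, so the pool name and gRPC ports must then be supplied by the caller.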

deploy/inference-gateway/helm/dynamo-gaie/templates/http-router.yaml

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@ apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
 metadata:
   name: {{ .Values.model.shortName }}-route
+  namespace: {{ .Release.Namespace }}
 spec:
   parentRefs:
   - group: gateway.networking.k8s.io
@@ -28,6 +29,7 @@ spec:
     - group: inference.networking.x-k8s.io
       kind: InferencePool
       name: {{ .Values.model.shortName }}-pool
+      namespace: {{ .Release.Namespace }}
       port: {{ .Values.inferencePool.port }}
       weight: 1
     matches:

deploy/inference-gateway/helm/dynamo-gaie/values.yaml

Lines changed: 46 additions & 2 deletions
@@ -49,5 +49,49 @@ httpRoute:
     request: "300s"

 extension:
-  # the GAIE extension
-  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.4.0
+  # default (non-epp-aware) EPP image for the GAIE extension
+  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.4.0
+
+# generic knobs you may want in both modes
+imagePullSecrets: [] # e.g. ["docker-imagepullsecret"]
+epp:
+  imagePullPolicy: IfNotPresent
+  # Add env in name/value pairs
+  extraEnv: [] # e.g. [{name: USE_STREAMING, value: "true"}]
+  # If you ever want to completely override args, supply a list here.
+  # When empty, chart will render sane defaults
+  argsOverride: []
+
+# epp-aware mode toggle + specific settings
+eppAware:
+  enabled: false
+  # Optional: override EPP image when epp-aware=true
+  eppImage: docker.io/lambda108/epp-inference-extension-dynamo:1.0.0
+
+  # Sidecar (frontend-router)
+  sidecar:
+    # Container name for the sidecar
+    name: frontend-router
+    # Sidecar image
+    image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.1
+    # Image pull policy for the sidecar
+    imagePullPolicy: IfNotPresent
+    # Command and args for running the frontend in router mode.
+    command: ["/bin/sh", "-c"]
+    args: ["python3 -m dynamo.frontend --http-port 8000 --router-mode kv"]
+    # Environment variables for the sidecar.
+    env:
+      - name: DYNAMO_NAMESPACE
+        valueFromDynamoNamespace: true
+      - name: ETCD_ENDPOINTS
+        value: "http://dynamo-platform-etcd:2379"
+      - name: NATS_SERVER
+        value: "nats://dynamo-platform-nats:4222"
+    # Resource requests/limits for the sidecar container.
+    resources:
+      requests:
+        cpu: "1"
+        memory: "2Gi"
+    # Ports exposed by the sidecar container.
+    ports:
+      - containerPort: 8000
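
Review note: the `valueFromDynamoNamespace: true` marker in the sidecar `env` list is not a Kubernetes field. The `dynamo-epp.yaml` template consumes it and substitutes `.Values.dynamoNamespace`. The following minimal Python sketch shows that resolution under stated assumptions: `resolve_sidecar_env` is a hypothetical helper, the `dynamoNamespace` value is made up for illustration, and all env values are assumed to be strings already.

```python
from typing import Any


def resolve_sidecar_env(values: dict[str, Any]) -> list[dict[str, str]]:
    """Expand sidecar env entries into plain name/value pairs."""
    resolved = []
    for entry in values["eppAware"]["sidecar"]["env"]:
        if entry.get("valueFromDynamoNamespace"):
            # The marker entry takes its value from the top-level dynamoNamespace.
            value = str(values["dynamoNamespace"])
        else:
            value = str(entry["value"])
        resolved.append({"name": entry["name"], "value": value})
    return resolved


# Sample values; "vllm-agg" is a hypothetical namespace, not taken from the chart.
sample_values = {
    "dynamoNamespace": "vllm-agg",
    "eppAware": {
        "sidecar": {
            "env": [
                {"name": "DYNAMO_NAMESPACE", "valueFromDynamoNamespace": True},
                {"name": "ETCD_ENDPOINTS", "value": "http://dynamo-platform-etcd:2379"},
                {"name": "NATS_SERVER", "value": "nats://dynamo-platform-nats:4222"},
            ]
        }
    },
}

for pair in resolve_sidecar_env(sample_values):
    print(pair["name"], "=", pair["value"])
```

Because the marker is resolved at render time, `dynamoNamespace` must be set in the values for `DYNAMO_NAMESPACE` to be non-empty in EPP-aware mode.
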
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+eppAware:
+  enabled: true
+  eppImage: docker.io/lambda108/epp-inference-extension-dynamo:1.0.0
+
+imagePullSecrets:
+  - docker-imagepullsecret
+
+epp:
+  extraEnv:
+    - name: USE_STREAMING
+      value: "true"
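
Review note: this overlay only sets a handful of keys, so the sidecar settings still come from the chart's `values.yaml`. The sketch below illustrates why, assuming Helm's usual layering in which later `-f` files win, maps are merged key by key, and lists and scalars are replaced wholesale. `merge_values` is a hypothetical helper written for this note, not Helm code, and the dictionaries are abbreviated copies of the two files.

```python
import copy
from typing import Any


def merge_values(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge maps; lists and scalars from `override` replace `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Abbreviated chart defaults (values.yaml).
chart_defaults = {
    "imagePullSecrets": [],
    "epp": {"imagePullPolicy": "IfNotPresent", "extraEnv": [], "argsOverride": []},
    "eppAware": {
        "enabled": False,
        "eppImage": "docker.io/lambda108/epp-inference-extension-dynamo:1.0.0",
        "sidecar": {"name": "frontend-router"},
    },
}

# Abbreviated overlay (values-epp-aware.yaml).
epp_aware_overlay = {
    "eppAware": {
        "enabled": True,
        "eppImage": "docker.io/lambda108/epp-inference-extension-dynamo:1.0.0",
    },
    "imagePullSecrets": ["docker-imagepullsecret"],
    "epp": {"extraEnv": [{"name": "USE_STREAMING", "value": "true"}]},
}

effective = merge_values(chart_defaults, epp_aware_overlay)
print(effective["eppAware"]["enabled"])
print(effective["eppAware"]["sidecar"]["name"])
print(effective["epp"]["imagePullPolicy"])
```

In this model the `eppAware.sidecar` block and `epp.imagePullPolicy` survive the merge untouched, while `imagePullSecrets` and `epp.extraEnv` are replaced by the overlay's lists.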

docs/guides/dynamo_deploy/quickstart.md

Lines changed: 1 addition & 0 deletions
@@ -151,6 +151,7 @@ helm install dynamo-crds ./crds/ \
 ***Step 2: Build Dependencies and Install Platform**

 ```bash
+cd deploy/cloud/helm
 helm dep build ./platform/

 kubectl create namespace ${NAMESPACE}
