|
1 | 1 | ## Inference Gateway Setup with Dynamo |
2 | 2 |
|
3 | | -This Setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments. |
4 | | -Currently, this setup is only kgateway based Inference Gateway. |
| 3 | +This guide demonstrates two setups. |
| 4 | +The EPP-unaware setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments. |
| 5 | +The EPP-aware setup first uses the Dynamo Router to pick the ID of the worker instance that will serve the model; traffic is then directed straight to the selected worker. |
| 6 | +Currently, both setups are supported only with the kgateway-based Inference Gateway. |
5 | 7 |
|
6 | 8 | ## Table of Contents |
7 | 9 |
|
@@ -39,7 +41,7 @@ GATEWAY_API_VERSION=v1.3.0 |
39 | 41 | kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml |
40 | 42 | ``` |
41 | 43 |
|
42 | | -b. Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs) |
| 44 | +b. Install the Inference Extension CRDs (Inference Model and Inference Pool CRDs) |
43 | 45 | ```bash |
44 | 46 | INFERENCE_EXTENSION_VERSION=v0.5.1 |
45 | 47 | kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n my-model |
@@ -84,13 +86,39 @@ kubectl apply -f agg.yaml -n my-model |
84 | 86 |
|
85 | 87 | The Inference Gateway is configured through the `inference-gateway-resources.yaml` file. |
86 | 88 |
|
87 | | -Deploy the Inference Gateway resources to your Kubernetes cluster: |
| 89 | +Deploy the Inference Gateway resources to your Kubernetes cluster by running one of the commands below. |
| 90 | + |
| 91 | +For the EPP-unaware (black-box) integration, run: |
88 | 92 |
|
89 | 93 | ```bash |
90 | 94 | cd deploy/inference-gateway |
91 | 95 | helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml |
92 | 96 | ``` |
93 | 97 |
|
| 98 | +For the EPP-aware integration, run: |
| 99 | + |
| 100 | +```bash |
| 101 | +cd deploy/inference-gateway |
| 102 | + |
| 103 | +helm install dynamo-gaie ./helm/dynamo-gaie \ |
| 104 | + -n my-model \ |
| 105 | + -f ./vllm_agg_qwen.yaml \ |
| 106 | + -f ./values-epp-aware.yaml |
| 107 | +``` |
| 108 | + |
| 109 | +Or customize the EPP further using flags, e.g.: |
| 110 | + |
| 111 | +```bash |
| 112 | +helm install dynamo-gaie ./helm/dynamo-gaie \ |
| 113 | + -n my-model \ |
| 114 | + -f ./vllm_agg_qwen.yaml \ |
| 115 | + --set eppAware.enabled=true \ |
| 116 | + --set eppAware.eppImage=docker.io/lambda108/epp-inference-extension-dynamo:1.0.0 \ |
| 117 | + --set imagePullSecrets='{docker-imagepullsecret}' \ |
| 118 | + --set-string epp.extraEnv[0].name=USE_STREAMING \ |
| 119 | + --set-string epp.extraEnv[0].value=true |
| 120 | +``` |
| 121 | + |
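The `--set` flags in the command above can also be kept in a values file, which is easier to review and reuse. The fragment below is a minimal sketch of that translation, based only on how Helm expands `--set` paths. The file name `values-epp-custom.yaml` is hypothetical, and the keys are assumed to be read by the chart exactly as the flags imply.

```yaml
# values-epp-custom.yaml (hypothetical file name)
# Mirrors the --set / --set-string flags shown above.
eppAware:
  enabled: true
  eppImage: docker.io/lambda108/epp-inference-extension-dynamo:1.0.0

# --set imagePullSecrets='{docker-imagepullsecret}' produces a one-item list.
imagePullSecrets:
  - docker-imagepullsecret

epp:
  extraEnv:
    - name: USE_STREAMING
      # --set-string keeps the value a string, so it is quoted here.
      value: "true"
```

Passing this file with an additional `-f ./values-epp-custom.yaml` after `-f ./vllm_agg_qwen.yaml` should be equivalent to the flag-based command, since later `-f` files override earlier ones.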
94 | 122 | Key configurations include: |
95 | 123 | - An InferenceModel resource for the Qwen model |
96 | 124 | - A service for the inference gateway |
|