---
title: Gateway API Inference Extension
weight: 800
toc: true
nd-type: how-to
nd-product: NGF
nd-docs: DOCS-0000
---

Learn how to use NGINX Gateway Fabric with the Gateway API Inference Extension to optimize traffic routing to self-hosted Generative AI Models on Kubernetes.

## Overview

The [Gateway API Inference Extension](https://gateway-api-inference-extension.sigs.k8s.io/) is an official Kubernetes project that aims to provide optimized load balancing for self-hosted Generative AI Models on Kubernetes.
The project's goal is to improve and standardize routing to inference workloads across the ecosystem.

Coupled with the provided Endpoint Picker Service, NGINX Gateway Fabric becomes an [Inference Gateway](https://gateway-api-inference-extension.sigs.k8s.io/#concepts-and-definitions), with additional AI-specific traffic management features such as model-aware routing, serving priority for models, model rollouts, and more.

{{< call-out "warning" >}} The Gateway API Inference Extension is still in alpha status and should not be used in production yet. {{< /call-out >}}

## Set up

Install the Gateway API Inference Extension CRDs:

```shell
kubectl kustomize "https://github.com/nginx/nginx-gateway-fabric/config/crd/inference-extension/?ref=v{{< version-ngf >}}" | kubectl apply -f -
```
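
To confirm the CRDs are registered, you can list them. The exact set may vary by release, but the InferencePool CRD should appear:

```shell
kubectl get crd | grep inference.networking
```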

To enable the Gateway API Inference Extension, [install]({{< ref "/ngf/install/" >}}) NGINX Gateway Fabric with these modifications:

- Using Helm: set the `nginxGateway.gwAPIInferenceExtension.enable=true` Helm value (a complete example command is shown below).
- Using Kubernetes manifests: set the `--gateway-api-inference-extension` flag in the nginx-gateway container arguments, and update the ClusterRole RBAC to add permissions for `inferencepools`:

  ```yaml
  - apiGroups:
    - inference.networking.k8s.io
    resources:
    - inferencepools
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - inference.networking.k8s.io
    resources:
    - inferencepools/status
    verbs:
    - update
  ```

See this [example manifest](https://raw.githubusercontent.com/nginx/nginx-gateway-fabric/main/deploy/inference/deploy.yaml) for reference.
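
For example, a Helm installation with the extension enabled might look like the following. The release name `ngf` and the `nginx-gateway` namespace follow the standard installation guide and are assumptions; adjust them for your environment:

```shell
# Install NGINX Gateway Fabric with the Gateway API Inference Extension enabled
helm install ngf oci://ghcr.io/nginx/charts/nginx-gateway-fabric \
  --create-namespace -n nginx-gateway \
  --set nginxGateway.gwAPIInferenceExtension.enable=true
```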

## Deploy a sample model server

The [vLLM simulator](https://github.com/llm-d/llm-d-inference-sim/tree/main) model server does not use GPUs and is ideal for test and development environments. This sample is configured to simulate the [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model. To deploy the vLLM simulator, run the following command:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/sim-deployment.yaml
```
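
The simulator can take a moment to start. You can watch its pods come up; the `app` label below matches the selector used by the InferencePool in the next step:

```shell
kubectl get pods -l app=vllm-llama3-8b-instruct --watch
```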

## Deploy the InferencePool and Endpoint Picker Extension

The InferencePool is a Gateway API Inference Extension resource that represents a set of inference-focused Pods. With an InferencePool, you can configure a routing extension as well as inference-specific routing optimizations. For more information on this resource, refer to the Gateway API Inference Extension [InferencePool documentation](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/).
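
For illustration only, a minimal InferencePool for this guide might look roughly like the sketch below. The field names follow the v1 `inference.networking.k8s.io` API, and the Endpoint Picker name is an assumption based on the Deployment name used later in this guide; verify both against the API reference and the chart's rendered output:

```yaml
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct
spec:
  selector:                 # which model server Pods belong to the pool
    matchLabels:
      app: vllm-llama3-8b-instruct
  targetPorts:              # port the model server listens on
  - number: 8000
  endpointPickerRef:        # assumed name; the Helm chart sets this for you
    name: vllm-llama3-8b-instruct-epp
```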

Install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000. The following Helm install command automatically installs both the InferencePool and the Endpoint Picker Extension.

NGINX queries the Endpoint Picker Extension to determine the appropriate pod endpoint to route traffic to. These pods are selected from a pool of ready pods designated by the assigned InferencePool's `selector` field. For more information, see the [Endpoint Picker documentation](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/README.md).

{{< call-out "warning" >}} The Endpoint Picker Extension is a third-party application written and provided by the Gateway API Inference Extension project. Communication between NGINX and the Endpoint Picker uses TLS with certificate verification disabled by default, as the Endpoint Picker does not currently support mounting CA certificates. The Gateway API Inference Extension is in alpha status and should not be used in production. NGINX Gateway Fabric is not responsible for any threats or risks associated with using this third-party Endpoint Picker Extension application. {{< /call-out >}}

```shell
export IGW_CHART_VERSION=v1.0.1
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --version $IGW_CHART_VERSION \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```

Confirm that the Endpoint Picker was deployed and is running:

```shell
kubectl describe deployment vllm-llama3-8b-instruct-epp
```
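
You can also wait for the Deployment to report an `Available` condition and confirm that the InferencePool resource exists:

```shell
# Block until the Endpoint Picker Deployment is Available (or time out)
kubectl wait --for=condition=Available deployment/vllm-llama3-8b-instruct-epp --timeout=120s
kubectl get inferencepool vllm-llama3-8b-instruct
```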

## Deploy an Inference Gateway

Create a Gateway for routing traffic to the inference workloads:

```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: nginx
  listeners:
  - name: http
    port: 80
    protocol: HTTP
EOF
```

Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:

```shell
kubectl describe gateway inference-gateway
```

Save the public IP address and port of the NGINX Service into shell variables:

```text
GW_IP=XXX.YYY.ZZZ.III
GW_PORT=<port number>
```
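
If the address is published on the Gateway status (this depends on your environment and Service type), you can populate the variables with `kubectl` instead of copying them by hand:

```shell
# Read the first address from the Gateway status and the listener port from its spec
GW_IP=$(kubectl get gateway inference-gateway -o jsonpath='{.status.addresses[0].value}')
GW_PORT=$(kubectl get gateway inference-gateway -o jsonpath='{.spec.listeners[0].port}')
```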

## Deploy an HTTPRoute

Create an HTTPRoute that routes all traffic on the Gateway to the InferencePool:

```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
      port: 3000
    matches:
    - path:
        type: PathPrefix
        value: /
EOF
```

Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```shell
kubectl describe httproute llm-route
```
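
To inspect just the status conditions for the Gateway parent (assuming a single parent), a `jsonpath` query also works:

```shell
kubectl get httproute llm-route -o jsonpath='{.status.parents[0].conditions}'
```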

## Try it out

Send traffic to the Gateway:

```shell
curl -i $GW_IP:$GW_PORT/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "food-review-1",
  "prompt": "Write as if you were a critic: San Francisco",
  "max_tokens": 100,
  "temperature": 0
}'
```

## Cleanup

Uninstall the InferencePool, InferenceObjective, and model server resources:

```shell
helm uninstall vllm-llama3-8b-instruct
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml --ignore-not-found
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml --ignore-not-found
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/sim-deployment.yaml --ignore-not-found
```

Uninstall the Gateway API Inference Extension CRDs:

```shell
kubectl delete -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd --ignore-not-found
```

Uninstall the Inference Gateway and HTTPRoute:

```shell
kubectl delete gateway inference-gateway
kubectl delete httproute llm-route
```

Uninstall NGINX Gateway Fabric:

```shell
helm uninstall ngf -n nginx-gateway
```

If needed, replace `ngf` with your chosen release name.

Remove the namespace and NGINX Gateway Fabric CRDs:

```shell
kubectl delete ns nginx-gateway
kubectl delete -f https://raw.githubusercontent.com/nginx/nginx-gateway-fabric/v{{< version-ngf >}}/deploy/crds.yaml
```

Remove the Gateway API CRDs:

{{< include "/ngf/installation/uninstall-gateway-api-resources.md" >}}

## See also

- [Gateway API Inference Extension Introduction](https://gateway-api-inference-extension.sigs.k8s.io/): an introduction to the project.
- [Gateway API Inference Extension API Overview](https://gateway-api-inference-extension.sigs.k8s.io/concepts/api-overview/): an overview of the API.
- [Gateway API Inference Extension User Guides](https://gateway-api-inference-extension.sigs.k8s.io/guides/): additional use cases and guides.
