# Dynamo on AKS

This document covers deploying Dynamo Cloud and running inference with the vLLM distributed runtime on Azure Kubernetes Service (AKS), from cluster setup all the way to testing inference.

### Task 1. Infrastructure Deployment

1. Open **Azure Cloud Shell** or a terminal on an Azure VM and install the prerequisites:
```bash
az login

az extension add --name aks-preview
az extension update --name aks-preview
```

Generate an RSA SSH key to use with the AKS cluster:
```bash
ssh-keygen -t rsa -b 4096 -C "<email@id.com>"
```

2. Create the AKS cluster:
```bash
export REGION=<region>
export RESOURCE_GROUP=<rg_name>
export ZONE=<zone>
export CLUSTER_NAME=<aks_cluster_name>
export CPU_COUNT=1

az aks create -g $RESOURCE_GROUP -n $CLUSTER_NAME --location $REGION --zones $ZONE --node-count $CPU_COUNT --enable-node-public-ip --ssh-key-value ~/.ssh/id_rsa.pub
```

3. Verify that the cluster was created correctly:
```bash
# Get credentials
az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME

kubectl config get-contexts

# You should see output like this:
CURRENT   NAME         CLUSTER      AUTHINFO                                   NAMESPACE
*         dynamo-aks   dynamo-aks   clusterUser_<rg_name>_<aks_cluster_name>
```

4. Create the GPU node pool. You can use any number of nodes of whatever SKU you want; here we use 4 nodes of `standard_nc24ads_a100_v4`, each of which has one A100 GPU. We pass `--skip-gpu-driver-install` because the GPU Operator installed in the next task manages the NVIDIA driver:
```bash
az aks nodepool add --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --name gpupool --node-count 4 --skip-gpu-driver-install --node-vm-size standard_nc24ads_a100_v4 --node-osdisk-size 2048 --max-pods 110
```
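
Once the node pool command returns, the GPU nodes should register with the cluster. As a scriptable sanity check, you can count `Ready` nodes in the `gpupool` pool. This is a sketch: the here-doc stands in for real `kubectl get nodes --no-headers` output (the node names are illustrative) so the filter itself can be verified offline.

```shell
# Count Ready nodes whose name contains "gpupool".
# In practice, replace the here-doc with: kubectl get nodes --no-headers
# (sample node names below are illustrative, not from a real cluster)
ready_gpu_nodes=$(cat <<'EOF' | awk '/gpupool/ && $2 == "Ready"' | wc -l | tr -d ' '
aks-nodepool1-12345678-vmss000000   Ready   agent   30m   v1.30.0
aks-gpupool-12345678-vmss000000     Ready   agent   5m    v1.30.0
aks-gpupool-12345678-vmss000001     Ready   agent   5m    v1.30.0
aks-gpupool-12345678-vmss000002     Ready   agent   5m    v1.30.0
aks-gpupool-12345678-vmss000003     Ready   agent   5m    v1.30.0
EOF
)
echo "$ready_gpu_nodes"   # 4, matching the --node-count used above
```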

### Task 2. Install the NVIDIA GPU Operator

Once your AKS cluster is configured with a GPU-enabled node pool, we can proceed with setting up the NVIDIA GPU Operator. This operator automates the deployment and lifecycle of all NVIDIA software components required to provision GPUs in the Kubernetes cluster, enabling the infrastructure to support GPU workloads like LLM inference and embedding generation.

1. Add the NVIDIA Helm repository:
```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --pass-credentials && helm repo update
```

2. Install the GPU Operator:
```bash
helm install --create-namespace --namespace gpu-operator nvidia/gpu-operator --wait --generate-name
```

3. Validate the install (takes about 5 minutes to complete):
```bash
kubectl get pods -A -o wide
```

You should see output similar to the example below. Note that this is not the complete output; there will be additional pods running. The most important thing is to verify that the GPU Operator pods are in a `Running` state.

```
NAMESPACE      NAME                                                        READY   STATUS    RESTARTS   AGE   IP             NODE
gpu-operator   gpu-operator-xxxx-node-feature-discovery-gc-xxxxxxxxx       1/1     Running   0          40s   10.244.0.194   aks-nodepool1-xxxx
gpu-operator   gpu-operator-xxxx-node-feature-discovery-master-xxxxxxxxx   1/1     Running   0          40s   10.244.0.200   aks-nodepool1-xxxx
gpu-operator   gpu-operator-xxxx-node-feature-discovery-worker-xxxxxxxxx   1/1     Running   0          40s   10.244.0.190   aks-nodepool1-xxxx
gpu-operator   gpu-operator-xxxxxxxxxxxxxx                                 1/1     Running   0          40s   10.244.0.128   aks-nodepool1-xxxx
```

For additional guidance on setting up GPU node pools in AKS, refer to the [Microsoft Docs](https://learn.microsoft.com/en-us/azure/aks/gpu-cluster?tabs=add-ubuntu-gpu-node-pool).
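
After the operator pods settle, each `standard_nc24ads_a100_v4` node should advertise one allocatable `nvidia.com/gpu`. Below is a sketch of a scriptable check; the here-doc stands in for an excerpt of real `kubectl describe node <gpu-node-name>` output, with illustrative values.

```shell
# Read the nvidia.com/gpu count from a node's Allocatable section.
# In practice, replace the here-doc with:
#   kubectl describe node <gpu-node-name>
# (sample values below are illustrative)
gpus=$(cat <<'EOF' | awk '/nvidia.com\/gpu/ {print $2; exit}'
Allocatable:
  cpu:             23
  memory:          214Gi
  nvidia.com/gpu:  1
EOF
)
echo "$gpus"   # 1 A100 per standard_nc24ads_a100_v4 node
```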

### Task 3. Configure Dynamo

1. Clone the Dynamo repo.
The Dynamo GitHub repository is leveraged extensively throughout this walkthrough. Clone it using:
```bash
# Clone the Dynamo GitHub repo
git clone https://github.com/ai-dynamo/dynamo.git

# Go to the root of the Dynamo repo; the latest commit at the time of writing was 22e6c96f715177c776421c90e9415a7dbc4f661a
cd dynamo
```

2. Install Dynamo from the published artifacts on NGC (see the [quickstart guide](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_deploy/quickstart.md)):
```bash
export NAMESPACE=dynamo-cloud
export RELEASE_VERSION=0.3.2

# The linked document says to authenticate using NGC_API_KEY; that is not necessary here, since this is an openly available container

# Fetch the CRDs helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz

# Fetch the platform helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

# Step 1: Install the Custom Resource Definitions (CRDs)
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
  --namespace default \
  --wait \
  --atomic

# Step 2: Install the Dynamo platform
kubectl create namespace ${NAMESPACE}
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}

# Check pod status:
kubectl get pods -n $NAMESPACE

# Output should be similar to:
NAME                                                              READY   STATUS    RESTARTS   AGE
dynamo-platform-dynamo-operator-controller-manager-549b5d5xf7rv   2/2     Running   0          2m50s
dynamo-platform-etcd-0                                            1/1     Running   0          2m50s
dynamo-platform-nats-0                                            2/2     Running   0          2m50s
dynamo-platform-nats-box-5dbf45c748-kln82                         1/1     Running   0          2m51s
```
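
If you prefer a scriptable check over eyeballing the table, you can count pods that have not reached `Running`. The here-doc below replays the sample output above so the filter can be verified offline; in practice, pipe in real `kubectl` output instead.

```shell
# Count pods whose STATUS column is not "Running".
# In practice, replace the here-doc with: kubectl get pods -n $NAMESPACE --no-headers
not_running=$(cat <<'EOF' | awk '$3 != "Running"' | wc -l | tr -d ' '
dynamo-platform-dynamo-operator-controller-manager-549b5d5xf7rv   2/2   Running   0   2m50s
dynamo-platform-etcd-0                                            1/1   Running   0   2m50s
dynamo-platform-nats-0                                            2/2   Running   0   2m50s
dynamo-platform-nats-box-5dbf45c748-kln82                         1/1   Running   0   2m51s
EOF
)
echo "$not_running"   # 0 means every platform pod is Running
```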
| 126 | + |
There are other ways to install Dynamo; you can find them [here](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_deploy/quickstart.md).

### Task 4. Deploy a model

We will deploy Microsoft's Phi-3.5-vision-instruct. You can adapt this flow to deploy whatever model you need.

Refer to [dynamo/docs/examples/README.md at main · ai-dynamo/dynamo](https://github.com/ai-dynamo/dynamo/blob/main/docs/examples/README.md).

```bash
# Set your dynamo root directory
cd <root-dynamo-folder>
export PROJECT_ROOT=$(pwd)

# Create a Kubernetes secret containing your sensitive values:
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN=${HF_TOKEN} -n ${NAMESPACE}

# Deploy an example (time taken depends on the model; phi3v took ~5 minutes)
# You can edit the number of replicas of the encoder/decoder independently here to suit your deployment needs
kubectl apply -f examples/multimodal/deploy/k8s/agg-phi3v.yaml -n ${NAMESPACE}

# Get the status of the deployment
kubectl get dynamoGraphDeployment -n ${NAMESPACE}

# You can use any of the following commands to see logs for debugging
kubectl get pods -n ${NAMESPACE} -o wide
kubectl logs <pod-name> -n ${NAMESPACE}
kubectl exec -it <pod-name> -n ${NAMESPACE} -- nvidia-smi

# Enable port forwarding so you can send curl requests to the frontend
kubectl get svc -n ${NAMESPACE}

# Look for the service whose name ends in -frontend and use it for the port forward
SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE} &
```
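
The service-name extraction pipeline above can be sanity-checked offline against sample `kubectl get svc -o name` output (the service names here are hypothetical):

```shell
# Same grep/sed/head pipeline as above, fed a here-doc of sample output
# instead of a live cluster. Service names are hypothetical.
SERVICE_NAME=$(cat <<'EOF' | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1
service/dynamo-platform-nats
service/vllm-agg-phi3v-frontend
EOF
)
echo "$SERVICE_NAME"   # vllm-agg-phi3v (the -frontend suffix is stripped)
```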

### Task 5. Testing

```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3.5-vision-instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          { "type": "image_url", "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }
        ]
      }
    ],
    "stream": false
  }'

# Output should be something like:
{"id": "a200785a-a4dd-4208-8ced-2d0ea30351a4", "object": "chat.completion", "created": 1753223375, "model": "microsoft/Phi-3.5-vision-instruct", "choices": [{"index": 0, "message": {"role": "assistant", "content": " The image features a wooden boardwalk extending into a grassy area surrounded by a wetland. There are water lilies in the water, and the sky is clear with a few clouds. The sun is shining, casting light on the scene, and there are trees visible in the background."}, "finish_reason": "stop"}]}
```
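
To pull just the assistant's text out of the response, a quick `sed` extraction works for simple replies; it will not handle escaped quotes inside the content, so use `jq` or a proper JSON parser for anything robust. The here-doc holds an abridged, illustrative response; in practice, pipe curl's output in instead.

```shell
# Extract the first "content" string from the chat completion JSON.
# In practice: curl -s localhost:8000/v1/chat/completions ... | sed -n '...'
# The sample response below is abridged and illustrative.
reply=$(cat <<'EOF' | sed -n 's/.*"content": "\([^"]*\)".*/\1/p'
{"choices": [{"index": 0, "message": {"role": "assistant", "content": "A wooden boardwalk through a wetland."}, "finish_reason": "stop"}]}
EOF
)
echo "$reply"
```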
| 187 | + |
## Clean Up Resources

To clean up the Dynamo-related resources, run the following from the shell you launched the deployment from:

```bash
# Delete the deployment
kubectl delete dynamoGraphDeployment <your-dep-name> -n ${NAMESPACE}

# Delete the AKS cluster
az aks delete --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --yes
```

This spins down the Dynamo deployment we configured and deletes all the resources that were provisioned for it.