feat: Create gpu-provisioner helm values template #122

Merged
merged 2 commits into from
May 3, 2024
31 changes: 16 additions & 15 deletions Makefile
@@ -152,21 +152,6 @@ vet: ## Run go vet against code.
lint: $(GOLANGCI_LINT)
	$(GOLANGCI_LINT) run -v

## --------------------------------------
## Release
## To create a release, run `make release VERSION=x.y.z`
## --------------------------------------
.PHONY: release-manifest
release-manifest:
	@sed -i -e 's/^VERSION ?= .*/VERSION ?= ${VERSION}/' ./Makefile
	@sed -i -e "s/version: .*/version: ${IMG_TAG}/" ./charts/gpu-provisioner/Chart.yaml
	@sed -i -e "s/appVersion: .*/appVersion: ${IMG_TAG}/" ./charts/gpu-provisioner/Chart.yaml
	@sed -i -e "s/tag: .*/tag: ${IMG_TAG}/" ./charts/gpu-provisioner/values.yaml
	@sed -i -e 's/gpu-provisioner: .*/gpu-provisioner:${IMG_TAG}/' ./charts/gpu-provisioner/README.md
	git checkout -b release-${VERSION}
	git add ./Makefile ./charts/gpu-provisioner/Chart.yaml ./charts/gpu-provisioner/values.yaml ./charts/gpu-provisioner/README.md
	git commit -s -m "release: update manifest and helm charts for ${VERSION}"

## --------------------------------------
## Tests
## --------------------------------------
@@ -188,3 +173,19 @@ e2etests: ## Run the e2e suite against your local cluster
	--ginkgo.timeout=${TEST_TIMEOUT} \
	--ginkgo.grace-period=3m \
	--ginkgo.vv

## --------------------------------------
## Release
## To create a release, run `make release VERSION=x.y.z`
## --------------------------------------
.PHONY: release-manifest
release-manifest:
	@sed -i -e 's/^VERSION ?= .*/VERSION ?= ${VERSION}/' ./Makefile
	@sed -i -e "s/version: .*/version: ${IMG_TAG}/" ./charts/gpu-provisioner/Chart.yaml
	@sed -i -e "s/appVersion: .*/appVersion: ${IMG_TAG}/" ./charts/gpu-provisioner/Chart.yaml
	@sed -i -e "s/tag: .*/tag: ${IMG_TAG}/" ./charts/gpu-provisioner/values.yaml
	@sed -i -e 's/gpu-provisioner: .*/gpu-provisioner:${IMG_TAG}/' ./charts/gpu-provisioner/README.md
	@sed -i -e 's/CHART_VERSION=.*/CHART_VERSION=${IMG_TAG}/' ./charts/gpu-provisioner/README.md
	git checkout -b release-${VERSION}
	git add ./Makefile ./charts/gpu-provisioner/Chart.yaml ./charts/gpu-provisioner/values.yaml ./charts/gpu-provisioner/README.md
	git commit -s -m "release: update manifest and helm charts for ${VERSION}"
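
Reviewer note: the two `Chart.yaml` substitutions in `release-manifest` can be tried without touching the working tree. The sketch below is only an illustration. The function name `bump_chart`, the sample chart content, and the tag `0.2.0` are assumptions, and it filters stdin instead of using `sed -i`.

```bash
# Hypothetical dry-run helper, not part of this PR.
# $1: the new tag. Reads Chart.yaml text on stdin, writes the updated text to stdout.
bump_chart() {
  sed -e "s/version: .*/version: $1/" \
      -e "s/appVersion: .*/appVersion: $1/"
}

# Sample input is made up for the demonstration.
printf 'name: gpu-provisioner\nversion: 0.1.0\nappVersion: 0.1.0\n' | bump_chart 0.2.0
# prints:
# name: gpu-provisioner
# version: 0.2.0
# appVersion: 0.2.0
```

The patterns are case-sensitive, so `version: .*` leaves the `appVersion:` line to the second expression.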
16 changes: 8 additions & 8 deletions README.md
@@ -8,15 +8,15 @@ gpu-Provisioner is an [Azure Karpenter provider](https://github.com/Azure/karpen
It implements the cloud provider interfaces to realize the following abstraction:
`machine` -> `AKS agent pool` (with vmss and a hard limit of VM count to 1)

```
VERSION=v0.2.0 make docker-build
make az-identity-perm
make az-patch-helm
helm install gpu-provisioner /charts/gpu-provisioner --namespace gpu-provisioner --create-namespace
make az-federated-credential
```
You should have a running controller in `gpu-provisioner` namespace.
## Prerequisites
- An Azure subscription.
- An AKS cluster with [OIDC](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer) addon installed. Please refer to the [Karpenter installation guide](https://karpenter.sh/docs/installation/) for more details.
## Install gpu-provisioner

Please check the installation guidance [here](./charts/gpu-provisioner/README.md).

## How to test
After deploying the controller successfully, one can apply the yaml in `/examples` to create a machine CR. A real node will be created and added to the cluster by the controller.

17 changes: 15 additions & 2 deletions charts/gpu-provisioner/README.md
@@ -9,7 +9,20 @@ A Helm chart for gpu-provisioner
To install the chart with the release name `gpu-provisioner`:

```bash
helm install gpu-provisioner ./charts/gpu-provisioner --namespace gpu-provisioner --create-namespace
export CHART_VERSION=0.2.0
export CLUSTER_NAME=my-cluster
export AZURE_RESOURCE_GROUP=my-rg
export AZURE_SUBSCRIPTION_ID=my-subscription-id
export MSI_NAME=gpuIdentity

az identity create --name $MSI_NAME --resource-group $AZURE_RESOURCE_GROUP

./hack/deploy/configure-helm-values.sh $CLUSTER_NAME $AZURE_RESOURCE_GROUP $MSI_NAME

helm install gpu-provisioner \
https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$CHART_VERSION.tgz \
--values gpu-provisioner-values.yaml --namespace gpu-provisioner --create-namespace --wait
make az-federated-credential
```

## Values
@@ -47,7 +60,7 @@ helm install gpu-provisioner ./charts/gpu-provisioner --namespace gpu-provisione
| podLabels | object | `{}` | Additional labels for the pod. |
| podSecurityContext | object | `{"fsGroup":1000}` | SecurityContext for the pod. |
| priorityClassName | string | `"system-cluster-critical"` | PriorityClass name for the pod. |
| replicas | int | `2` | Number of replicas. |
| replicas | int | `1` | Number of replicas. |
| revisionHistoryLimit | int | `10` | The number of old ReplicaSets to retain to allow rollback. |
| serviceAccount.annotations | object | `{}` | Additional annotations for the ServiceAccount. |
| serviceAccount.create | bool | `true` | Specifies if a ServiceAccount should be created. |
23 changes: 23 additions & 0 deletions gpu-provisioner-values-template.yaml
@@ -0,0 +1,23 @@

replicas: 1 # for better debugging experience
controller:
  env:
    # Azure client settings
    - name: ARM_SUBSCRIPTION_ID
      value: ${AZURE_SUBSCRIPTION_ID}
    - name: LOCATION
      value: ${AZURE_LOCATION}
    - name: AZURE_CLUSTER_NAME
      value: ${CLUSTER_NAME}
    - name: AZURE_NODE_RESOURCE_GROUP
      value: ${AZURE_RESOURCE_GROUP_MC}
    - name: ARM_RESOURCE_GROUP
      value: ${AZURE_RESOURCE_GROUP}
    - name: LEADER_ELECT # disable leader election for better debugging experience
      value: "false"
    - name: E2E_TEST_MODE
      value: "false"

workloadIdentity:
  clientId: ${GPU_PROVISIONER_USER_ASSIGNED_CLIENT_ID}
  tenantId: ${AZURE_TENANT_ID}
34 changes: 34 additions & 0 deletions hack/deploy/configure-helm-values.sh
@@ -0,0 +1,34 @@
#!/usr/bin/env bash
# https://github.com/Azure/karpenter-provider-azure/blob/2beb773cbd3134eeabb8c96b72a130b86b1a91e1/hack/deploy/configure-values.sh

set -euo pipefail

# This script interrogates the AKS cluster and Azure resources to generate
# the gpu-provisioner-values.yaml file using the gpu-provisioner-values-template.yaml file as a template.

if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <cluster-name> <resource-group> <gpu-provisioner-user-assigned-identity-name>"
    exit 1
fi

echo "Configuring gpu-provisioner-values.yaml for cluster $1 in resource group $2 ..."

CLUSTER_NAME=$1
AZURE_RESOURCE_GROUP=$2
AZURE_GPU_PROVISIONER_USER_ASSIGNED_IDENTITY_NAME=$3

AKS_JSON=$(az aks show --name "$CLUSTER_NAME" --resource-group "$AZURE_RESOURCE_GROUP")
AZURE_LOCATION=$(jq -r ".location" <<< "$AKS_JSON")
AZURE_RESOURCE_GROUP_MC=$(jq -r ".nodeResourceGroup" <<< "$AKS_JSON")
AZURE_TENANT_ID=$(az account show |jq -r ".tenantId")


GPU_PROVISIONER_USER_ASSIGNED_CLIENT_ID=$(az identity show --resource-group "${AZURE_RESOURCE_GROUP}" --name "${AZURE_GPU_PROVISIONER_USER_ASSIGNED_IDENTITY_NAME}" --query 'clientId' -otsv)

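# AZURE_RESOURCE_GROUP is referenced by the template, so export it as well.
# AZURE_SUBSCRIPTION_ID is not set by this script and must already be exported by the caller.
export CLUSTER_NAME AZURE_LOCATION AZURE_RESOURCE_GROUP AZURE_RESOURCE_GROUP_MC GPU_PROVISIONER_USER_ASSIGNED_CLIENT_ID AZURE_TENANT_ID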

# get gpu-provisioner-values-template.yaml, if not already present (e.g. outside of repo context)
if [ ! -f gpu-provisioner-values-template.yaml ]; then
    curl -sO https://raw.githubusercontent.com/Azure/gpu-provisioner/main/gpu-provisioner-values-template.yaml
fi
yq '(.. | select(tag == "!!str")) |= envsubst(nu)' gpu-provisioner-values-template.yaml > gpu-provisioner-values.yaml
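
Reviewer note: the template references seven environment variables, and the `nu` option of yq's `envsubst` treats an unset variable as an error. A preflight check could report every missing name at once before yq runs. The sketch below is one possible shape for that check. `missing_env` and `TEMPLATE_VARS` are hypothetical names that do not appear in this PR, and the sketch assumes `printenv` is available.

```bash
# Hypothetical preflight helper, not part of this PR.
# Prints each named variable that is absent from the environment (yq only
# sees exported variables); returns 1 if any is absent, 0 otherwise.
missing_env() {
  local name status=0
  for name in "$@"; do
    if ! printenv "$name" >/dev/null; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Variables referenced by gpu-provisioner-values-template.yaml.
TEMPLATE_VARS="AZURE_SUBSCRIPTION_ID AZURE_LOCATION CLUSTER_NAME AZURE_RESOURCE_GROUP_MC AZURE_RESOURCE_GROUP GPU_PROVISIONER_USER_ASSIGNED_CLIENT_ID AZURE_TENANT_ID"
```

Inside the script this could run just before the `yq` line, for example `if ! missing=$(missing_env $TEMPLATE_VARS); then echo "Not exported: $missing" >&2; exit 1; fi`, which stays safe under `set -euo pipefail` because the call sits in an `if` condition.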