[stress testing] Update docs, resiliency for deploy script #1918

Merged
3 commits merged into from
Aug 25, 2021
Changes from 2 commits
29 changes: 17 additions & 12 deletions eng/common/scripts/stress-testing/deploy-stress-tests.ps1
@@ -20,7 +20,8 @@ $ErrorActionPreference = 'Stop'
. $PSScriptRoot/find-all-stress-packages.ps1
$FailedCommands = New-Object Collections.Generic.List[hashtable]

if (!(Get-Module powershell-yaml)) {
if (!(Get-Module -ListAvailable powershell-yaml)) {
Write-Host "Installing powershell-yaml module..."
Install-Module -Name powershell-yaml -RequiredVersion 0.4.1 -Force -Scope CurrentUser
}

@@ -51,7 +52,10 @@ function Login([string]$subscription, [string]$clusterGroup, [boolean]$pushImage
RunOrExitOnFailure az login --allow-no-subscriptions
}

$clusterName = (az aks list -g $clusterGroup -o json| ConvertFrom-Json).name
# Discover cluster name, only one cluster per group is expected
Write-Host "Listing AKS cluster in $subscription/$clusterGroup"
$cluster = RunOrExitOnFailure az aks list -g $clusterGroup --subscription $subscription -o json
$clusterName = ($cluster | ConvertFrom-Json).name

RunOrExitOnFailure az aks get-credentials `
-n "$clusterName" `
@@ -60,8 +64,9 @@ function Login([string]$subscription, [string]$clusterGroup, [boolean]$pushImage
--overwrite-existing

if ($pushImages) {
$registry = (az acr list -g $clusterGroup -o json | ConvertFrom-Json).name
RunOrExitOnFailure az acr login -n $registry
$registry = RunOrExitOnFailure az acr list -g $clusterGroup --subscription $subscription -o json
$registryName = ($registry | ConvertFrom-Json).name
RunOrExitOnFailure az acr login -n $registryName
}
}

@@ -110,11 +115,8 @@ function DeployStressPackage(
[string]$repository,
[boolean]$pushImages
) {
$registry = (az acr list -g $clusterGroup -o json | ConvertFrom-Json).name
if (!$registry) {
Write-Host "Could not find container registry in resource group $clusterGroup"
exit 1
}
$registry = RunOrExitOnFailure az acr list -g $clusterGroup --subscription $subscription -o json
$registryName = ($registry | ConvertFrom-Json).name

Run helm dependency update $pkg.Directory
if ($LASTEXITCODE) { return }
@@ -133,7 +135,7 @@ function DeployStressPackage(
if (!$imageName) {
$imageName = $dockerFile.Directory.Name
}
$imageTag = "${registry}.azurecr.io/$($repository.ToLower())/$($imageName):$deployId"
$imageTag = "${registryName}.azurecr.io/$($repository.ToLower())/$($imageName):$deployId"
Write-Host "Building and pushing stress test docker image '$imageTag'"
Run docker build -t $imageTag -f $dockerFile.FullName $dockerFile.DirectoryName
if ($LASTEXITCODE) { return }
@@ -154,7 +156,7 @@ function DeployStressPackage(
Run helm upgrade $pkg.ReleaseName $pkg.Directory `
-n $pkg.Namespace `
--install `
--set repository=$registry.azurecr.io/$repository `
--set repository=$registryName.azurecr.io/$repository `
--set tag=$deployId `
--set stress-test-addons.env=$environment
if ($LASTEXITCODE) {
@@ -176,4 +178,7 @@ function DeployStressPackage(
Run kubectl label secret -n $pkg.Namespace --overwrite $helmReleaseConfig deployId=$deployId
}

DeployStressTests @PSBoundParameters
# Don't call functions when the script is being dot sourced
if ($MyInvocation.InvocationName -ne ".") {
DeployStressTests @PSBoundParameters
}
124 changes: 58 additions & 66 deletions tools/stress-cluster/chaos/README.md
@@ -35,8 +35,9 @@ You will need the following tools to create and run tests:

1. [Docker](https://docs.docker.com/get-docker/)
1. [Kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl)
1. [Helm](https://helm.sh/)
1. [Helm](https://helm.sh/docs/intro/install/)
1. [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
1. [PowerShell Core](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-linux?view=powershell-7.1#ubuntu-2004) (if using Linux)

## Access

@@ -56,7 +57,7 @@ kubectl get namespaces

## Quick Testing with no Dependencies

This section details how to deploy a simple job, without any dependencies on the cluster (e.g. azure credentials, app insights keys).
This section details how to deploy a simple job, without any dependencies on the cluster (e.g. Azure credentials, App Insights keys) or stress test scripts. It is intended only to illustrate how Kubernetes and the tools work. Stress test development should be done using the [deploy script](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/scripts/stress-testing/deploy-stress-tests.ps1).

To get started, you will need to create a container image containing your long-running test, and a manifest to execute that image as a [kubernetes job](https://kubernetes.io/docs/concepts/workloads/controllers/job/).

@@ -161,6 +162,18 @@ The basic layout for a stress test is the following (see `examples/stress_deploy
<bicep modules> # Any additional bicep module files/directories referenced by test-resources.bicep
```

### Stress Test Metadata

A stress test package should follow a few conventions that are used by the automation to auto-discover behavior.

Fields in `Chart.yaml`
1. The `name` field will get used as the helm release name. To deploy instances of the same stress test release in parallel, update this field.
1. The `annotations.stressTest` field must be set to true for the script to discover the test.
1. The `annotations.namespace` field must be set, and governs which kubernetes namespace the stress test package will be
installed into as a helm release.
1. Extra fields in `annotations` can be set arbitrarily, and used via the `-Filters` argument to the [stress test deploy
script](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/scripts/stress-testing/deploy-stress-tests.ps1).
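
The conventions above can be sanity-checked before running the deploy script. The following is a minimal sketch and not part of the tooling: `check_stress_chart` is a hypothetical helper, it uses `grep` rather than a YAML parser, and it assumes each annotation sits on its own indented `key: value` line. The sample `Chart.yaml` values (`name`, `version`, `namespace`) are illustrative only.

```shell
# Hypothetical pre-flight check for the Chart.yaml conventions described above.
# Assumption: annotations are indented `key: value` lines. This is a grep-based
# approximation, not the discovery logic used by the deploy script.
check_stress_chart() {
  chart_file="$1"
  # annotations.stressTest must be true (optionally quoted)
  if ! grep -Eq "^[[:space:]]+stressTest:[[:space:]]*[\"']?true[\"']?[[:space:]]*\$" "$chart_file"; then
    echo "missing annotations.stressTest=true" >&2
    return 1
  fi
  # annotations.namespace must be set to a non-empty value
  if ! grep -Eq "^[[:space:]]+namespace:[[:space:]]*[^[:space:]]+" "$chart_file"; then
    echo "missing annotations.namespace" >&2
    return 1
  fi
  echo "ok"
}

# Example: write an illustrative Chart.yaml to a temp file and check it.
sample_chart="$(mktemp)"
cat > "$sample_chart" <<'EOF'
apiVersion: v2
name: stress-deployment-example
version: 0.1.0
annotations:
  stressTest: 'true'
  namespace: 'examples'
EOF
check_stress_chart "$sample_chart"
rm -f "$sample_chart"
```

A check like this only catches missing fields; whether a package is actually picked up still depends on the deploy script and any `-Filters` passed to it.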

### Stress Test Secrets

For ease of implementation regarding merging secrets from various Keyvault sources, secret values injected into the stress
@@ -192,31 +205,33 @@ a `chart/test-resources.json` file in place before running `helm install`.
The stress test cluster and config boilerplate will handle running ARM deployments in an init container before
stress test container startup.

If using Azure Bicep files, they should be declared at the subscription `targetScope`, as opposed to the default
resource group scope. Additionally, they should create a resource group for the test, along with tags marking deletion
for the group after the intended duration of the stress test.

The bicep file should output at least the resource group name, which will be injected into the stress test env file.

```
targetScope = 'subscription'
// Dummy parameter to handle defaults the script passes in
param testApplicationOid string = ''

param groupName string
param location string
param now string = utcNow('u')

resource group 'Microsoft.Resources/resourceGroups@2020-10-01' = {
name: 'rg-stress-${groupName}-${uniqueString(now)}'
location: location
tags: {
DeleteAfter: dateTimeAdd(now, 'PT8H')
}
resource config 'Microsoft.AppConfiguration/configurationStores@2020-07-01-preview' = {
name: 'config-${resourceGroup().name}'
location: resourceGroup().location
sku: {
name: 'Standard'
}
}

output RESOURCE_GROUP string = group.name
output RESOURCE_GROUP string = resourceGroup().name
output AZURE_CLIENT_OID string = testApplicationOid
```

See the [Job Manifest section](#job-manifest) for an example spec containing config template includes for resource auto-deployment.
A stress test package must include a `parameters.json` file as well, which can either be empty or contain parameters:

```
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": { }
}
```

### Helm Chart Dependencies

@@ -261,14 +276,14 @@ spec:
containers:
- name: deployment-example
image: mcr.microsoft.com/azure-cli
{{- include "stress-test-addons.container-env" . | nindent 10 }}
command: ['bash', '-c']
args:
- |
source $ENV_FILE &&
az login --service-principal -u $AZURE_CLIENT_ID -p $AZURE_CLIENT_SECRET --tenant $AZURE_TENANT_ID &&
az account set -s $AZURE_SUBSCRIPTION_ID &&
az group show -g $RESOURCE_GROUP -o json
{{- include "stress-test-addons.container-env" . | nindent 6 }}
{{- end -}}
```

@@ -342,52 +357,32 @@ The underlying `stress-test-addons` helm library will handle a scenarios list au

## Deploying a Stress Test

To build and deploy the stress test, first log in to access the cluster resources if not already set up:
The stress test deployment is best run via the [stress test deploy
script](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/scripts/stress-testing/deploy-stress-tests.ps1).
This script handles: cluster and container registry access, building the stress test helm package, installing helm
package dependencies, and building and pushing docker images. The script must be run via powershell or powershell core.

```
az login
# Log in to the container registry for Docker access
az acr login -n stresstestregistry
# Download the kubeconfig for the cluster
az aks get-credentials -g rg-stress-test-cluster- -n stress-test --subscription 'Azure SDK Test Resources'
```
If using bash or another linux terminal, a [powershell core](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-linux?view=powershell-7.1#ubuntu-2004) shell can be invoked via `pwsh`.

Then register the helm repository (this only needs to be done once):
The first invocation of the script must be run with the `-Login` flag to set up cluster and container registry access.

```
helm repo add stress-test-charts https://stresstestcharts.blob.core.windows.net/helm/
helm repo update
```

Then build/publish images and build ARM templates. Make sure the docker image matches what's referenced in the helm templates.
cd <stress test search directory>

```
# Build and publish image
docker build . -t stresstestregistry.azurecr.io/<your name>/<test job image name>:<version>
docker push stresstestregistry.azurecr.io/<your name>/<test job image name>:<version>

# Compile ARM template (if using Bicep files)
az bicep build -f ./test-resources.bicep

# Install helm dependencies
helm dependency update
<repo root>/eng/common/scripts/stress-testing/deploy-stress-tests.ps1 `
-Login `
-PushImages `
-Repository <your name> `
-DeployId <tag for scoping>
```

Then install the stress test into the cluster:
To re-deploy more quickly, the script can be run without `-Login` and/or without `-PushImages` (if no code changes were
made).

```
kubectl create namespace <your stress test namespace>
kubectl label namespace <namespace> owners=<owner alias>
helm install -n <your stress test namespace> <stress test name> .
```

To install into a different cluster (test, prod, or dev):

```
az aks get-credentials --subscription '<cluster subscription>' -g rg-stress-test-cluster-<cluster suffix> -n stress-test
kubectl create namespace <your stress test namespace>
kubectl label namespace <namespace> owners=<owner alias>
helm install -n <your stress test namespace> <stress test name> . --set stress-test-addons.env=<cluster suffix>
<repo root>/eng/common/scripts/stress-testing/deploy-stress-tests.ps1 `
-Repository <your name> `
-DeployId <tag for scoping>
```
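
For bash users, the first-run and re-deploy invocations above could be wrapped in a small function that calls the script through `pwsh`. This is a sketch under assumptions: the function name `deploy_stress_tests`, the `REPO_ROOT` and `DRY_RUN` variables, and the `first`/`redeploy` mode argument are all hypothetical conveniences; only the script path and the `-Login`, `-PushImages`, `-Repository` and `-DeployId` parameters come from the text above.

```shell
# Hypothetical bash wrapper around the deploy script (not part of the repo).
# Usage: deploy_stress_tests <repository> <deploy id> [first|redeploy]
# Run it from the stress test search directory, as in the examples above.
deploy_stress_tests() {
  dst_repo="$1"
  dst_deploy_id="$2"
  dst_mode="${3:-redeploy}"
  dst_script="${REPO_ROOT:-.}/eng/common/scripts/stress-testing/deploy-stress-tests.ps1"

  # Arguments common to every invocation
  set -- -Repository "$dst_repo" -DeployId "$dst_deploy_id"
  # The first invocation also sets up cluster/registry access and pushes images
  if [ "$dst_mode" = "first" ]; then
    set -- -Login -PushImages "$@"
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it
    echo "pwsh $dst_script $*"
  else
    pwsh "$dst_script" "$@"
  fi
}

# Dry run: show what a first deployment would execute (values are illustrative).
DRY_RUN=1 REPO_ROOT=/path/to/repo deploy_stress_tests alice demo first
```

Dropping the `first` argument corresponds to the quicker re-deploy without `-Login` and `-PushImages`.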

You can check the progress/status of your installation via:
@@ -396,13 +391,7 @@ You can check the progress/status of your installation via:
helm list -n <stress test namespace>
```

To update/re-deploy the test with changes:

```
helm upgrade <stress test name> .
```

To debug the yaml built by `helm install`, run:
To debug the kubernetes manifests installed by the stress test, run the following from the stress test directory:

```
helm template <stress test name> .
@@ -419,13 +408,16 @@ To check the status of the stress test job resources:
```
# List stress test pods
kubectl get pods -n <stress test namespace> -l release=<stress test name>
# Get logs from azure-deployer init container

# Get logs from azure-deployer init container, if deploying resources.
# Omit `-c azure-deployer` to get main container logs.
kubectl logs -n <stress test namespace> <stress test pod name> -c azure-deployer

# If empty, there may have been startup failures
kubectl describe pod -n <stress test namespace> <stress test pod name>
```

Once the `azure-deployer` init container is completed and the stress test pod is in a `Running` state,
If deploying resources, once the `azure-deployer` init container is completed and the stress test pod is in a `Running` state,
you can quickly check the local logs:

```