diff --git a/01-prerequisites.md b/01-prerequisites.md index 143e32ca..e212fb13 100644 --- a/01-prerequisites.md +++ b/01-prerequisites.md @@ -28,14 +28,21 @@ This is the starting point for the instructions on deploying the [AKS Baseline r 1. [Register the Azure Event Grid preview feature - `EventgridPreview`](https://docs.microsoft.com/azure/aks/quickstart-event-grid#register-the-eventgridpreview-preview-feature) + 1. [Register the AKS Extensions preview feature - `AKS-ExtensionManager`](https://docs.microsoft.com/azure/aks/cluster-extensions?tabs=azure-cli#register-the-aks-extensionmanager-preview-features) + + 1. [Register the Kubernetes Configuration preview feature - `fluxConfigurations`](https://docs.microsoft.com/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2#for-azure-kubernetes-service-clusters) + ```bash az feature register --namespace "Microsoft.ContainerService" -n "EventgridPreview" + az feature register --namespace "Microsoft.ContainerService" -n "AKS-ExtensionManager" + az feature register --namespace "Microsoft.KubernetesConfiguration" -n "fluxConfigurations" # Keep running until all say "Registered." (This may take up to 20 minutes.) - az feature list -o table --query "[?name=='Microsoft.ContainerService/EventgridPreview'].{Name:name,State:properties.state}" + az feature list -o table --query "[?name=='Microsoft.ContainerService/EventgridPreview' || name=='Microsoft.ContainerService/AKS-ExtensionManager' || name=='Microsoft.KubernetesConfiguration/fluxConfigurations'].{Name:name,State:properties.state}" - # When all say "Registered" then re-register the AKS resource provider + # When all say "Registered" then re-register the AKS and related resource providers az provider register --namespace Microsoft.ContainerService + az provider register --namespace Microsoft.KubernetesConfiguration ``` 1. Clone/download this repo locally, or even better fork this repository. diff --git a/02-ca-certificates.md b/02-ca-certificates.md index 0a3ea019..17878969 100644 --- a/02-ca-certificates.md +++ b/02-ca-certificates.md @@ -12,7 +12,7 @@ Now that you have the [prerequisites](./01-prerequisites.md) met, follow the ste 1. Generate a client-facing self-signed TLS certificate - > :book: Contoso Bicycle needs to procure a CA certificate for the web site. As this is going to be a user-facing site, they purchase an EV cert from their CA. This will serve in front of the Azure Application Gateway. They will also procure another one, a standard cert, to be used with the AKS Ingress Controller. This one is not EV, as it will not be user facing. + > :book: Contoso Bicycle needs to procure a CA certificate for the web site. As this is going to be a user-facing site, they purchase an EV cert from their CA. This will serve in front of the Azure Application Gateway. They will also procure another one, a standard cert, to be used with the AKS Ingress Controller. This one is not EV, as it will not be user facing. :warning: Do not use the certificate created by this script for actual deployments. The use of self-signed certificates is provided for ease of illustration purposes only. For your cluster, use your organization's requirements for procurement and lifetime management of TLS certificates, _even for development purposes_.
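For reference, a self-signed certificate of the kind this step's script produces can be generated with a couple of `openssl` commands along the lines of the sketch below. This is only an illustration; the subject name and output file names are placeholders, and the script in this repo may use slightly different options.

```bash
# Illustration only: create a self-signed cert and key for the Application Gateway listener.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -out appgw.crt -keyout appgw.key \
  -subj "/CN=bicycle.contoso.com/O=Contoso Bicycle"

# Application Gateway listeners take a PFX, so bundle the cert and key (empty export password here).
openssl pkcs12 -export -out appgw.pfx -in appgw.crt -inkey appgw.key -passout pass:
```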
diff --git a/03-aad.md b/03-aad.md index 403712f5..098401ac 100644 --- a/03-aad.md +++ b/03-aad.md @@ -67,13 +67,13 @@ Following the steps below you will result in an Azure AD configuration that will ## Kubernetes RBAC backing store -AKS supports backing Kubernetes with Azure AD in two different modalities. One is direct association between Azure AD and Kubernetes `ClusterRoleBindings`/`RoleBindings` in the cluster. This is possible no matter if the Azure AD tenant you wish to use to back your Kubernetes RBAC is the same or different than the Tenant backing your Azure resources. If however the tenant that is backing your Azure resources (Azure RBAC source) is the same tenant you plan on using to back your Kubernetes RBAC, then instead you can add a layer of indirection between Azure AD and your cluster by using Azure RBAC instead of direct cluster `RoleBinding` manipulation. When performing this walk-through, you may have had no choice but to associate the cluster with another tenant (due to the elevated permissions necessary in Azure AD to manage groups and users); but when you take this to production be sure you're using Azure RBAC as your Kubernetes RBAC backing store if the tenants are the same. Both cases still leverage integrated authentication between Azure AD and AKS, Azure RBAC simply elevates this control to Azure RBAC instead of direct yaml-based management within the cluster which usually will align better with your organization's governance strategy. +AKS supports backing Kubernetes with Azure AD in two different modalities. One is direct association between Azure AD and Kubernetes `ClusterRoleBindings`/`RoleBindings` in the cluster. This is possible no matter if the Azure AD tenant you wish to use to back your Kubernetes RBAC is the same or different than the Tenant backing your Azure resources. If however the tenant that is backing your Azure resources (Azure RBAC source) is the same tenant you plan on using to back your Kubernetes RBAC, then instead you can add a layer of indirection between Azure AD and your cluster by using Azure RBAC instead of direct cluster `RoleBinding` manipulation. When performing this walk-through, you may have had no choice but to associate the cluster with another tenant (due to the elevated permissions necessary in Azure AD to manage groups and users); but when you take this to production be sure you're using Azure RBAC as your Kubernetes RBAC backing store if the tenants are the same. Both cases still leverage integrated authentication between Azure AD and AKS, Azure RBAC simply elevates this control to Azure RBAC instead of direct yaml-based management within the cluster which usually will align better with your organization's governance strategy. ### Azure RBAC _[Preferred]_ -If you are using a single tenant for this walk-through, the cluster deployment step later will take care of the necessary role assignments for the groups created above. Specifically, in the above steps, you created the Azure AD security group `cluster-ns-a0008-readers-bu0001a000800` that is going to be a namespace reader in namespace `a0008` and the Azure AD security group `cluster-admins-bu0001a000800` is is going to contain cluster admins. Those group Object IDs will be associated to the 'Azure Kubernetes Service RBAC Reader' and 'Azure Kubernetes Service RBAC Cluster Admin' RBAC role respectively, scoped to their proper level within the cluster. 
+If you are using a single tenant for this walk-through, the cluster deployment step later will take care of the necessary role assignments for the groups created above. Specifically, in the above steps, you created the Azure AD security group `cluster-ns-a0008-readers-bu0001a000800` that is going to be a namespace reader in namespace `a0008` and the Azure AD security group `cluster-admins-bu0001a000800` is going to contain cluster admins. Those group Object IDs will be associated to the 'Azure Kubernetes Service RBAC Reader' and 'Azure Kubernetes Service RBAC Cluster Admin' RBAC roles, respectively, scoped to their proper level within the cluster. -Using Azure RBAC as your authorization approach is ultimately preferred as it allows for the unified management and access control across Azure Resources, AKS, and Kubernetes resources. At the time of this writing there are four [Azure RBAC roles](https://docs.microsoft.com/azure/aks/manage-azure-rbac#create-role-assignments-for-users-to-access-cluster) that represent typical cluster access patterns. +Using Azure RBAC as your authorization approach is ultimately preferred as it allows for the unified management and access control across Azure Resources, AKS, and Kubernetes resources. At the time of this writing there are four [Azure RBAC roles](https://docs.microsoft.com/azure/aks/manage-azure-rbac#create-role-assignments-for-users-to-access-cluster) that represent typical cluster access patterns. ### Direct Kubernetes RBAC management _[Alternative]_ diff --git a/04-networking.md b/04-networking.md index 8f73a0b3..ca75fbb7 100644 --- a/04-networking.md +++ b/04-networking.md @@ -104,4 +104,4 @@ The following two resource groups will be created and populated with networking ### Next step -:arrow_forward: [Deploy the AKS cluster](./05-aks-cluster.md) +:arrow_forward: [Prep for cluster bootstrapping](./05-bootstrap-prep.md) diff --git a/05-bootstrap-prep.md b/05-bootstrap-prep.md new file mode 100644 index 00000000..b229a692 --- /dev/null +++ b/05-bootstrap-prep.md @@ -0,0 +1,88 @@ +# Prep for cluster bootstrapping + +Now that the [hub-spoke network is provisioned](./04-networking.md), the next step in the [AKS secure Baseline reference implementation](./) is preparing what your AKS cluster should be bootstrapped with. + +## Expected results + +Container registries often have a lifecycle that extends beyond the scope of a single cluster. They can be scoped broadly at organizational or business unit levels, or can be scoped at workload levels, but usually are not directly tied to the lifecycle of any specific cluster instance. For example, you may do blue/green _cluster instance_ deployments, both using the same container registry. Even as clusters come and go, the registry stays intact. + +* Azure Container Registry (ACR) is deployed, and exposed as a private endpoint. + +* ACR is populated with images your cluster will need as part of its bootstrapping process. +* Log Analytics is deployed and ACR platform logging is configured. This workspace will be used by your cluster as well. + +The role of this pre-existing ACR instance is made more prominent when we think about cluster bootstrapping. That is the process that happens after Azure Resource deployment of the cluster, but before your first workload lands in the cluster.
The cluster will be bootstrapped immediately and automatically after resource deployment, which means you'll need ACR in place to act as your official OCI artifact repository for required images and Helm charts used in that bootstrapping process. + +### Method + +We'll be bootstrapping this cluster with the Flux GitOps agent as installed by the AKS extension. This specific choice does not imply that Flux or GitOps in general is the only approach to bootstrapping. Consider your organizational familiarity and acceptance of tooling like this and decide if cluster baseline management should be performed with GitOps or via your deployment pipelines. If you are running a fleet of clusters, a GitOps approach is highly recommended for uniformity and easier governance. When running only a few clusters, GitOps might be seen as "too much" and you might instead opt for integrating that process into one or more deployment pipelines to ensure bootstrapping takes place. No matter which way you go, you'll need your bootstrapping artifacts ready to go before you start your cluster deployment so that you can minimize the time between cluster deployment and bootstrapping. Using the Flux AKS Extension allows your cluster to start already bootstrapped and sets you up with a solid management foundation going forward. + +## Steps + +1. Create the AKS cluster resource group. + + > :book: The app team working on behalf of business unit 0001 (BU001) is looking to create an AKS cluster for the app they are creating (Application ID: 0008). They have worked with the organization's networking team and have been provisioned a spoke network in which to lay their cluster and network-aware external resources (such as Application Gateway). They took that information and added it to their [`acr-stamp.json`](./acr-stamp.json), [`cluster-stamp.json`](./cluster-stamp.json), and [`azuredeploy.parameters.prod.json`](./azuredeploy.parameters.prod.json) files. + > + > They create this resource group to be the parent group for the application. + + ```bash + # [This takes less than one minute.] + az group create --name rg-bu0001a0008 --location eastus2 + ``` + +1. Get the AKS cluster spoke VNet resource ID. + + > :book: The app team will be deploying to a spoke VNet that was already provisioned by the network team. + + ```bash + export RESOURCEID_VNET_CLUSTERSPOKE_AKS_BASELINE=$(az deployment group show -g rg-enterprise-networking-spokes -n spoke-BU0001A0008 --query properties.outputs.clusterVnetResourceId.value -o tsv) + ``` + +1. Deploy the container registry template. + + ```bash + # [This takes about three minutes.] + az deployment group create -g rg-bu0001a0008 -f acr-stamp.json -p targetVnetResourceId=${RESOURCEID_VNET_CLUSTERSPOKE_AKS_BASELINE} + ``` + +1. Import cluster management images to your container registry. + + > Public container registries are subject to faults such as outages or request throttling. Interruptions like these can be crippling for a system that needs to pull an image _right now_. To minimize the risks of using public registries, store all applicable container images in a registry that you control, such as the SLA-backed Azure Container Registry.
+ + ```bash + # Get your ACR instance name + export ACR_NAME_AKS_BASELINE=$(az deployment group show -g rg-bu0001a0008 -n acr-stamp --query properties.outputs.containerRegistryName.value -o tsv) + + # Import core image(s) hosted in public container registries to be used during bootstrapping + az acr import --source docker.io/weaveworks/kured:1.9.0 -n $ACR_NAME_AKS_BASELINE + ``` + + > In this walkthrough, there is only one image that is included in the bootstrapping process. It's included as an example/reference for this process. Your choice to use kured or any other images, including helm charts, as part of your bootstrapping is yours to make. + +1. Update bootstrapping manifests to pull from your ACR instance. _Optional. Fork required._ + + > Your cluster will immediately begin processing the manifests in [`cluster-manifests/`](./cluster-manifests/) due to the bootstrapping configuration that will be applied to it. So, before you deploy the cluster, now would be the right time to push the following changes to your fork so that it will use your files instead of the files found in the original mspnp repo which point to public container registries: + > + > * update the one `image:` value in [`kured.yaml`](./cluster-manifests/cluster-baseline-settings/kured.yaml) to use your container registry instead of a public container registry. See the comment in the file for instructions (or you can simply run the command below.) + + :warning: Without updating these files and using your own fork, you will be deploying your cluster such that it takes dependencies on public container registries. This is generally okay for exploratory/testing, but not suitable for production. Before going to production, ensure _all_ image references you bring to your cluster are from _your_ container registry (as imported in the prior step) or another that you feel confident relying on. + + ```bash + sed -i "s:docker.io:${ACR_NAME_AKS_BASELINE}.azurecr.io:" ./cluster-manifests/cluster-baseline-settings/kured.yaml + + git commit -a -m "Update image source to use my ACR instance instead of a public container registry." + git push + ``` + +### Save your work in-progress + +```bash +# run the saveenv.sh script at any time to save environment variables created above to aks_baseline.env +./saveenv.sh + +# if your terminal session gets reset, you can source the file to reload the environment variables +# source aks_baseline.env +``` + +### Next step + +:arrow_forward: [Deploy the AKS cluster](./06-aks-cluster.md) diff --git a/05-aks-cluster.md b/06-aks-cluster.md similarity index 79% rename from 05-aks-cluster.md rename to 06-aks-cluster.md index 0ae13647..b6df5817 100644 --- a/05-aks-cluster.md +++ b/06-aks-cluster.md @@ -1,36 +1,25 @@ # Deploy the AKS Cluster -Now that the [hub-spoke network is provisioned](./04-networking.md), the next step in the [AKS Baseline reference implementation](./) is deploying the AKS cluster and its adjacent Azure resources. +Now that your [ACR instance is deployed and ready to support cluster bootstrapping](./05-bootstrap-prep.md), the next step in the [AKS Baseline reference implementation](./) is deploying the AKS cluster and its remaining adjacent Azure resources. ## Steps -1. Create the AKS cluster resource group. +1. Indicate your bootstrapping repo. - > :book: The app team working on behalf of business unit 0001 (BU001) is looking to create an AKS cluster of the app they are creating (Application ID: 0008).
They have worked with the organization's networking team and have been provisioned a spoke network in which to lay their cluster and network-aware external resources into (such as Application Gateway). They took that information and added it to their [`cluster-stamp.json`](./cluster-stamp.json) and [`azuredeploy.parameters.prod.json`](./azuredeploy.parameters.prod.json) files. - > - > They create this resource group to be the parent group for the application. + > If you cloned this repo, then the value will be the original mspnp GitHub organization's repo, which will mean that your cluster will be bootstrapped using public container images. If instead you forked this repo, then the GitOps repo will be your own repo, and your cluster will be bootstrapped using container image references based on the values in your repo's manifest files. On the prior instruction page you had the opportunity to update those manifests to use your ACR instance. ```bash - # [This takes less than one minute.] - az group create --name rg-bu0001a0008 --location eastus2 + GITOPS_REPOURL=$(git config --get remote.origin.url) ``` -1. Get the AKS cluster spoke VNet resource ID. - - > :book: The app team will be deploying to a spoke VNet, that was already provisioned by the network team. - - ```bash - RESOURCEID_VNET_CLUSTERSPOKE=$(az deployment group show -g rg-enterprise-networking-spokes -n spoke-BU0001A0008 --query properties.outputs.clusterVnetResourceId.value -o tsv) - ``` - -1. Deploy the cluster ARM template. +1. Deploy the cluster ARM template. :exclamation: By default, this deployment will allow unrestricted access to your cluster's API Server. You can limit access to the API Server to a set of well-known IP addresses (i.e., a jump box subnet (connected to by Azure Bastion), build agents, or any other networks you'll administer the cluster from) by setting the `clusterAuthorizedIPRanges` parameter in all deployment options. This setting will also impact traffic originating from within the cluster trying to use the API server, so you will also need to include _all_ of the public IPs used by your egress Azure Firewall. For more information, see [Secure access to the API server using authorized IP address ranges](https://docs.microsoft.com/azure/aks/api-server-authorized-ip-ranges#create-an-aks-cluster-with-api-server-authorized-ip-ranges-enabled). **Option 1 - Deploy from the command line** ```bash - # [This takes about 15 minutes.] - az deployment group create -g rg-bu0001a0008 -f cluster-stamp.json -p targetVnetResourceId=${RESOURCEID_VNET_CLUSTERSPOKE} clusterAdminAadGroupObjectId=${AADOBJECTID_GROUP_CLUSTERADMIN_AKS_BASELINE} a0008NamespaceReaderAadGroupObjectId=${AADOBJECTID_GROUP_A0008_READER_AKS_BASELINE} k8sControlPlaneAuthorizationTenantId=${TENANTID_K8SRBAC_AKS_BASELINE} appGatewayListenerCertificate=${APP_GATEWAY_LISTENER_CERTIFICATE_AKS_BASELINE} aksIngressControllerCertificate=${AKS_INGRESS_CONTROLLER_CERTIFICATE_BASE64_AKS_BASELINE} domainName=${DOMAIN_NAME_AKS_BASELINE}
+ az deployment group create -g rg-bu0001a0008 -f cluster-stamp.json -p targetVnetResourceId=${RESOURCEID_VNET_CLUSTERSPOKE_AKS_BASELINE} clusterAdminAadGroupObjectId=${AADOBJECTID_GROUP_CLUSTERADMIN_AKS_BASELINE} a0008NamespaceReaderAadGroupObjectId=${AADOBJECTID_GROUP_A0008_READER_AKS_BASELINE} k8sControlPlaneAuthorizationTenantId=${TENANTID_K8SRBAC_AKS_BASELINE} appGatewayListenerCertificate=${APP_GATEWAY_LISTENER_CERTIFICATE_AKS_BASELINE} aksIngressControllerCertificate=${AKS_INGRESS_CONTROLLER_CERTIFICATE_BASE64_AKS_BASELINE} domainName=${DOMAIN_NAME_AKS_BASELINE} gitOpsBootstrappingRepoHttpsUrl=${GITOPS_REPOURL} ``` > Alteratively, you could have updated the [`azuredeploy.parameters.prod.json`](./azuredeploy.parameters.prod.json) file and deployed as above, using `-p "@azuredeploy.parameters.prod.json"` instead of providing the individual key-value pairs. @@ -94,11 +83,12 @@ Now that the [hub-spoke network is provisioned](./04-networking.md), the next st sed "s##eastus2#g" | \ sed "s##rg-bu0001a0008#g" | \ sed "s##centralus#g" | \ - sed "s##${RESOURCEID_VNET_CLUSTERSPOKE}#g" | \ + sed "s##${RESOURCEID_VNET_CLUSTERSPOKE_AKS_BASELINE}#g" | \ sed "s##${TENANTID_K8SRBAC_AKS_BASELINE}#g" | \ sed "s##${AADOBJECTID_GROUP_CLUSTERADMIN_AKS_BASELINE}#g" | \ sed "s##${AADOBJECTID_GROUP_A0008_READER_AKS_BASELINE}#g" | \ sed "s##${DOMAIN_NAME_AKS_BASELINE}#g" \ + sed "s##${GITOPS_REPOURL}#g" \ > .github/workflows/aks-deploy.yaml ``` @@ -115,17 +105,12 @@ Now that the [hub-spoke network is provisioned](./04-networking.md), the next st 1. Navigate to your GitHub forked repository and open a PR against `main` using the recently pushed changes to the remote branch `kick-off-workflow`. - > :book: The DevOps team configured the GitHub Workflow to preview the changes that will happen when a PR is opened. This will allow them to evaluate the changes before they get deployed. After the PR reviewers see how resources will change if the AKS cluster ARM template gets deployed, it is possible to merge or discard the pull request. If the decision is made to merge, it will trigger a push event that will kick off the actual deployment process that consists of: - > - > * AKS cluster creation - > * Flux deployment + > :book: The DevOps team configured the GitHub Workflow to preview the changes that will happen when a PR is opened. This will allow them to evaluate the changes before they get deployed. After the PR reviewers see how resources will change if the AKS cluster ARM template gets deployed, it is possible to merge or discard the pull request. If the decision is made to merge, it will trigger a push event that will kick off the actual deployment process. 1. Once the GitHub Workflow validation finished successfully, please proceed by merging this PR into `main`. > :book: The DevOps team monitors this Workflow execution instance. In this instance it will impact a critical piece of infrastructure as well as the management. This flow works for both new or an existing AKS cluster. - 1. :fast_forward: The cluster is placed under GitOps managed as part of these GitHub Workflow steps. Therefore, you should proceed straight to [Workflow Prerequisites](./07-workload-prerequisites.md). - ## Container registry note :warning: To aid in ease of deployment of this cluster and your experimentation with workloads, Azure Policy and Azure Firewall are currently configured to allow your cluster to pull images from _public container registries_ such as Docker Hub. 
For a production system, you'll want to update Azure Policy parameter named `allowedContainerImagesRegex` in your `cluster-stamp.json` file to only list those container registries that you are willing to take a dependency on and what namespaces those policies apply to, and make Azure Firewall allowances for the same. This will protect your cluster from unapproved registries being used, which may prevent issues while trying to pull images from a registry which doesn't provide SLA guarantees for your deployment. @@ -138,4 +123,4 @@ Azure Application Gateway, for this reference implementation, is placed in the s ### Next step -:arrow_forward: [Place the cluster under GitOps management](./06-gitops.md) +:arrow_forward: [Validate your cluster is bootstrapped](./07-bootstrap-validation.md) diff --git a/06-gitops.md b/06-gitops.md deleted file mode 100644 index bc3d5dd8..00000000 --- a/06-gitops.md +++ /dev/null @@ -1,107 +0,0 @@ -# Place the Cluster Under GitOps Management - -Now that [the AKS cluster](./05-aks-cluster.md) has been deployed, the next step to configure a GitOps management solution on our cluster, Flux in this case. - -## Steps - -GitOps allows a team to author Kubernetes manifest files, persist them in their git repo, and have them automatically apply to their cluster as changes occur. This reference implementation is focused on the baseline cluster, so Flux is managing cluster-level concerns. This is distinct from workload-level concerns, which would be possible as well to manage via Flux, and would typically be done by additional Flux operators in the cluster. The namespace `cluster-baseline-settings` will be used to provide a logical division of the cluster bootstrap configuration from workload configuration. Examples of manifests that are applied: - -* Cluster Role Bindings for the AKS-managed Azure AD integration -* AAD Pod Identity -* CSI driver and Azure KeyVault CSI Provider -* the workload's namespace named `a0008` - -1. Install `kubectl` 1.22 or newer. (`kubctl` supports +/-1 Kubernetes version.) - - ```bash - sudo az aks install-cli - kubectl version --client - ``` - -1. Get the cluster name. - - ```bash - AKS_CLUSTER_NAME=$(az aks list -g rg-bu0001a0008 --query '[0].name' -o tsv) - ``` - -1. Get AKS `kubectl` credentials. - - > In the [Azure Active Directory Integration](03-aad.md) step, we placed our cluster under AAD group-backed RBAC. This is the first time we are seeing this used. `az aks get-credentials` sets your `kubectl` context so that you can issue commands against your cluster. Even when you have enabled Azure AD integration with your AKS cluster, an Azure user has sufficient permissions on the cluster resource can still access your AKS cluster by using the `--admin` switch to this command. Using this switch _bypasses_ Azure AD and uses client certificate authentication instead; that isn't what we want to happen. So in order to prevent that practice, local account access (e.g. `clusterAdmin` or `clusterMonitoringUser`) is expressly disabled. - > - > In a following step, you'll log in with a user that has been added to the Azure AD security group used to back the Kubernetes RBAC admin role. Executing the first `kubectl` command below will invoke the AAD login process to authorize the _user of your choice_, which will then be authenticated against Kubernetes RBAC to perform the action. The user you choose to log in with _must be a member of the AAD group bound_ to the `cluster-admin` ClusterRole. 
For simplicity you could either use the "break-glass" admin user created in [Azure Active Directory Integration](03-aad.md) (`bu0001a0008-admin`) or any user you assigned to the `cluster-admin` group assignment in your [`cluster-rbac.yaml`](cluster-manifests/cluster-rbac.yaml) file. - - ```bash - az aks get-credentials -g rg-bu0001a0008 -n $AKS_CLUSTER_NAME - ``` - - :warning: At this point two important steps are happening: - - * The `az aks get-credentials` command will be fetch a `kubeconfig` containing references to the AKS cluster you have created earlier. - * To _actually_ use the cluster you will need to authenticate. For that, run any `kubectl` commands which at this stage will prompt you to authenticate against Azure Active Directory. For example, run the following command: - - ```bash - kubectl get nodes - ``` - - Once the authentication happens successfully, some new items will be added to your `kubeconfig` file such as an `access-token` with an expiration period. For more information on how this process works in Kubernetes please refer to [the related documentation](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens). - -1. Import cluster management images to your container registry. - - > Public container registries are subject to faults such as outages (no SLA) or request throttling. Interruptions like these can be crippling for a system that needs to pull an image _right now_. To minimize the risks of using public registries, store all applicable container images in a registry that you control, such as the SLA-backed Azure Container Registry. - - ```bash - # Get your ACR cluster name - export ACR_NAME_AKS_BASELINE=$(az deployment group show -g rg-bu0001a0008 -n cluster-stamp --query properties.outputs.containerRegistryName.value -o tsv) - - # Import cluster management images hosted in public container registries - az acr import --source docker.io/library/memcached:1.5.20 -n $ACR_NAME_AKS_BASELINE - az acr import --source docker.io/fluxcd/flux:1.21.1 -n $ACR_NAME_AKS_BASELINE - az acr import --source docker.io/weaveworks/kured:1.7.0 -n $ACR_NAME_AKS_BASELINE - ``` - -1. Create the cluster baseline settings namespace. - - ```bash - # Verify the user you logged in with has the appropriate permissions. This should result in a - # "yes" response. If you receive "no" to this command, check which user you authenticated as - # and ensure it is assigned to the Azure AD Group you designated for cluster admins. - kubectl auth can-i create namespace -A - - kubectl create namespace cluster-baseline-settings - ``` - -1. Deploy Flux. - - > If you used your own fork of this GitHub repo, update the [`flux.yaml`](./cluster-manifests/cluster-baseline-settings/flux.yaml) file to **reference your own repo and change the URL below** to point to yours as well. Also, since Flux will begin processing the manifests in [`cluster-manifests/`](./cluster-manifests/) now would be the right time push the following changes to your fork: - > - > * Update three `image` references to use your container registry instead of public container registries. See the comment in each file for instructions. - > * update the two `image:` values in [`flux.yaml`](./cluster-manifests/cluster-baseline-settings/flux.yaml). - > * update the one `image:` values in [`kured.yaml`](./cluster-manifests/cluster-baseline-settings/kured.yaml). 
- - :warning: Deploying the flux configuration using the `flux.yaml` file unmodified from this repo will be deploying your cluster to take dependencies on public container registries. This is generally okay for exploratory/testing, but not suitable for production. Before going to production, ensure _all_ image references you bring to your cluster are from _your_ container registry (as imported in the prior step) or another that you feel confident relying on. - - ```bash - kubectl create -f https://raw.githubusercontent.com/mspnp/aks-secure-baseline/main/cluster-manifests/cluster-baseline-settings/flux.yaml - ``` - -1. Wait for Flux to be ready before proceeding. - - ```bash - kubectl wait -n cluster-baseline-settings --for=condition=ready pod --selector=app.kubernetes.io/name=flux --timeout=90s - ``` - -Generally speaking, this will be the last time you should need to use `kubectl` for day-to-day configuration operations on this cluster (outside of break-fix situations). Between ARM for Azure Resource definitions and the application of manifests via Flux, all normal configuration activities can be performed without the need to use `kubectl`. You will however see us use it for the upcoming workload deployment. This is because the SDLC component of workloads are not in scope for this reference implementation, as this is focused the infrastructure and baseline configuration. - -### Save your work in-progress - ```bash -# run the saveenv.sh script at any time to save environment variables created above to aks_baseline.env -./saveenv.sh - -# if your terminal session gets reset, you can source the file to reload the environment variables -# source aks_baseline.env -``` - -### Next step - -:arrow_forward: [Prepare for the workload by installing its prerequisites](./07-workload-prerequisites.md) diff --git a/07-bootstrap-validation.md b/07-bootstrap-validation.md new file mode 100644 index 00000000..fd3930de --- /dev/null +++ b/07-bootstrap-validation.md @@ -0,0 +1,76 @@ +# Validate your cluster is bootstrapped and enrolled in GitOps + +Now that [the AKS cluster](./06-aks-cluster.md) has been deployed, the next step is to validate that your cluster has been placed under a GitOps management solution, Flux in this case. + +## Steps + +GitOps allows a team to author Kubernetes manifest files, persist them in their git repo, and have them automatically apply to their cluster as changes occur. This reference implementation is focused on the baseline cluster, so Flux is managing cluster-level concerns. This is distinct from workload-level concerns, which would be possible as well to manage via Flux, and would typically be done by additional Flux configuration in the cluster. The namespace `cluster-baseline-settings` will be used to provide a logical division of the cluster bootstrap configuration from workload configuration. Examples of manifests that are applied: + +* Cluster Role Bindings for the AKS-managed Azure AD integration +* AAD Pod Identity +* the workload's namespace named `a0008` + +1. Install `kubectl` 1.23 or newer. (`kubectl` supports +/-1 Kubernetes version.) + + ```bash + sudo az aks install-cli + kubectl version --client + ``` + +1. Get the cluster name. + + ```bash + AKS_CLUSTER_NAME=$(az aks list -g rg-bu0001a0008 --query '[0].name' -o tsv) + ``` + +1. Get AKS `kubectl` credentials. + + > In the [Azure Active Directory Integration](03-aad.md) step, we placed our cluster under AAD group-backed RBAC. This is the first time we are seeing this used.
`az aks get-credentials` sets your `kubectl` context so that you can issue commands against your cluster. Even when you have enabled Azure AD integration with your AKS cluster, an Azure user who has sufficient permissions on the cluster resource can still access your AKS cluster by using the `--admin` switch to this command. Using this switch _bypasses_ Azure AD and uses client certificate authentication instead; that isn't what we want to happen. So in order to prevent that practice, local account access (e.g. `clusterAdmin` or `clusterMonitoringUser`) is expressly disabled. + > + > In a following step, you'll log in with a user that has been added to the Azure AD security group used to back the Kubernetes RBAC admin role. Executing the first `kubectl` command below will invoke the AAD login process to authorize the _user of your choice_, which will then be authenticated against Kubernetes RBAC to perform the action. The user you choose to log in with _must be a member of the AAD group bound_ to the `cluster-admin` ClusterRole. For simplicity you could either use the "break-glass" admin user created in [Azure Active Directory Integration](03-aad.md) (`bu0001a0008-admin`) or any user you assigned to the `cluster-admin` group assignment in your [`cluster-rbac.yaml`](cluster-manifests/cluster-rbac.yaml) file. + + ```bash + az aks get-credentials -g rg-bu0001a0008 -n $AKS_CLUSTER_NAME + ``` + + :warning: At this point two important steps are happening: + + * The `az aks get-credentials` command will fetch a `kubeconfig` containing references to the AKS cluster you have created earlier. + * To _actually_ use the cluster you will need to authenticate. For that, run any `kubectl` commands which at this stage will prompt you to authenticate against Azure Active Directory. For example, run the following command: + + ```bash + kubectl get nodes + ``` + + Once the authentication happens successfully, some new items will be added to your `kubeconfig` file such as an `access-token` with an expiration period. For more information on how this process works in Kubernetes please refer to [the related documentation](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens). + +1. Validate your cluster is bootstrapped. + + The bootstrapping process that already happened due to the usage of the Flux extension for AKS has set up the following, among other things: + + * AAD Pod Identity + * the workload's namespace named `a0008` + * Installed kured + + ```bash + kubectl get namespaces + kubectl get all -n cluster-baseline-settings + ``` + + These commands will show you the results of the automatic bootstrapping process your cluster went through via the Flux GitOps extension. This content mirrors the content found in [`cluster-manifests`](./cluster-manifests), and commits made there will reflect in your cluster within minutes of making the change. + +The end result of all of this is that `kubectl` was not required for any part of the bootstrapping process of a cluster. The usage of `kubectl`-based access should be reserved for emergency break-fix situations and not for day-to-day configuration operations on this cluster. Between templates for Azure Resource definitions, and the bootstrapping of manifests via the GitOps extension, all normal configuration activities can be performed without the need to use `kubectl`. You will however see us use it for the upcoming workload deployment.
This is because the SDLC components of workloads are not in scope for this reference implementation, which is focused on the infrastructure and baseline configuration. + +## Alternatives + +Using the AKS extension for Flux gives you a seamless bootstrapping process that applies immediately after the cluster resource is created in Azure. It also supports the inclusion of that bootstrapping as resource templates to align with your IaC strategy. Alternatively you could apply bootstrapping as a secondary step after the cluster is deployed and manage that process external to the lifecycle of the cluster. Doing so will open your cluster up to a prolonged window between the cluster being deployed and your bootstrapping being applied. + +Furthermore, Flux doesn't need to be installed as an extension and instead the GitOps operator of your choice (such as ArgoCD) could be installed as part of your external bootstrapping process. + +## Recommendations + +It is recommended to have a clearly defined bootstrapping process that occurs as close as practicable to the actual cluster deployment for immediate enrollment of your cluster into your internal processes and tooling. GitOps lends itself well to this desired outcome, and you're encouraged to explore its usage for your cluster bootstrapping process and optionally also workload-level concerns. GitOps is often positioned best for fleet (many clusters) management for uniformity and its simplicity at scale; a more manual (via deployment pipelines) bootstrapping is common on small instance-count AKS deployments. Either process can work with either cluster topology. Use a bootstrapping process that aligns with your desired objectives and constraints found within your organization and team. + +### Next step + +:arrow_forward: [Prepare for the workload by installing its prerequisites](./08-workload-prerequisites.md) diff --git a/07-workload-prerequisites.md b/08-workload-prerequisites.md similarity index 91% rename from 07-workload-prerequisites.md rename to 08-workload-prerequisites.md index 7fe03587..1ec07818 100644 --- a/07-workload-prerequisites.md +++ b/08-workload-prerequisites.md @@ -1,6 +1,6 @@ # Workload Prerequisites -The AKS Cluster has been enrolled in [GitOps management](./06-gitops.md), wrapping up the infrastructure focus of the [AKS Baseline reference implementation](./). Follow the steps below to import the TLS certificate that the Ingress Controller will serve for Application Gateway to connect to your web app. +The AKS Cluster has been [bootstrapped](./07-bootstrap-validation.md), wrapping up the infrastructure focus of the [AKS Baseline reference implementation](./). Follow the steps below to import the TLS certificate that the Ingress Controller will serve for Application Gateway to connect to your web app.
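At its core, the import that those steps perform is a single Azure Key Vault operation along the lines of the sketch below. The certificate name and PEM file here are hypothetical placeholders; the actual steps also handle the Key Vault network rule and role assignment needed before the import can succeed.

```bash
# Rough sketch only: import the AKS ingress controller's TLS certificate into the workload's Key Vault.
# Assumes KEYVAULT_NAME_AKS_BASELINE was exported earlier and the PEM file contains both cert and key.
az keyvault certificate import \
  --vault-name $KEYVAULT_NAME_AKS_BASELINE \
  -n traefik-ingress-internal-aks-ingress-tls \
  -f traefik-ingress-internal-aks-ingress-tls.pem
```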
## Steps @@ -16,8 +16,8 @@ The AKS Cluster has been enrolled in [GitOps management](./06-gitops.md), wrappi export KEYVAULT_NAME_AKS_BASELINE=$(az deployment group show --resource-group rg-bu0001a0008 -n cluster-stamp --query properties.outputs.keyVaultName.value -o tsv) TEMP_ROLEASSIGNMENT_TO_UPLOAD_CERT=$(az role assignment create --role a4417e6f-fecd-4de8-b567-7b0420556985 --assignee-principal-type user --assignee-object-id $(az ad signed-in-user show --query 'objectId' -o tsv) --scope $(az keyvault show --name $KEYVAULT_NAME_AKS_BASELINE --query 'id' -o tsv) --query 'id' -o tsv) - # If you are behind a proxy or some other egress that does not give a static IP, you'll need to manually adjust the Azure Key Vault firewall to - # allow this traffic. + # If you are behind a proxy or some other egress that does not provide a consistent IP, you'll need to manually adjust the + # Azure Key Vault firewall to allow this traffic. CURRENT_IP_ADDRESS=$(curl -s https://ifconfig.io) az keyvault network-rule add -n $KEYVAULT_NAME_AKS_BASELINE --ip-address ${CURRENT_IP_ADDRESS} ``` @@ -75,4 +75,4 @@ The AKS Cluster has been enrolled in [GitOps management](./06-gitops.md), wrappi ### Next step -:arrow_forward: [Configure AKS Ingress Controller with Azure Key Vault integration](./08-secret-management-and-ingress-controller.md) +:arrow_forward: [Configure AKS Ingress Controller with Azure Key Vault integration](./09-secret-management-and-ingress-controller.md) diff --git a/08-secret-management-and-ingress-controller.md b/09-secret-management-and-ingress-controller.md similarity index 96% rename from 08-secret-management-and-ingress-controller.md rename to 09-secret-management-and-ingress-controller.md index 36cb5aed..d94bff5d 100644 --- a/08-secret-management-and-ingress-controller.md +++ b/09-secret-management-and-ingress-controller.md @@ -1,6 +1,6 @@ # Configure AKS Ingress Controller with Azure Key Vault integration -Previously you have configured [workload prerequisites](./07-workload-prerequisites.md). These steps configure Traefik, the AKS ingress solution used in this reference implementation, so that it can securely expose the web app to your Application Gateway. +Previously you have configured [workload prerequisites](./08-workload-prerequisites.md). These steps configure Traefik, the AKS ingress solution used in this reference implementation, so that it can securely expose the web app to your Application Gateway. ## Steps @@ -11,7 +11,7 @@ Previously you have configured [workload prerequisites](./07-workload-prerequisi TRAEFIK_USER_ASSIGNED_IDENTITY_CLIENT_ID=$(az deployment group show --resource-group rg-bu0001a0008 -n cluster-stamp --query properties.outputs.aksIngressControllerPodManagedIdentityClientId.value -o tsv) ``` -1. Ensure Flux has created the following namespace. +1. Ensure your bootstrapping process has created the following namespace. 
```bash # press Ctrl-C once you receive a successful response @@ -108,4 +108,4 @@ Previously you have configured [workload prerequisites](./07-workload-prerequisi ### Next step -:arrow_forward: [Deploy the Workload](./09-workload.md) +:arrow_forward: [Deploy the Workload](./10-workload.md) diff --git a/09-workload.md b/10-workload.md similarity index 87% rename from 09-workload.md rename to 10-workload.md index 72058aa5..089ec8ac 100644 --- a/09-workload.md +++ b/10-workload.md @@ -41,7 +41,7 @@ The cluster now has an [Traefik configured with a TLS certificate](./08-secret-m > You should expect a `403` HTTP response from your ingress controller if you attempt to connect to it _without_ going through the App Gateway. Likewise, if any workload other than the ingress controller attempts to reach the workload, the traffic will be denied via network policies. ```bash - kubectl run curl -n a0008 -i --tty --rm --image=mcr.microsoft.com/azure-cli --limits='cpu=200m,memory=128Mi' + kubectl run curl -n a0008 -i --tty --rm --image=mcr.microsoft.com/azure-cli --overrides='[{"op":"add","path":"/spec/containers/0/resources","value":{"limits":{"cpu":"200m","memory":"128Mi"}}}]' --override-type json # From within the open shell now running on a container inside your cluster DOMAIN_NAME="contoso.com" # <-- Change to your custom domain value if a different one was used @@ -49,14 +49,14 @@ The cluster now has an [Traefik configured with a TLS certificate](./08-secret-m exit ``` - > :beetle: You might receive a message about `--limits` being deprecated, you can [safely ignore that message](https://github.com/kubernetes/kubectl/issues/1101) if you are using kubectl 1.22. However, if you are running kubectl 1.23+, you will need to use the following command instead. + > :beetle: If you are running a version of kubectl less than 1.23, you'll receive an error from the run command above as the method to provide container limits has changed between 1.22 and 1.23. On kubectl versions less than 1.23, you'll need to run the following command instead and you can [safely ignore the message about `--limits` being deprecated](https://github.com/kubernetes/kubectl/issues/1101). ```bash - kubectl run curl -n a0008 -i --tty --rm --image=mcr.microsoft.com/azure-cli --overrides='[{"op":"add","path":"/spec/containers/0/resources","value":{"limits":{"cpu":"200m","memory":"128Mi"}}}]' --override-type json + kubectl run curl -n a0008 -i --tty --rm --image=mcr.microsoft.com/azure-cli --limits='cpu=200m,memory=128Mi' ``` - > From this container shell, you could also try to directly access the workload via `curl -I http://`. Instead of getting back a `200 OK`, you'll receive a network timeout because of the [`allow-only-ingress-to-workload` network policy](./cluster-manifests/a0008/ingress-network-policy.yaml) that is in place. + > From this container shell, you could also try to directly access the workload via `curl -I http://`. Instead of getting back a `200 OK`, you'll receive a network timeout because of the [`allow-only-ingress-to-workload` network policy](./cluster-manifests/a0008/ingress-network-policy.yaml) that is in place. 
### Next step -:arrow_forward: [End-to-End Validation](./10-validation.md) +:arrow_forward: [End-to-End Validation](./11-validation.md) diff --git a/10-validation.md b/11-validation.md similarity index 95% rename from 10-validation.md rename to 11-validation.md index e6c7c2f1..d12046ab 100644 --- a/10-validation.md +++ b/11-validation.md @@ -1,6 +1,6 @@ # End-to-End Validation -Now that you have a workload deployed, the [ASP.NET Core Docker sample web app](./09-workload.md), you can start validating and exploring this reference implementation of the [AKS Baseline cluster](./). In addition to the workload, there are some observability validation you can perform as well. +Now that you have a workload deployed, the [ASP.NET Core Docker sample web app](./10-workload.md), you can start validating and exploring this reference implementation of the [AKS Baseline cluster](./). In addition to the workload, there are some observability validation you can perform as well. ## Validate the Web App @@ -74,7 +74,7 @@ You can also execute [queries](https://docs.microsoft.com/azure/azure-monitor/lo Azure Monitor is configured to [scrape Prometheus metrics](https://docs.microsoft.com/azure/azure-monitor/insights/container-insights-prometheus-integration) in your cluster. This reference implementation is configured to collect Prometheus metrics from two namespaces, as configured in [`container-azm-ms-agentconfig.yaml`](./cluster-baseline-settings/container-azm-ms-agentconfig.yaml). There are two pods configured to emit Prometheus metrics: - [Traefik](./workload/traefik.yaml) (in the `a0008` namespace) -- [Kured](./cluster-baseline-settings/kured-1.4.0-dockerhub.yaml) (in the `cluster-baseline-settings` namespace) +- [Kured](./cluster-baseline-settings/kured.yaml) (in the `cluster-baseline-settings` namespace) ### Steps @@ -150,8 +150,8 @@ If you configured your third-party images to be pulled from your Azure Container | where OperationName == 'Pull' ``` -1. You should see logs for CSI, flux, kured, memcached, and traefik. You'll see multiple for some as the image was pulled to multiple nodes to satisfy ReplicaSet/DaemonSet placement. +1. You should see logs for CSI, kured, memcached, and traefik. You'll see multiple for some as the image was pulled to multiple nodes to satisfy ReplicaSet/DaemonSet placement. ## Next step -:arrow_forward: [Clean Up Azure Resources](./11-cleanup.md) +:arrow_forward: [Clean Up Azure Resources](./12-cleanup.md) diff --git a/11-cleanup.md b/12-cleanup.md similarity index 100% rename from 11-cleanup.md rename to 12-cleanup.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 3dec230b..1e0f1ee1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,6 +1,6 @@ # Contributing to the AKS Baseline reference implementation -This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit . +This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit . When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. 
You will only need to do this once across all repos using our CLA. @@ -33,7 +33,7 @@ You can *request* a new feature by [submitting an issue](#submit-issue) to the G Before you submit an issue, search the archive, maybe your question was already answered. -If your issue appears to be a bug, and hasn't been reported, open a new issue. Help us to maximize the effort we can spend fixing issues and adding new features, by not reporting duplicate issues. Providing the following information will increase the chances of your issue being dealt with quickly: +If your issue appears to be a bug, and hasn't been reported, open a new issue. Help us to maximize the effort we can spend fixing issues and adding new features, by not reporting duplicate issues. Providing the following information will increase the chances of your issue being dealt with quickly: * **Overview of the Issue** - if an error is being thrown a non-minified stack trace helps * **Version** - what version is affected (e.g. 0.1.2) diff --git a/README.md b/README.md index edccd044..17b02f37 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ We walk through the deployment here in a rather _verbose_ method to help you und ## Azure Architecture Center guidance -This project has a companion set of articles that describe challenges, design patterns, and best practices for a secure AKS cluster. You can find this article on the Azure Architecture Center at [Azure Kubernetes Service (AKS) Baseline Cluster](https://aka.ms/architecture/aks-baseline). If you haven't reviewed it, we suggest you read it as it will give added context to the considerations applied in this implementation. Ultimately, this is the direct implementation of that specific architectural guidance. +This project has a companion set of articles that describe challenges, design patterns, and best practices for a secure AKS cluster. You can find this article on the Azure Architecture Center at [Azure Kubernetes Service (AKS) Baseline cluster](https://aka.ms/architecture/aks-baseline). If you haven't reviewed it, we suggest you read it as it will give added context to the considerations applied in this implementation. Ultimately, this is the direct implementation of that specific architectural guidance. ## Architecture @@ -68,10 +68,11 @@ Microsoft recommends AKS be deploy into a carefully planned network; sized appro ### 3. Deploying the cluster -This is the heart of the guidance in this reference implementation; paired with prior network topology guidance. Here you will deploy the Azure resources for your cluster and the adjacent services such as Azure Application Gateway WAF, Azure Monitor, Azure Container Registry, and Azure Key Vault. This is also where you put the cluster under GitOps orchestration. +This is the heart of the guidance in this reference implementation; paired with prior network topology guidance. Here you will deploy the Azure resources for your cluster and the adjacent services such as Azure Application Gateway WAF, Azure Monitor, Azure Container Registry, and Azure Key Vault. This is also where you will validate the cluster is bootstrapped. 
-- [ ] [Deploy the AKS cluster and supporting services](./05-aks-cluster.md) -- [ ] [Place the cluster under GitOps management](./06-gitops.md) +- [ ] [Prep for cluster bootstrapping](./05-boostrap-prep.md) +- [ ] [Deploy the AKS cluster and supporting services](./06-aks-cluster.md) +- [ ] [Validate cluster bootsrapping](./07-bootstrap-validation.md) We perform the prior steps manually here for you to understand the involved components, but we advocate for an automated DevOps process. Therefore, incorporate the prior steps into your CI/CD pipeline, as you would any infrastructure as code (IaC). We have included [a starter GitHub workflow](./github-workflow/aks-deploy.yaml) that demonstrates this. @@ -79,25 +80,21 @@ We perform the prior steps manually here for you to understand the involved comp Without a workload deployed to the cluster it will be hard to see how these decisions come together to work as a reliable application platform for your business. The deployment of this workload would typically follow a CI/CD pattern and may involve even more advanced deployment strategies (blue/green, etc). The following steps represent a manual deployment, suitable for illustration purposes of this infrastructure. -- [ ] Just like the cluster, there are [workload prerequisites to address](./07-workload-prerequisites.md) -- [ ] [Configure AKS Ingress Controller with Azure Key Vault integration](./08-secret-management-and-ingress-controller.md) -- [ ] [Deploy the workload](./09-workload.md) +- [ ] Just like the cluster, there are [workload prerequisites to address](./08-workload-prerequisites.md) +- [ ] [Configure AKS Ingress Controller with Azure Key Vault integration](./09-secret-management-and-ingress-controller.md) +- [ ] [Deploy the workload](./10-workload.md) ### 5. :checkered_flag: Validation Now that the cluster and the sample workload is deployed; it's time to look at how the cluster is functioning. -- [ ] [Perform end-to-end deployment validation](./10-validation.md) +- [ ] [Perform end-to-end deployment validation](./11-validation.md) ## :broom: Clean up resources Most of the Azure resources deployed in the prior steps will incur ongoing charges unless removed. -- [ ] [Cleanup all resources](./11-cleanup.md) - -## Inner-loop development scripts - -We have provided some sample deployment scripts that you could adapt for your own purposes while doing a POC/spike on this. Those scripts are found in the [inner-loop-scripts directory](./inner-loop-scripts). They include some additional considerations and may include some additional narrative as well. Consider checking them out. They consolidate most of the walk-through performed above into combined execution steps. +- [ ] [Cleanup all resources](./12-cleanup.md) ## Preview features @@ -108,9 +105,7 @@ Consider trying out and providing feedback on the following: - [Automatic Node Upgrade](https://github.com/Azure/AKS/issues/1486) - [Host-based encryption](https://docs.microsoft.com/azure/aks/enable-host-encryption) - Leverages added data encryption on your VMs' temp and OS disks. - [Generation 2 VM support](https://docs.microsoft.com/azure/aks/cluster-configuration#generation-2-virtual-machines-preview) - Increased memory options, Intel SGX support, and UEFI-based boot architectures. 
-- [Auto Upgrade Profile support](https://github.com/Azure/AKS/issues/1303) - [Customizable Node & Kublet config](https://github.com/Azure/AKS/issues/323) -- [GitOps as an add-on](https://github.com/Azure/AKS/issues/1967) - [Azure AD Pod Identity as an add-on](https://docs.microsoft.com/azure/aks/use-azure-ad-pod-identity) ## Related Reference Implementations @@ -135,7 +130,6 @@ This reference implementation intentionally does not cover more advanced scenari - Windows node pools - Scale-to-zero node pools and event-based scaling (KEDA) - [Terraform](https://docs.microsoft.com/azure/developer/terraform/create-k8s-cluster-with-tf-and-aks) -- [Bedrock](https://github.com/microsoft/bedrock) - [dapr](https://github.com/dapr/dapr) Keep watching this space, as we build out reference implementation guidance on topics such as these. Further guidance delivered will use this baseline AKS implementation as their starting point. If you would like to contribute or suggest a pattern built on this baseline, [please get in touch](./CONTRIBUTING.md). diff --git a/acr-stamp.json b/acr-stamp.json new file mode 100644 index 00000000..1455b838 --- /dev/null +++ b/acr-stamp.json @@ -0,0 +1,249 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "0.0.0.1", + "parameters": { + "targetVnetResourceId": { + "type": "string", + "minLength": 79, + "metadata": { + "description": "The regional network spoke VNet Resource ID that the cluster will be joined to." + } + }, + "location": { + "defaultValue": "eastus2", + "type": "string", + "allowedValues": [ + "australiaeast", + "canadacentral", + "centralus", + "eastus", + "eastus2", + "westus2", + "francecentral", + "germanywestcentral", + "northeurope", + "southafricanorth", + "southcentralus", + "uksouth", + "westeurope", + "japaneast", + "southeastasia" + ], + "metadata": { + "description": "AKS Service, Node Pool, and supporting services (KeyVault, App Gateway, etc) region. This needs to be the same region as the vnet provided in these parameters." + } + }, + "geoRedundancyLocation": { + "defaultValue": "centralus", + "type": "string", + "allowedValues": [ + "australiasoutheast", + "canadaeast", + "eastus2", + "westus", + "centralus", + "westcentralus", + "francesouth", + "germanynorth", + "westeurope", + "ukwest", + "northeurope", + "japanwest", + "southafricawest", + "northcentralus", + "eastasia", + "eastus", + "westus2", + "francecentral", + "uksouth", + "japaneast", + "southeastasia" + ], + "metadata": { + "description": "For Azure resources that support native geo-redunancy, provide the location the redundant service will have its secondary. Should be different than the location parameter and ideally should be a paired region - https://docs.microsoft.com/azure/best-practices-availability-paired-regions. This region does not need to support availability zones." 
+ } + } + }, + "variables": { + "subRgUniqueString": "[uniqueString('aks', subscription().subscriptionId, resourceGroup().id)]", + + "clusterName": "[concat('aks-', variables('subRgUniqueString'))]", + "logAnalyticsWorkspaceName": "[concat('la-', variables('clusterName'))]", + "defaultAcrName": "[concat('acraks', variables('subRgUniqueString'))]", + + "vnetName": "[split(parameters('targetVnetResourceId'),'/')[8]]", + "vnetAcrPrivateEndpointSubnetResourceId": "[concat(parameters('targetVnetResourceId'), '/subnets/snet-clusternodes')]", + "acrPrivateDnsZonesName": "privatelink.azurecr.io" + }, + "resources": [ + { + "type": "Microsoft.OperationalInsights/workspaces", + "apiVersion": "2020-10-01", + "name": "[variables('logAnalyticsWorkspaceName')]", + "location": "[parameters('location')]", + "properties": { + "sku": { + "name": "PerGB2018" + }, + "retentionInDays": 30 + } + }, + { + "type": "Microsoft.Network/privateDnsZones", + "apiVersion": "2020-06-01", + "name": "[variables('acrPrivateDnsZonesName')]", + "location": "global", + "comments": "Enabling Azure Container Registry Private Link on vnet.", + "properties": {}, + "resources": [ + { + "type": "virtualNetworkLinks", + "apiVersion": "2020-06-01", + "name": "[concat('to_', variables('vnetName'))]", + "location": "global", + "comments": "Enabling cluster vnet private zone DNS lookup - used by cluster vnet for direct DNS queries (ones not proxied via the hub).", + "dependsOn": [ + "[resourceId('Microsoft.Network/privateDnsZones', variables('acrPrivateDnsZonesName'))]" + ], + "properties": { + "virtualNetwork": { + "id": "[parameters('targetVnetResourceId')]" + }, + "registrationEnabled": false + } + } + ] + }, + { + "type": "Microsoft.ContainerRegistry/registries", + "apiVersion": "2020-11-01-preview", + "name": "[variables('defaultAcrName')]", + "location": "[parameters('location')]", + "sku": { + "name": "Premium" + }, + "properties": { + "adminUserEnabled": false, + "networkRuleSet": { + "defaultAction": "Deny", + "virtualNetworkRules": [], + "ipRules": [] + }, + "policies": { + "quarantinePolicy": { + "status": "disabled" + }, + "trustPolicy": { + "type": "Notary", + "status": "disabled" + }, + "retentionPolicy": { + "days": 15, + "status": "enabled" + } + }, + "publicNetworkAccess": "Disabled", + "encryption": { + "status": "disabled" + }, + "dataEndpointEnabled": true, + "networkRuleBypassOptions": "AzureServices", + "zoneRedundancy": "Disabled" // This Preview feature only supports three regions at this time, and eastus2's paired region (centralus), does not support this. So disabling for now. 
+ }, + "resources": [ + { + "type": "replications", + "apiVersion": "2020-11-01-preview", + "name": "[parameters('geoRedundancyLocation')]", + "location": "[parameters('geoRedundancyLocation')]", + "dependsOn": [ + "[variables('defaultAcrName')]" + ], + "properties": {} + }, + { + "type": "providers/diagnosticSettings", + "apiVersion": "2017-05-01-preview", + "name": "Microsoft.Insights/default", + "dependsOn": [ + "[resourceId('Microsoft.ContainerRegistry/registries', variables('defaultAcrName'))]", + "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" + ], + "properties": { + "workspaceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]", + "metrics": [ + { + "timeGrain": "PT1M", + "category": "AllMetrics", + "enabled": true + } + ], + "logs": [ + { + "category": "ContainerRegistryRepositoryEvents", + "enabled": true + }, + { + "category": "ContainerRegistryLoginEvents", + "enabled": true + } + ] + } + } + ] + }, + { + "type": "Microsoft.Network/privateEndpoints", + "apiVersion": "2020-11-01", + "name": "[concat('acr_to_', variables('vnetName'))]", + "location": "[parameters('location')]", + "dependsOn": [ + "[resourceId('Microsoft.ContainerRegistry/registries/replications', variables('defaultAcrName'), parameters('geoRedundancyLocation'))]" + ], + "properties": { + "subnet": { + "id": "[variables('vnetAcrPrivateEndpointSubnetResourceId')]" + }, + "privateLinkServiceConnections": [ + { + "name": "nodepools", + "properties": { + "privateLinkServiceId": "[resourceId('Microsoft.ContainerRegistry/registries', variables('defaultAcrName'))]", + "groupIds": [ + "registry" + ] + } + } + ] + }, + "resources": [ + { + "type": "privateDnsZoneGroups", + "apiVersion": "2020-11-01", + "name": "default", + "location": "[parameters('location')]", + "dependsOn": [ + "[resourceId('Microsoft.Network/privateEndpoints', concat('acr_to_', variables('vnetName')))]", + "[resourceId('Microsoft.Network/privateDnsZones', variables('acrPrivateDnsZonesName'))]" + ], + "properties": { + "privateDnsZoneConfigs": [ + { + "name": "privatelink-azurecr-io", + "properties": { + "privateDnsZoneId": "[resourceId('Microsoft.Network/privateDnsZones', variables('acrPrivateDnsZonesName'))]" + } + } + ] + } + } + ] + } + ], + "outputs": { + "containerRegistryName": { + "type": "string", + "value": "[variables('defaultAcrName')]" + } + } +} diff --git a/azuredeploy.parameters.prod.json b/azuredeploy.parameters.prod.json index 6563bdb0..c1f4ff00 100644 --- a/azuredeploy.parameters.prod.json +++ b/azuredeploy.parameters.prod.json @@ -5,9 +5,6 @@ "location": { "value": "eastus2" }, - "geoRedundancyLocation": { - "value": "centralus" - }, "targetVnetResourceId": { "value": "/subscriptions/[subscription id]/resourceGroups/rg-enterprise-networking-spokes/providers/Microsoft.Network/virtualNetworks/vnet-hub-spoke-BU0001A0008-00" }, @@ -31,6 +28,9 @@ }, "domainName": { "value": "[the value of DOMAIN_NAME_AKS_BASELINE (e.g. contoso.com)]" + }, + "gitOpsBootstrappingRepoHttpsUrl": { + "value": "https://github.com/mspnp/aks-baseline" } } } diff --git a/cluster-manifests/README.md b/cluster-manifests/README.md index 963f95f0..04eaf45a 100644 --- a/cluster-manifests/README.md +++ b/cluster-manifests/README.md @@ -1,6 +1,6 @@ # Cluster Baseline Configuration Files (GitOps) -> Note: This is part of the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation. For more information check out the [readme file in the root](../README.md). 
+> Note: This is part of the Azure Kubernetes Service (AKS) Baseline cluster reference implementation. For more information check out the [readme file in the root](../README.md). This is the root of the GitOps configuration directory. These Kubernetes object files are expected to be deployed via our in-cluster Flux operator. They are our AKS cluster's baseline configurations. Generally speaking, they are workload agnostic and tend to all cluster-wide configuration concerns. @@ -10,7 +10,6 @@ This is the root of the GitOps configuration directory. These Kubernetes object * Kubernetes RBAC Role Assignments (cluster and namespace) to Azure AD Groups. _Optional_ * [Kured](#kured) * Ingress Network Policy -* Flux (self-managing) * Azure Monitor Prometheus Scraping * Azure AD Pod Identity diff --git a/cluster-manifests/cluster-baseline-settings/aad-pod-identity.yaml b/cluster-manifests/cluster-baseline-settings/aad-pod-identity.yaml index 9a0e746c..5f47d6a8 100644 --- a/cluster-manifests/cluster-baseline-settings/aad-pod-identity.yaml +++ b/cluster-manifests/cluster-baseline-settings/aad-pod-identity.yaml @@ -643,4 +643,13 @@ metadata: namespace: kube-system spec: podLabels: - kubernetes.azure.com/managedby: aks \ No newline at end of file + kubernetes.azure.com/managedby: aks +--- +apiVersion: aadpodidentity.k8s.io/v1 +kind: AzurePodIdentityException +metadata: + name: flux-extension-exception + namespace: flux-system +spec: + podLabels: + app.kubernetes.io/component: fluxconfig-agent \ No newline at end of file diff --git a/cluster-manifests/cluster-baseline-settings/flux.yaml b/cluster-manifests/cluster-baseline-settings/flux.yaml deleted file mode 100644 index 10344dc6..00000000 --- a/cluster-manifests/cluster-baseline-settings/flux.yaml +++ /dev/null @@ -1,183 +0,0 @@ ---- -apiVersion: v1 -kind: ServiceAccount -metadata: - labels: - app.kubernetes.io/name: flux - name: flux - namespace: cluster-baseline-settings ---- -kind: ClusterRole -apiVersion: rbac.authorization.k8s.io/v1 -metadata: - name: flux - labels: - app.kubernetes.io/name: flux -rules: - - apiGroups: ['*'] - resources: ['*'] - verbs: ['*'] - - nonResourceURLs: ['*'] - verbs: ['*'] ---- -kind: ClusterRoleBinding -apiVersion: rbac.authorization.k8s.io/v1 -metadata: - name: flux - labels: - app.kubernetes.io/name: flux -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: flux -subjects: - - kind: ServiceAccount - name: flux - namespace: cluster-baseline-settings ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: flux - namespace: cluster-baseline-settings -spec: - replicas: 1 - selector: - matchLabels: - app.kubernetes.io/name: flux - strategy: - type: Recreate - template: - metadata: - annotations: - prometheus.io/port: "3031" - labels: - app.kubernetes.io/name: flux - spec: - nodeSelector: - kubernetes.io/os: linux - agentpool: npuser01 - serviceAccountName: flux - volumes: - - name: git-key - secret: - secretName: flux-git-deploy - defaultMode: 0400 - containers: - - name: flux - # PRODUCTION READINESS CHANGE REQUIRED - # This image should be sourced from a non-public container registry, such as the - # one deployed along side of this reference implementation. 
- # az acr import --source docker.io/fluxcd/flux:1.21.1 -n - # and then set this to - # image: .azurecr.io/fluxcd/flux:1.21.1 - image: docker.io/fluxcd/flux:1.21.1 - imagePullPolicy: IfNotPresent - securityContext: - capabilities: - drop: - - ALL - allowPrivilegeEscalation: false - # create folder in the root fs when cloning repos - readOnlyRootFilesystem: false - # access to root folder like /.kube/config - runAsNonRoot: false - volumeMounts: - - name: git-key - mountPath: /etc/fluxd/ssh - readOnly: true - resources: - requests: - cpu: 50m - memory: 64Mi - ports: - - containerPort: 3030 - livenessProbe: - httpGet: - port: 3030 - path: /api/flux/v6/identity.pub - initialDelaySeconds: 5 - timeoutSeconds: 5 - readinessProbe: - httpGet: - port: 3030 - path: /api/flux/v6/identity.pub - initialDelaySeconds: 5 - timeoutSeconds: 5 - args: - - --git-url=https://github.com/mspnp/aks-secure-baseline.git - - --git-branch=main - - --git-path=cluster-manifests - # this configuration prevents flux from syncing changes from your cluster to the git repo. If two way sync is required, please take a look at https://docs.fluxcd.io/en/1.19.0/tutorials/get-started/#giving-write-access - - --git-readonly - - --sync-state=secret - - --listen-metrics=:3031 - - --git-timeout=5m - - --registry-disable-scanning=true ---- -# This secret is ok to be initialized as empty since Flux annotates the -# Kubernetes Secret object with flux.weave.works/sync-hwm: -# as a way to store the latest commit applied to the cluster and later on -# compare with to confirm wether it is in sync or not. -apiVersion: v1 -kind: Secret -metadata: - name: flux-git-deploy - namespace: cluster-baseline-settings -type: Opaque ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: memcached - namespace: cluster-baseline-settings -spec: - replicas: 1 - selector: - matchLabels: - app.kubernetes.io/name: memcached - template: - metadata: - labels: - app.kubernetes.io/name: memcached - spec: - nodeSelector: - kubernetes.io/os: linux - agentpool: npuser01 - containers: - - name: memcached - # PRODUCTION READINESS CHANGE REQUIRED - # This image should be sourced from a non-public container registry, such as the - # one deployed along side of this reference implementation. - # az acr import --source docker.io/library/memcached:1.5.20 -n - # and then set this to - # image: .azurecr.io/library/memcached:1.5.20 - image: docker.io/library/memcached:1.5.20 - imagePullPolicy: IfNotPresent - resources: - requests: - memory: 512Mi - args: - - -m 512 - - -I 5m # Maximum size for one item - - -p 11211 # Default port - # - -vv # Uncomment to get logs of each request and response. 
- ports: - - name: clients - containerPort: 11211 - securityContext: - runAsUser: 11211 - runAsGroup: 11211 - allowPrivilegeEscalation: false ---- -apiVersion: v1 -kind: Service -metadata: - name: memcached - namespace: cluster-baseline-settings -spec: - ports: - - name: memcached - port: 11211 - selector: - app.kubernetes.io/name: memcached diff --git a/cluster-manifests/cluster-baseline-settings/kured.yaml b/cluster-manifests/cluster-baseline-settings/kured.yaml index 1f479c34..533d2333 100644 --- a/cluster-manifests/cluster-baseline-settings/kured.yaml +++ b/cluster-manifests/cluster-baseline-settings/kured.yaml @@ -1,4 +1,4 @@ -# Source: https://github.com/weaveworks/kured/releases/download/1.7.0/kured-1.7.0-dockerhub.yaml +# Source: https://github.com/weaveworks/kured/releases/download/1.9.0/kured-1.9.0-dockerhub.yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: @@ -106,10 +106,10 @@ spec: # PRODUCTION READINESS CHANGE REQUIRED # This image should be sourced from a non-public container registry, such as the # one deployed along side of this reference implementation. - # az acr import --source docker.io/weaveworks/kured:1.7.0 -n + # az acr import --source docker.io/weaveworks/kured:1.9.0 -n # and then set this to - # image: .azurecr.io/weaveworks/kured:1.7.0 - image: docker.io/weaveworks/kured:1.7.0 + # image: .azurecr.io/weaveworks/kured:1.9.0 + image: docker.io/weaveworks/kured:1.9.0 imagePullPolicy: IfNotPresent resources: limits: @@ -138,6 +138,7 @@ spec: # - --lock-ttl=0 # - --prometheus-url=http://prometheus.monitoring.svc.cluster.local # - --alert-filter-regexp=^RebootRequired$ +# - --alert-firing-only=false # - --reboot-sentinel=/var/run/reboot-required # - --prefer-no-schedule-taint="" # - --reboot-sentinel-command="" @@ -151,8 +152,10 @@ spec: # - --blocking-pod-selector=name=temperamental # - --blocking-pod-selector=... # - --reboot-days=sun,mon,tue,wed,thu,fri,sat +# - --reboot-delay=90s # - --start-time=0:00 # - --end-time=23:59:59 # - --time-zone=UTC # - --annotate-nodes=false # - --lock-release-delay=30m +# - --log-format=text diff --git a/cluster-stamp.json b/cluster-stamp.json index a0348493..f4f009e2 100644 --- a/cluster-stamp.json +++ b/cluster-stamp.json @@ -70,38 +70,8 @@ "description": "AKS Service, Node Pool, and supporting services (KeyVault, App Gateway, etc) region. This needs to be the same region as the vnet provided in these parameters." } }, - "geoRedundancyLocation": { - "defaultValue": "centralus", - "type": "string", - "allowedValues": [ - "australiasoutheast", - "canadaeast", - "eastus2", - "westus", - "centralus", - "westcentralus", - "francesouth", - "germanynorth", - "westeurope", - "ukwest", - "northeurope", - "japanwest", - "southafricawest", - "northcentralus", - "eastasia", - "eastus", - "westus2", - "francecentral", - "uksouth", - "japaneast", - "southeastasia" - ], - "metadata": { - "description": "For Azure resources that support native geo-redunancy, provide the location the redundant service will have its secondary. Should be different than the location parameter and ideally should be a paired region - https://docs.microsoft.com/azure/best-practices-availability-paired-regions. This region does not need to support availability zones." - } - }, "kubernetesVersion": { - "defaultValue": "1.22.2", + "defaultValue": "1.22.4", "type": "string" }, "domainName": { @@ -110,6 +80,22 @@ "metadata": { "description": "Domain name to use for App Gateway and AKS ingress." 
} + }, + "gitOpsBootstrappingRepoHttpsUrl": { + "type": "string", + "defaultValue": "https://github.com/mspnp/aks-baseline", + "minLength": 9, + "metadata": { + "description": "Your cluster will be bootstrapped from this git repo." + } + }, + "gitOpsBootstrappingRepoBranch": { + "type": "string", + "defaultValue": "main", + "minLength": 1, + "metadata": { + "description": "You cluster will be bootstrapped from this branch in the identifed git repo." + } } }, "variables": { @@ -138,7 +124,6 @@ "agwName": "[concat('apw-', variables('clusterName'))]", - "acrPrivateDnsZonesName": "privatelink.azurecr.io", "akvPrivateDnsZonesName": "privatelink.vaultcore.azure.net", "clusterControlPlaneIdentityName": "[concat('mi-', variables('clusterName'), '-controlplane')]", @@ -240,8 +225,7 @@ "apiVersion": "2017-05-01-preview", "name": "Microsoft.Insights/default", "dependsOn": [ - "[resourceId('Microsoft.KeyVault/vaults', variables('keyVaultName'))]", - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" + "[resourceId('Microsoft.KeyVault/vaults', variables('keyVaultName'))]" ], "properties": { "workspaceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]", @@ -261,7 +245,7 @@ }, { "type": "providers/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[concat('Microsoft.Authorization/', guid(resourceGroup().id, 'mi-appgateway-frontend', variables('keyVaultSecretsUserRole')))]", "dependsOn": [ "[resourceId('Microsoft.KeyVault/vaults', variables('keyVaultName'))]", @@ -276,7 +260,7 @@ }, { "type": "providers/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[concat('Microsoft.Authorization/', guid(resourceGroup().id, 'mi-appgateway-frontend', variables('keyVaultReader')))]", "dependsOn": [ "[resourceId('Microsoft.KeyVault/vaults', variables('keyVaultName'))]", @@ -291,7 +275,7 @@ }, { "type": "providers/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[concat('Microsoft.Authorization/', guid(resourceGroup().id, 'podmi-ingress-controller', variables('keyVaultSecretsUserRole')))]", "dependsOn": [ "[resourceId('Microsoft.KeyVault/vaults', variables('keyVaultName'))]", @@ -306,7 +290,7 @@ }, { "type": "providers/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[concat('Microsoft.Authorization/', guid(resourceGroup().id, 'podmi-ingress-controller', variables('keyVaultReader')))]", "dependsOn": [ "[resourceId('Microsoft.KeyVault/vaults', variables('keyVaultName'))]", @@ -368,32 +352,6 @@ } ] }, -{ - "type": "Microsoft.Network/privateDnsZones", - "apiVersion": "2020-06-01", - "name": "[variables('acrPrivateDnsZonesName')]", - "location": "global", - "comments": "Enabling Azure Container Registry Private Link on vnet.", - "properties": {}, - "resources": [ - { - "type": "virtualNetworkLinks", - "apiVersion": "2020-06-01", - "name": "[concat('to_', variables('vnetName'))]", - "location": "global", - "comments": "Enabling cluster vnet private zone DNS lookup - used by cluster vnet for direct DNS queries (ones not proxied via the hub).", - "dependsOn": [ - "[resourceId('Microsoft.Network/privateDnsZones', variables('acrPrivateDnsZonesName'))]" - ], - "properties": { - "virtualNetwork": { - "id": "[parameters('targetVnetResourceId')]" - }, - "registrationEnabled": false - } - } - ] - }, { "type": 
"Microsoft.Network/privateDnsZones", "apiVersion": "2018-09-01", @@ -641,8 +599,7 @@ "apiVersion": "2017-05-01-preview", "name": "Microsoft.Insights/default", "dependsOn": [ - "[resourceId('Microsoft.Network/applicationGateways', variables('agwName'))]", - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" + "[resourceId('Microsoft.Network/applicationGateways', variables('agwName'))]" ], "properties": { "workspaceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]", @@ -722,7 +679,7 @@ "resources": [ { "type": "Microsoft.Authorization/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[guid(resourceGroup().id)]", "comments": "It is required to grant the AKS cluster with Virtual Machine Contributor role permissions over the cluster infrastructure resource group to work with Managed Identities and aad-pod-identity. Otherwise MIC component fails while attempting to update MSI on VMSS cluster nodes", "properties": { @@ -736,63 +693,28 @@ } }, { - "type": "Microsoft.OperationalInsights/workspaces", + "type": "Microsoft.OperationalInsights/workspaces/savedSearches", "apiVersion": "2020-08-01", - "name": "[variables('logAnalyticsWorkspaceName')]", - "location": "[parameters('location')]", + "name": "[concat(variables('logAnalyticsWorkspaceName'), '/AllPrometheus')]", "properties": { - "sku": { - "name": "PerGB2018" - }, - "retentionInDays": 30 - }, - "resources": [ - { - "type": "savedSearches", - "apiVersion": "2020-08-01", - "name": "AllPrometheus", - "dependsOn": [ - "[concat('Microsoft.OperationalInsights/workspaces/', variables('logAnalyticsWorkspaceName'))]" - ], - "properties": { - "eTag": "*", - "category": "Prometheus", - "displayName": "All collected Prometheus information", - "query": "InsightsMetrics | where Namespace == \"prometheus\"", - "version": 1 - } - }, - { - "type": "savedSearches", - "apiVersion": "2020-08-01", - "name": "ForbiddenReponsesOnIngress", - "dependsOn": [ - "[concat('Microsoft.OperationalInsights/workspaces/', variables('logAnalyticsWorkspaceName'))]" - ], - "properties": { - "eTag": "*", - "category": "Prometheus", - "displayName": "Increase number of forbidden response on the Ingress Controller", - "query": "let value = toscalar(InsightsMetrics | where Namespace == \"prometheus\" and Name == \"traefik_entrypoint_requests_total\" | where parse_json(Tags).code == 403 | summarize Value = avg(Val) by bin(TimeGenerated, 5m) | summarize min = min(Value)); InsightsMetrics | where Namespace == \"prometheus\" and Name == \"traefik_entrypoint_requests_total\" | where parse_json(Tags).code == 403 | summarize AggregatedValue = avg(Val)-value by bin(TimeGenerated, 5m) | order by TimeGenerated | render barchart", - "version": 1 - } - }, - { - "type": "savedSearches", - "apiVersion": "2020-08-01", - "name": "NodeRebootRequested", - "dependsOn": [ - "[concat('Microsoft.OperationalInsights/workspaces/', variables('logAnalyticsWorkspaceName'))]" - ], - "properties": { - "eTag": "*", - "category": "Prometheus", - "displayName": "Nodes reboot required by kured", - "query": "InsightsMetrics | where Namespace == \"prometheus\" and Name == \"kured_reboot_required\" | where Val > 0", - "version": 1 - } - } - ] + "eTag": "*", + "category": "Prometheus", + "displayName": "All collected Prometheus information", + "query": "InsightsMetrics | where Namespace == \"prometheus\"", + "version": 1 + } + }, + { + "type": 
"Microsoft.OperationalInsights/workspaces/savedSearches", + "apiVersion": "2020-08-01", + "name": "[concat(variables('logAnalyticsWorkspaceName'), '/NodeRebootRequested')]", + "properties": { + "eTag": "*", + "category": "Prometheus", + "displayName": "Nodes reboot required by kured", + "query": "InsightsMetrics | where Namespace == \"prometheus\" and Name == \"kured_reboot_required\" | where Val > 0", + "version": 1 + } }, { "name": "PodFailedScheduledQuery", @@ -800,8 +722,7 @@ "apiVersion": "2018-04-16", "location": "[parameters('location')]", "dependsOn": [ - "[resourceId('Microsoft.OperationsManagement/solutions',variables('containerInsightsSolutionName'))]", - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" + "[resourceId('Microsoft.OperationsManagement/solutions',variables('containerInsightsSolutionName'))]" ], "properties": { "description": "Alert on pod Failed phase.", @@ -864,9 +785,6 @@ "apiVersion": "2015-11-01-preview", "name": "[variables('containerInsightsSolutionName')]", "location": "[parameters('location')]", - "dependsOn": [ - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" - ], "properties": { "workspaceResourceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" }, @@ -882,9 +800,6 @@ "apiVersion": "2015-11-01-preview", "name": "[concat('KeyVaultAnalytics(', variables('logAnalyticsWorkspaceName'),')')]", "location": "[parameters('location')]", - "dependsOn": [ - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" - ], "properties": { "workspaceResourceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" }, @@ -896,147 +811,23 @@ } }, { - "type": "Microsoft.ContainerRegistry/registries", - "apiVersion": "2020-11-01-preview", - "name": "[variables('defaultAcrName')]", - "location": "[parameters('location')]", - "sku": { - "name": "Premium" - }, - "properties": { - "adminUserEnabled": false, - "networkRuleSet": { - "defaultAction": "Deny", - "virtualNetworkRules": [], - "ipRules": [] - }, - "policies": { - "quarantinePolicy": { - "status": "disabled" - }, - "trustPolicy": { - "type": "Notary", - "status": "disabled" - }, - "retentionPolicy": { - "days": 15, - "status": "enabled" - } - }, - "publicNetworkAccess": "Disabled", - "encryption": { - "status": "disabled" - }, - "dataEndpointEnabled": true, - "networkRuleBypassOptions": "AzureServices", - "zoneRedundancy": "Disabled" // This Preview feature only supports three regions at this time, and eastus2's paired region (centralus), does not support this. So disabling for now. 
- }, - "resources": [ - { - "name": "[concat('Microsoft.Authorization/', guid(resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), variables('acrPullRole')))]", - "type": "providers/roleAssignments", - "apiVersion": "2020-04-01-preview", - "dependsOn": [ - "[resourceId('Microsoft.ContainerRegistry/registries', variables('defaultAcrName'))]", - "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" - ], - "properties": { - "roleDefinitionId": "[variables('acrPullRole')]", - "principalId": "[reference(resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), '2020-12-01').identityProfile.kubeletidentity.objectId]", - "principalType": "ServicePrincipal" - } - }, - { - "type": "replications", - "apiVersion": "2019-05-01", - "name": "[parameters('geoRedundancyLocation')]", - "location": "[parameters('geoRedundancyLocation')]", - "dependsOn": [ - "[variables('defaultAcrName')]" - ], - "properties": {} - }, - { - "type": "providers/diagnosticSettings", - "apiVersion": "2017-05-01-preview", - "name": "Microsoft.Insights/default", - "dependsOn": [ - "[resourceId('Microsoft.ContainerRegistry/registries', variables('defaultAcrName'))]", - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" - ], - "properties": { - "workspaceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]", - "metrics": [ - { - "timeGrain": "PT1M", - "category": "AllMetrics", - "enabled": true - } - ], - "logs": [ - { - "category": "ContainerRegistryRepositoryEvents", - "enabled": true - }, - { - "category": "ContainerRegistryLoginEvents", - "enabled": true - } - ] - } - } - ] - }, - { - "type": "Microsoft.Network/privateEndpoints", - "apiVersion": "2020-05-01", - "name": "nodepool-to-acr", - "location": "[parameters('location')]", + "type": "Microsoft.Authorization/roleAssignments", + "apiVersion": "2020-04-01-preview", + "name": "[guid(resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), variables('acrPullRole'))]", "dependsOn": [ - "[resourceId('Microsoft.ContainerRegistry/registries/replications', variables('defaultAcrName'), parameters('geoRedundancyLocation'))]" + "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" ], + "scope": "[resourceId('Microsoft.ContainerRegistry/registries', variables('defaultAcrName'))]", "properties": { - "subnet": { - "id": "[variables('vnetNodePoolSubnetResourceId')]" - }, - "privateLinkServiceConnections": [ - { - "name": "nodepools", - "properties": { - "privateLinkServiceId": "[resourceId('Microsoft.ContainerRegistry/registries', variables('defaultAcrName'))]", - "groupIds": [ - "registry" - ] - } - } - ] - }, - "resources": [ - { - "type": "privateDnsZoneGroups", - "apiVersion": "2020-05-01", - "name": "default", - "location": "[parameters('location')]", - "dependsOn": [ - "[resourceId('Microsoft.Network/privateEndpoints', 'nodepool-to-acr')]", - "[resourceId('Microsoft.Network/privateDnsZones', variables('acrPrivateDnsZonesName'))]" - ], - "properties": { - "privateDnsZoneConfigs": [ - { - "name": "privatelink-azurecr-io", - "properties": { - "privateDnsZoneId": "[resourceId('Microsoft.Network/privateDnsZones', variables('acrPrivateDnsZonesName'))]" - } - } - ] - } - } - ] + "roleDefinitionId": "[variables('acrPullRole')]", + "description": "Allows AKS to pull container images from this ACR instance.", + "principalId": 
"[reference(resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), '2020-12-01').identityProfile.kubeletidentity.objectId]", + "principalType": "ServicePrincipal" + } }, { "type": "Microsoft.ContainerService/managedClusters", - "apiVersion": "2021-02-01", + "apiVersion": "2021-09-01", "name": "[variables('clusterName')]", "location": "[parameters('location')]", "tags": { @@ -1132,6 +923,9 @@ "logAnalyticsWorkspaceResourceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" } }, + /*"extensionManager": { + "enabled": true + },*/ "aciConnectorLinux": { "enabled": false }, @@ -1209,6 +1003,78 @@ "tier": "Paid" }, "resources": [ + { + "type": "providers/extensions", + "apiVersion": "2021-09-01", + "name": "Microsoft.KubernetesConfiguration/flux", + "comments": "Ensures that flux add-on (extension) is installed.", + "dependsOn": [ + "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]", + "[resourceId('Microsoft.ContainerRegistry/registries/providers/roleAssignments', variables('defaultAcrName'), 'Microsoft.Authorization', guid(resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), variables('acrPullRole')))]" + ], + "properties": { + "extensionType": "Microsoft.Flux", + "autoUpgradeMinorVersion": true, + "releaseTrain": "Stable", + "scope": { + "cluster": { + "releaseNamespace": "flux-system", + "configurationSettings": { + "helm-controller.enabled": "false", + "source-controller.enabled": "true", + "kustomize-controller.enabled": "true", + "notification-controller.enabled": "false", + "image-automation-controller.enabled": "false", + "image-reflector-controller.enabled": "false" + }, + "configurationProtectedSettings": {} + } + } + + } + }, + { + "type": "providers/fluxConfigurations", + "apiVersion": "2021-11-01-preview", + "name": "Microsoft.KubernetesConfiguration/bootstrap", + "comments": "Bootstraps your cluster using content from your repo.", + "dependsOn": [ + "[resourceId('Microsoft.ContainerService/managedClusters/providers/extensions', variables('clusterName'), 'Microsoft.KubernetesConfiguration', 'flux')]", + "[resourceId('Microsoft.ContainerRegistry/registries/providers/roleAssignments', variables('defaultAcrName'), 'Microsoft.Authorization', guid(resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), variables('acrPullRole')))]", + "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" + ], + "properties": { + "scope": "Cluster", + "namespace": "flux-system", + "sourceKind": "GitRepository", + "gitRepository": { + "url": "[parameters('gitOpsBootstrappingRepoHttpsUrl')]", + "timeoutInSeconds": 180, + "syncIntervalInSeconds": 300, + "repositoryRef": { + "branch": "[parameters('gitOpsBootstrappingRepoBranch')]", + "tag": "[null()]", + "semver": "[null()]", + "commit": "[null()]" + }, + "sshKnownHosts": "", + "httpsUser": "[null()]", + "httpsCAFile": "[null()]", + "localAuthRef": "[null()]" + }, + "kustomizations": { + "unified": { + "path": "./cluster-manifests", + "dependsOn": [], + "timeoutInSeconds": 300, + "syncIntervalInSeconds": 300, + "retryIntervalInSeconds": "[null()]", + "prune": true, + "force": false + } + } + } + }, { "type": "providers/roleAssignments", "apiVersion": "2020-04-01-preview", @@ -1228,8 +1094,7 @@ "apiVersion": "2017-05-01-preview", "name": "Microsoft.Insights/default", "dependsOn": [ - "[resourceId('Microsoft.ContainerService/managedClusters', 
variables('clusterName'))]", - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" + "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" ], "properties": { "workspaceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]", @@ -1274,8 +1139,7 @@ "apiVersion": "2017-05-01-preview", "name": "Microsoft.Insights/default", "dependsOn": [ - "[resourceId('Microsoft.EventGrid/systemTopics', variables('clusterName'))]", - "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]" + "[resourceId('Microsoft.EventGrid/systemTopics', variables('clusterName'))]" ], "properties": { "workspaceId": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsWorkspaceName'))]", @@ -1906,7 +1770,7 @@ { "condition": "[variables('isUsingAzureRBACasKubernetesRBAC')]", "type": "Microsoft.Authorization/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[guid('aad-admin-group', resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), parameters('clusterAdminAadGroupObjectId'))]", "dependsOn": [ "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" @@ -1914,6 +1778,7 @@ "scope": "[concat('Microsoft.ContainerService/managedClusters/', variables('clusterName'))]", "properties": { "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', variables('clusterAdminRoleId'))]", + "description": "Members of this group are cluster admins of this cluster.", "principalId": "[parameters('clusterAdminAadGroupObjectId')]", "principalType": "Group" } @@ -1921,7 +1786,7 @@ { "condition": "[variables('isUsingAzureRBACasKubernetesRBAC')]", "type": "Microsoft.Authorization/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[guid('aad-admin-group-sc', resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), parameters('clusterAdminAadGroupObjectId'))]", "dependsOn": [ "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" @@ -1929,6 +1794,7 @@ "scope": "[concat('Microsoft.ContainerService/managedClusters/', variables('clusterName'))]", "properties": { "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', variables('serviceClusterUserRoleId'))]", + "description": "Members of this group are cluster users of this cluster.", "principalId": "[parameters('clusterAdminAadGroupObjectId')]", "principalType": "Group" } @@ -1936,7 +1802,7 @@ { "condition": "[and(variables('isUsingAzureRBACasKubernetesRBAC'), not(equals(parameters('a0008NamespaceReaderAadGroupObjectId'), parameters('clusterAdminAadGroupObjectId'))))]", "type": "Microsoft.Authorization/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[guid('aad-a0008-reader-group', resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), parameters('a0008NamespaceReaderAadGroupObjectId'))]", "dependsOn": [ "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" @@ -1945,13 +1811,14 @@ "properties": { "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', 
variables('clusterReaderRoleId'))]", "principalId": "[parameters('a0008NamespaceReaderAadGroupObjectId')]", + "description": "Members of this group are cluster admins of the a0008 namespace in this cluster.", "principalType": "Group" } }, { "condition": "[and(variables('isUsingAzureRBACasKubernetesRBAC'), not(equals(parameters('a0008NamespaceReaderAadGroupObjectId'), parameters('clusterAdminAadGroupObjectId'))))]", "type": "Microsoft.Authorization/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[guid('aad-a0008-reader-group-sc', resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName')), parameters('a0008NamespaceReaderAadGroupObjectId'))]", "dependsOn": [ "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" @@ -1960,14 +1827,15 @@ "properties": { "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', variables('serviceClusterUserRoleId'))]", "principalId": "[parameters('a0008NamespaceReaderAadGroupObjectId')]", + "description": "Members of this group are cluster users of this cluster.", "principalType": "Group" } }, { "type": "Microsoft.ManagedIdentity/userAssignedIdentities/providers/roleAssignments", - "apiVersion": "2018-09-01-preview", + "apiVersion": "2020-04-01-preview", "name": "[concat('podmi-ingress-controller', '/Microsoft.Authorization/', guid(resourceGroup().id, 'podmi-ingress-controller', variables('managedIdentityOperatorRole')))]", - "comments": "Grant the AKS cluster with Manage Identity Operator role permissions over the managed identity used for the ingress controller. Allows it to be assigned to the underlying VMSS.", + "comments": "Grant the AKS cluster with Managed Identity Operator role permissions over the managed identity used for the ingress controller. Allows it to be assigned to the underlying VMSS.", "dependsOn": [ "[resourceId('Microsoft.ContainerService/managedClusters', variables('clusterName'))]" ], @@ -2083,7 +1951,8 @@ "kube-system", "gatekeeper-system", "azure-arc", - "cluster-baseline-settings" + "cluster-baseline-settings", + "flux-system" /* Flux add-on, not all containers have limits defined by Microsoft */ ] }, "effect": { @@ -2103,7 +1972,7 @@ "policyDefinitionId": "[variables('policyResourceIdEnforceImageSource')]", "parameters": { "allowedContainerImagesRegex": { - "value": "[concat(variables('defaultAcrName'),'.azurecr.io/.+$|mcr.microsoft.com/.+$|docker.io/fluxcd/flux.+$|docker.io/weaveworks/kured.+$|docker.io/library/.+$')]" + "value": "[concat(variables('defaultAcrName'),'.azurecr.io/.+$|mcr.microsoft.com/.+$|azurearcfork8s.azurecr.io/azurearcflux/images/stable/.+$|docker.io/weaveworks/kured.+$|docker.io/library/.+$')]" }, "excludedNamespaces": { "value": [ @@ -2135,10 +2004,6 @@ "keyVaultName": { "type": "string", "value": "[variables('keyVaultName')]" - }, - "containerRegistryName": { - "type": "string", - "value": "[variables('defaultAcrName')]" } } } diff --git a/github-workflow/README.md b/github-workflow/README.md index 947f8939..ad5e55a3 100644 --- a/github-workflow/README.md +++ b/github-workflow/README.md @@ -1,6 +1,6 @@ # GitHub Actions Workflow -> Note: This is part of the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation. For more information check out the [readme file in the root](../README.md). +> Note: This is part of the Azure Kubernetes Service (AKS) Baseline cluster reference implementation. 
For more information check out the [readme file in the root](../README.md). This cluster, as with any workload, should be managed via an automated deployment pipeline. In this reference implementation we provide a "getting started" GitHub Action workflow file that you can reference to build your own. @@ -14,7 +14,7 @@ Secrets should not be stored in this file, but instead should be stored as part ## Workload -The workload is NOT part of this deployment. This is a deployment of the infrastructure only. Separation of infrastructure and workload is recommended as it allows you to have distinct lifecycle and operational concerns. +The workload is NOT part of this deployment. This is a deployment of the infrastructure only. Separation of infrastructure and workload is recommended as it allows you to have distinct lifecycle and operational concerns. ## Next Steps diff --git a/github-workflow/aks-deploy.properties.json b/github-workflow/aks-deploy.properties.json index 41b12c36..35757c29 100644 --- a/github-workflow/aks-deploy.properties.json +++ b/github-workflow/aks-deploy.properties.json @@ -1,6 +1,6 @@ { - "name": "Deploy AKS Baseline cluster stamp and Flux", - "description": "This workflow will deploy our cluster stamp in your own vnet and also Flux", + "name": "Deploy AKS Baseline cluster stamp", + "description": "This workflow will deploy our cluster stamp in your own vnet", "creator": "Microsoft Patterns & Practices", "iconName": "azure", "categories": ["AKS"], diff --git a/github-workflow/aks-deploy.yaml b/github-workflow/aks-deploy.yaml index 98d1a885..9280f95f 100644 --- a/github-workflow/aks-deploy.yaml +++ b/github-workflow/aks-deploy.yaml @@ -1,4 +1,4 @@ -# This workflow will deploy our cluster stamp, without the workload. +# This workflow will deploy our bootstrapped cluster stamp, without the workload. # # Follow the next steps to use this workflow: # @@ -8,10 +8,7 @@ # │   ├── workflows # │   │   └── aks-deploy.yaml # ├── cluster-manifests -# │   ├── a0008/* # │   ├── cluster-baseline-settings/* -# │   ├── kube-system/* -# │   ├── cluster-rbac.yaml # └── cluster-stamp.json # # 2. Ensure you have followed the prior sections before deploying this AKS cluster. This way, you will be capable of setting: @@ -23,7 +20,7 @@ # - APP_GATEWAY_LISTENER_CERTIFICATE_BASE64 The certificate data for app gateway TLS termination. It is base64. Ideally fetch this secret from a platform-managed secret store such as Azure KeyVault: https://github.com/marketplace/actions/azure-key-vault-get-secrets # - AKS_INGRESS_CONTROLLER_CERTIFICATE_BASE64 The Base64 encoded AKS Ingress Controller public certificate (as .crt or .cer) to be stored in Azure Key Vault as secret and referenced by Azure Application Gateway as a trusted root certificate. -name: Deploy AKS Baseline cluster stamp and Flux +name: Deploy AKS Baseline cluster stamp on: push: @@ -48,6 +45,7 @@ env: K8S_RBAC_AAD_A0008_READER_GROUP_OBJECTID: '' # The Azure AD group object ID that has readonly access to the a0008 namespace in the AKS cluster CLUSTER_AUTHORIZED_IP_RANGES: '[]' # By default, this deployment will allow unrestricted access to your cluster's API Server. You should limit access to the API Server to a set of well-known IP addresses (i.,e. your hub firewall IP, bastion subnet, build agents, or any other networks you'll administer the cluster from), and can do so by adding a CLUSTER_AUTHORIZED_IP_RANGES="['managementRange1', 'managementRange2', 'AzureFirewallIP/32']"" parameter. 
DOMAIN_NAME: '' # The domain name to use for App Gateway and AKS ingress. + BOOTSTRAPPING_REPO_HTTPS_URL: '' # The git https repo that will be used for bootstrapping your cluster jobs: deploy: name: Deploy AKS cluster and Flux @@ -83,12 +81,15 @@ jobs: clusterAuthorizedIPRanges=${{ env.CLUSTER_AUTHORIZED_IP_RANGES}} \ appGatewayListenerCertificate=${{ secrets.APP_GATEWAY_LISTENER_CERTIFICATE_BASE64 }} \ aksIngressControllerCertificate=${{ secrets.AKS_INGRESS_CONTROLLER_CERTIFICATE_BASE64 }} \ - domainName=${{ env.DOMAIN_NAME }} + domainName=${{ env.DOMAIN_NAME }} \ + gitOpsBootstrappingRepoHttpsUrl=${{ BOOTSTRAPPING_REPO_HTTPS_URL }} echo "::set-output name=name::$(az deployment group show --resource-group ${{ env.RESOURCE_GROUP }} -n cluster-stamp --query properties.outputs.aksClusterName.value -o tsv)" - azcliversion: 2.17.1 + azcliversion: 2.29.2 - # Set the AKS cluster context + # If you needed to do any post-deployment bootstrapping that was not already set up you can follow + # a pattern like the following. They don't do anything meaningful, but are left here as a quickstart + # guide for your own post-deployment, pipeline-based configuration - name: Set the AKS cluster context uses: Azure/aks-set-context@v1 if: github.event_name == 'push' @@ -97,12 +98,13 @@ jobs: cluster-name: ${{ steps.aks-cluster.outputs.name }} resource-group: ${{ env.RESOURCE_GROUP }} - # Create the cluster-baseline-settings namespace and deploy Flux into it - - name: Create the cluster-baseline-settings namespace and deploy Flux + # Apply a manifest file from this repo. Technically this manifest file does NOT need to be + # applied. It is a namespace manifest which was already deployed as part of the bootstrapping + # process. + - name: Create the cluster-baseline-settings namespace uses: Azure/k8s-deploy@v1 if: github.event_name == 'push' with: namespace: 'cluster-baseline-settings' manifests: | cluster-manifests/cluster-baseline-settings/ns-cluster-baseline-settings.yaml - cluster-manifests/cluster-baseline-settings/flux.yaml diff --git a/inner-loop-scripts/README.md b/inner-loop-scripts/README.md index 4374e3e8..952dc4ec 100644 --- a/inner-loop-scripts/README.md +++ b/inner-loop-scripts/README.md @@ -1,6 +1,6 @@ # Deploy Scripts -> Note: This is part of the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation. For more information check out the [readme file in the root](../README.md). +> Note: This is part of the Azure Kubernetes Service (AKS) Baseline cluster reference implementation. For more information check out the [readme file in the root](../README.md). While this reference implementation was being developed we built out some inner-loop deployment scripts to help do rapid testing. They are included in this directory _for your reference_. They are not used as part of the [main README.md introduction/instruction](../README.md), but you can reference them for your own purposes. **They often are not functional, as they are rarely maintained.** diff --git a/networking/README.md b/networking/README.md index 5284376c..18c25702 100644 --- a/networking/README.md +++ b/networking/README.md @@ -1,20 +1,20 @@ # Networking Azure Resource Manager (ARM) Templates -> Note: This is part of the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation. For more information check out the [readme file in the root](../README.md). +> Note: This is part of the Azure Kubernetes Service (AKS) Baseline cluster reference implementation. 
For more information check out the [readme file in the root](../README.md). These files are the ARM templates used in the deployment of this reference implementation. This reference implementation uses a standard hub-spoke model. ## Files -* [`hub-default.json`](./hub-default.json) is a file that defines a generic regional hub. All regional hubs can generally be considered a fork of this base template. -* [`hub-regionA.json`](./hub-regionA.json) is a file that defines a specific region's hub (for example, it might be named `hub-eastus2.json`). This is the long-lived template that defines this specific region's hub. +* [`hub-default.json`](./hub-default.json) is a file that defines a generic regional hub. All regional hubs can generally be considered a fork of this base template. +* [`hub-regionA.json`](./hub-regionA.json) is a file that defines a specific region's hub (for example, it might be named `hub-eastus2.json`). This is the long-lived template that defines this specific region's hub. * [`spoke-BU0001A0008.json`](./spoke-BU0001A0008.json) is a file that defines a specific spoke in the topology. A spoke, in our narrative, is create for each workload in a business unit, hence the naming pattern in the file name. Your organization will likely have its own standards for their hub-spoke implementation. Be sure to follow your organizational guidelines. ## Topology Details -See the [AKS baseline Network Topology](./topology.md) for specifics on how this hub-spoke model has its subnets defined and IP space allocation concerns accounted for. +See the [AKS Baseline Network Topology](./topology.md) for specifics on how this hub-spoke model has its subnets defined and IP space allocation concerns accounted for. ## See also diff --git a/networking/hub-default.json b/networking/hub-default.json index ef8cd043..d2906925 100644 --- a/networking/hub-default.json +++ b/networking/hub-default.json @@ -22,7 +22,7 @@ "southeastasia" ], "metadata": { - "description": "The hub's regional affinity. All resources tied to this hub will also be homed in this region. The network team maintains this approved regional list which is a subset of zones with Availability Zone support." + "description": "The hub's regional affinity. All resources tied to this hub will also be homed in this region. The network team maintains this approved regional list which is a subset of zones with Availability Zone support." } }, "hubVnetAddressSpace": { diff --git a/networking/hub-regionA.json b/networking/hub-regionA.json index 0d5d77ca..f5b1f8d2 100644 --- a/networking/hub-regionA.json +++ b/networking/hub-regionA.json @@ -29,7 +29,7 @@ "southeastasia" ], "metadata": { - "description": "The hub's regional affinity. All resources tied to this hub will also be homed in this region. The network team maintains this approved regional list which is a subset of zones with Availability Zone support." + "description": "The hub's regional affinity. All resources tied to this hub will also be homed in this region. The network team maintains this approved regional list which is a subset of zones with Availability Zone support." 
} }, "hubVnetAddressSpace": { @@ -618,7 +618,7 @@ }, { "ruleType": "ApplicationRule", - "name": "pull-flux-images", + "name": "flux-addon-runtime-requirements", "protocols": [ { "protocolType": "Https", @@ -628,11 +628,14 @@ "fqdnTags": [], "webCategories": [], "targetFqdns": [ - "*.docker.com", - "*.docker.io", - "docker.io", - "ghcr.io", - "github-production-container-registry.s3.amazonaws.com" + "[concat(parameters('location'), '.dp.kubernetesconfiguration.azure.com')]", + "mcr.microsoft.com", + "management.azure.com", + "login.microsoftonline.com", + "*.blob.core.windows.net", // required for the extension installer to download the helm chart install flux. This storage account is not predictable, but does look like eusreplstore196 for example. + "azurearcfork8s.azurecr.io", // required for a few of the images installed by the extension, + "*.docker.io", // Only required if you use the default bootstrapping manifests included in this repo. Kured is sourced from here by default. + "*.docker.com" // Only required if you use the default bootstrapping manifests included in this repo. Kured is sourced from here by default. ], "targetUrls": [], "terminateTLS": false, diff --git a/networking/topology.md b/networking/topology.md index 968804a4..3ddb321f 100644 --- a/networking/topology.md +++ b/networking/topology.md @@ -1,6 +1,6 @@ -# AKS baseline Network Topology +# AKS Baseline Network Topology -> Note: This is part of the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation. For more information check out the [readme file in the root](../README.md). +> Note: This is part of the Azure Kubernetes Service (AKS) Baseline cluster reference implementation. For more information check out the [readme file in the root](../README.md). ## Hub VNet diff --git a/workload/readme.md b/workload/readme.md index ca1a568f..5d3cbcc4 100644 --- a/workload/readme.md +++ b/workload/readme.md @@ -1,13 +1,13 @@ # Workload -> Note: This is part of the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation. For more information check out the [readme file in the root](../README.md). +> Note: This is part of the Azure Kubernetes Service (AKS) Baseline cluster reference implementation. For more information check out the [readme file in the root](../README.md). This reference implementation is focused on the infrastructure of a secure, baseline AKS cluster. The workload is not fully in scope. However, to demonstrate the concepts and configuration presented in this AKS cluster, a workload needed to be defined. ## Web Service -The AKS cluster, in our reference implementation, is here to serve as an application platform host for a web-facing application. In this case, the ASP.NET Core Hello World application is serving as that application. +The AKS cluster, in our reference implementation, is here to serve as an application platform host for a web-facing application. In this case, the ASP.NET Core Hello World application is serving as that application. ## Ingress -In this AKS cluster, we decided to do workload-level ingress. While ingress could be defined and managed at the cluster level, it's often more reasonable to define ingress as an extension of the workload. Allowing operational consistency between the workload and the ingress resource, especially in a multi-tenant AKS cluster. We are deploying Traefik as our ingress solution. +In this AKS cluster, we decided to do workload-level ingress. 
While ingress could be defined and managed at the cluster level, it's often more reasonable to define ingress as an extension of the workload. This allows operational consistency between the workload and the ingress resource, especially in a multi-tenant AKS cluster. We are deploying Traefik as our ingress solution.
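To make "ingress as an extension of the workload" concrete, the following is a minimal sketch of a namespace-scoped ingress resource that routes traffic to the workload's service through Traefik. It uses the standard `networking.k8s.io/v1` Ingress API; the object name, host, ingress class, and service name below are illustrative placeholders, not the exact manifests shipped with this reference implementation.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aspnetapp-ingress              # hypothetical name for the workload's ingress object
  namespace: a0008                     # the workload namespace used throughout this walk-through
spec:
  ingressClassName: traefik            # assumes the in-cluster Traefik controller registered this class
  rules:
    - host: bu0001a0008-00.contoso.com # illustrative host; substitute your own DOMAIN_NAME value
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: aspnetapp-service # hypothetical ClusterIP service fronting the workload pods
                port:
                  number: 80
```

Because an ingress object like this lives alongside the workload's other manifests, it can be versioned, reviewed, and deployed with the same pipeline as the application itself, which is the operational consistency described above.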