Finalize TF/k8s config; infrastructure docs
tiagojsag committed Feb 28, 2022
1 parent eb4ff5d commit c11c1f5
Showing 87 changed files with 253 additions and 50 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/deploy-to-kubernetes.yml
@@ -65,7 +65,7 @@ jobs:

- name: Add custom host data
run: |
-sudo sh -c 'echo "127.0.0.1 ${{ secrets.EKS_HOST }}" >> /etc/hosts'
+sudo sh -c 'echo "127.0.0.1 ${{ secrets.AKS_HOST }}" >> /etc/hosts'
- name: Install kubectl
run: |
@@ -82,7 +82,7 @@ jobs:
- name: Creating SSH tunnel
run: |
-ssh -i ~/.ssh/bastion.key -o StrictHostKeyChecking=no -N -L 4433:${{ secrets.EKS_HOST }}:443 ${{ secrets.BASTION_USER }}@${{ secrets.BASTION_HOST }} -T &
+ssh -i ~/.ssh/bastion.key -o StrictHostKeyChecking=no -N -L 4433:${{ secrets.AKS_HOST }}:443 ${{ secrets.BASTION_USER }}@${{ secrets.BASTION_HOST }} -T &
- name: Redeploy production pods
if: ${{ github.ref == 'refs/heads/main' }}
16 changes: 1 addition & 15 deletions README.md
@@ -216,21 +216,7 @@ in more detail in the [Infrastructure](#infrastructure) docs.

### Infrastructure

-While the application can be deployed in any server configuration that supports the above-mentioned
-[dependencies](#dependencies), this project includes a [Terraform](https://www.terraform.io/) project
-in the `infrastructure` folder, that you can use to easily and quickly deploy it using
-[Microsoft Azure](https://azure.microsoft.com/en-us/).
-
-Deploying the included Terraform project is done in two steps:
-- First, use Terraform to apply the `infrastructure/remote-state` folder, which will set up base
-Azure resources, like a [Resource Group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview)
-or a [Storage Account](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-overview)
-to store the "main" Terraform remote state
-- Apply the "main" `infrastructure` folder, which contains all the Azure
-resources necessary to host a functioning [Azure AKS cluster](https://azure.microsoft.com/en-us/services/kubernetes-service/).
-- Apply the "main" `kubernetes` folder, which contains all the
-resources necessary to host a functioning Marxan application within the AKS
-cluster provisioned above.
+Infrastructure code and documentation can be found under `/infrastructure`

## Bugs

160 changes: 160 additions & 0 deletions infrastructure/README.md
@@ -0,0 +1,160 @@
# Infrastructure

While the application can be deployed in any server configuration that supports the application's
dependencies, this project includes a [Terraform](https://www.terraform.io/) project
that you can use to easily and quickly deploy it using
[Microsoft Azure](https://azure.microsoft.com/en-us/) and its [Kubernetes](https://kubernetes.io/)
managed service, [AKS](https://azure.microsoft.com/en-us/services/kubernetes-service/).

## Dependencies

Here is the list of technical dependencies for deploying the Marxan app using these infrastructure
resources. Note that these requirements are for this particular deployment strategy, and not dependencies
of the Marxan application itself - which can be deployed to other infrastructures.

Before proceeding, be sure you are familiar with all of these tools, as these instructions
will skip over the basics, and assume you are comfortable using all of them.

- [Microsoft Azure](https://azure.microsoft.com)
- [Terraform](https://www.terraform.io/)
- [Docker](https://www.docker.com/)
- [Kubernetes](https://kubernetes.io/)
- [Helm](https://helm.sh/)
- [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
- [Kubectl](https://kubernetes.io/docs/tasks/tools/)
- [GitHub Actions](https://github.com/features/actions)
- An SSH client capable of establishing [SSH Tunnels](https://www.ssh.com/academy/ssh/tunneling/example)
- DNS management
- A purchased domain

Of the above, the following need to be set up prior to following the instructions in this document:

- An Azure account with a user with enough permissions to create a
[Resource Group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview),
[Apps and Service Principals](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals),
as well as multiple other resources in Azure.
- Azure CLI, Kubectl and an SSH client capable of establishing SSH Tunnels need to be installed locally.
- Access to manage GitHub Actions Secrets

#### A note on Azure quotas

Default Azure quotas for resources such as VMs are low, so cluster resources may fail to provision,
either at deploy time or later on as part of an autoscaling policy. It's advisable to review and raise these
quotas before deploying.

## Structure

This project has 3 main sections, each in a folder named after it. Each section contains a
Terraform project that logically depends on its predecessors. A 4th component of this architecture
is handled by GitHub Actions.

#### Remote state

Contains basic Azure resources, like a [Resource Group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview)
or a [Storage Account](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-overview)
to store the Terraform remote state.

#### Base

Contains multiple Azure resources needed for running Marxan on an
[AKS cluster](https://azure.microsoft.com/en-us/services/kubernetes-service/).

These resources include, but are not limited to:
- An [Azure Cache for Redis](https://azure.microsoft.com/en-us/services/cache/)
- Kubernetes node pools
- Multiple VNets, Subnets and networking security rules
- A Bastion host
- An [Azure DNS](https://azure.microsoft.com/en-us/services/dns/)
- An [Azure Container Registry](https://azure.microsoft.com/en-us/services/container-registry/) to store Docker images.

The output values include access data for some of the resources above.

#### Kubernetes

Contains the Kubernetes configuration to run Marxan on the resources created in the previous section, as well as some
new resources:

- Kubernetes deployments for the Marxan app components
- Kubernetes secrets, namespaces and ingresses
- HTTPS certificate manager
- PostgreSQL database (as a Helm chart)

*Notice:* when first applying this project, you may get an `Error: Invalid index` message from the ingress/load balancer
component. That is expected, and applying the same plan a 2nd time should succeed.

#### GitHub Actions

As part of this infrastructure, GitHub Actions are used to automatically build and push Docker images to Azure ACR, and
to redeploy Kubernetes pods once that happens. For this to work, you need to define the following GitHub Actions
Secrets with the corresponding values:

- `AZURE_AKS_CLUSTER_NAME`: The name of the AKS cluster. Get from `Base`'s `k8s_cluster_name`
- `AZURE_AKS_HOST`: The AKS cluster hostname (without port or protocol). Get from `Base`'s `k8s_cluster_hostname`
- `AZURE_CLIENT_ID`: The client ID used to access the Azure ACR. Get from `Base`'s `container_registry_client_id`
- `AZURE_RESOURCE_GROUP`: The AKS Resource Group name. Specified by you when setting up the infrastructure.
- `AZURE_SUBSCRIPTION_ID`: The Azure Subscription Id. Get from `Base`'s `azure_subscription_id`
- `AZURE_TENANT_ID`: The Azure Tenant Id. Get from `Base`'s `azure_tenant_id`
- `BASTION_HOST`: The hostname for the bastion machine. Get from `Base`'s `bastion_hostname`
- `REGISTRY_LOGIN_SERVER`: The hostname for the Azure ACR. Get from `Base`'s `container_registry_hostname`
- `REGISTRY_USERNAME`: The username for the Azure ACR. Get from `Base`'s `container_registry_client_id`
- `REGISTRY_PASSWORD`: The password to access the Azure ACR. Get from `Base`'s `container_registry_password`
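
As a rough sketch, the secrets that come from `Base` outputs could be set with the GitHub CLI. This assumes `terraform` and `gh` are installed and authenticated, that the printed commands are run from the `Base` project folder, and that the output names match the list above. The script only prints the commands, so they can be reviewed before anything is stored.

```shell
#!/bin/sh
# Dry-run sketch: prints one command per secret instead of running it.
# Secret and output names are copied from the list above.
# AZURE_RESOURCE_GROUP is not a Terraform output, so it is left out here.

secret_sources() {
  cat <<'EOF'
AZURE_AKS_CLUSTER_NAME k8s_cluster_name
AZURE_AKS_HOST k8s_cluster_hostname
AZURE_CLIENT_ID container_registry_client_id
AZURE_SUBSCRIPTION_ID azure_subscription_id
AZURE_TENANT_ID azure_tenant_id
BASTION_HOST bastion_hostname
REGISTRY_LOGIN_SERVER container_registry_hostname
REGISTRY_USERNAME container_registry_client_id
REGISTRY_PASSWORD container_registry_password
EOF
}

print_secret_commands() {
  secret_sources | while read -r secret output; do
    # `gh secret set` reads the value from stdin when no --body is given.
    echo "terraform output -raw $output | gh secret set $secret"
  done
}

print_secret_commands
```

Before storing `AZURE_AKS_HOST`, check that the output value is a bare hostname, since the secret must not include a port or protocol.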

## How to deploy

Deploying the included Terraform project is done in steps:
- Terraform `apply` the `Remote State` project.
- Terraform `apply` the `Base` project.
- Modify your DNS registrar configuration to use the newly created Azure DNS zone as a name server.
- Configure your local `kubectl` (you can use [this](https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials))
- Configure network access to the AKS cluster and have a tunnel to AKS up and running
(more on this [below](#network-access-to-azure-resources))
- Terraform `apply` the `Kubernetes` project.
- Create the PostgreSQL databases and users (more on this [here](#configuring-postgresql))
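
The order above can be sketched as a dry-run script. The folder names (`remote-state`, `base`, `kubernetes`), the `vars/terraform.tfvars` path and the example argument values are assumptions about your checkout, and `terraform init` is included because each project needs it before its first `apply`. Nothing is executed; each step is printed for review.

```shell
#!/bin/sh
# Dry-run sketch of the deployment order: every step is printed, not executed.
# Folder names and the tfvars path are assumptions about the repository layout.

INFRA_DIR="${INFRA_DIR:-infrastructure}"

deploy_plan() {
  resource_group="$1"  # the Resource Group name you chose (placeholder below)
  cluster_name="$2"    # Base output `k8s_cluster_name` (placeholder below)

  echo "(cd $INFRA_DIR/remote-state && terraform init && terraform apply)"
  echo "(cd $INFRA_DIR/base && terraform init && terraform apply -var-file=vars/terraform.tfvars)"
  echo "# manual: point your DNS registrar at the new Azure DNS zone"
  echo "az aks get-credentials --resource-group $resource_group --name $cluster_name"
  echo "# manual: open the SSH tunnel to the AKS cluster"
  echo "(cd $INFRA_DIR/kubernetes && terraform init && terraform apply)"
  echo "# manual: create the PostgreSQL databases and users"
}

deploy_plan "my-resource-group" "my-aks-cluster"
```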


## Network access to Azure resources

For security reasons, most cloud resources are private, meaning they are attached to a private
[virtual network](https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview),
and thus inaccessible from the internet. If you need access to these resources (for example, to configure
Kubernetes, either directly or through Terraform), there is a [Bastion host](https://en.wikipedia.org/wiki/Bastion_host)
available. Through it, you can establish an [SSH Tunnel](https://www.ssh.com/academy/ssh/tunneling/example)
that sets up a local port which is securely proxied through said bastion and can reach the desired target host.

Here's an example of how to run said tunnel on Linux:

`ssh -N -L <local port>:<target resource hostname>:<target resource port> <bastion user>@<bastion hostname>`

You can now access the target cloud resource on your local host on the port specified above.


### Network access to AKS

Since access to the Azure AKS cluster is done through HTTPS, we need not only an SSH tunnel, but also a way
to match the hostname, so that the certificate can be validated successfully. There are a number of ways to tackle
this, but one of them is as follows:

- Modify your hosts file (`/etc/hosts` on Linux) to resolve the Kubernetes hostname to `127.0.0.1`
- Modify your `kubectl` [configuration file](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/)
to use a different port when reaching the AKS cluster (append `:<port number>` to the cluster hostname).
- Create an [SSH tunnel](#network-access-to-azure-resources) to that hostname, using the above specified port as
your local port.

You should now be able to use `kubectl` to access your AKS cluster.
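
The three steps above can be sketched as follows. The hostnames, bastion user and kubeconfig cluster entry name are hypothetical placeholders; local port `4433` and remote port `443` mirror the values used in the deployment workflow. The script only prints the hosts entry and the two commands, so you can check them before applying anything.

```shell
#!/bin/sh
# Prints the three pieces of configuration for reaching AKS through the bastion.
# All values passed in below are hypothetical placeholders.

hosts_entry() {
  # Line to add to /etc/hosts so the cluster hostname resolves locally.
  printf '127.0.0.1 %s\n' "$1"
}

kubeconfig_command() {
  # Points an existing kubeconfig cluster entry at the local tunnel port.
  # $1: cluster entry name, $2: AKS hostname, $3: local port
  printf 'kubectl config set-cluster %s --server=https://%s:%s\n' "$1" "$2" "$3"
}

tunnel_command() {
  # $1: local port, $2: AKS hostname, $3: bastion user, $4: bastion hostname
  printf 'ssh -N -L %s:%s:443 %s@%s\n' "$1" "$2" "$3" "$4"
}

AKS_HOST="my-cluster.example.com"
hosts_entry "$AKS_HOST"
kubeconfig_command "my-aks-cluster" "$AKS_HOST" 4433
tunnel_command 4433 "$AKS_HOST" "ubuntu" "bastion.example.com"
```

Keeping the real hostname in the kubeconfig and resolving it to `127.0.0.1` is what lets the TLS certificate check pass while traffic goes through the tunnel.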


## Configuring PostgreSQL

The included Terraform code sets up a PostgreSQL database server in the AKS cluster but, as it cannot reach it once
it's set up, it won't create the necessary databases and users for the Marxan application to run - this has to be done
manually:

- Use `kubectl` to open a terminal in the pod that is running the target PostgreSQL server.
- Create a database and a user with the corresponding credentials (see either the relevant
[Kubernetes Secret](https://kubernetes.io/docs/concepts/configuration/secret/) or
the [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/)).
- Make sure the user has full access to the associated database.
- Repeat, as needed, for each database used by the project.

You may need to manually restart application pods once the PostgreSQL users and databases are in place, and verify
that they connect successfully.
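
A minimal sketch of the SQL involved, assuming a standard PostgreSQL server. The database name, user name and password below are placeholders; the real values come from the Kubernetes Secret or Azure Key Vault. The function only prints the statements.

```shell
#!/bin/sh
# Prints the SQL for one database/user pair; nothing is sent to a server.
# Names and password are placeholders, not the project's real values.

pg_bootstrap_sql() {
  db="$1"; user="$2"; password="$3"
  cat <<EOF
CREATE USER "$user" WITH PASSWORD '$password';
CREATE DATABASE "$db" OWNER "$user";
GRANT ALL PRIVILEGES ON DATABASE "$db" TO "$user";
EOF
}

pg_bootstrap_sql "marxan_api" "marxan_api_user" "change-me"
```

The output could then be piped into the server, for example with `kubectl exec -i -n <namespace> <postgres pod> -- psql -U postgres`, where the namespace, pod name and superuser are assumptions about your setup. A password containing a single quote would need escaping first.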
File renamed without changes.
1 change: 1 addition & 0 deletions infrastructure/main.tf → infrastructure/base/main.tf
@@ -77,6 +77,7 @@ module "app_node_pool" {
resource_group = data.azurerm_resource_group.resource_group
project_name = var.project_name
subnet_id = module.network.aks_subnet_id
vm_size = "Standard_F4s_v2"
node_labels = {
type : "app"
}
@@ -43,7 +43,7 @@ resource "azurerm_linux_virtual_machine" "bastion" {
for_each = var.bastion_ssh_public_keys
content {
username = local.admin_user
-public_key = admin_ssh_key.value.key
+public_key = admin_ssh_key.value
}
}

File renamed without changes.
File renamed without changes.
@@ -18,6 +18,11 @@ output "azure_client_id" {
value = azuread_service_principal.github-actions-access.application_id
}

output "azuread_application_username" {
value = nonsensitive(azuread_application_password.github-actions-access.value)
}


output "azuread_application_password" {
value = nonsensitive(azuread_application_password.github-actions-access.value)
}
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -119,6 +119,9 @@ resource "azurerm_kubernetes_cluster" "k8s_cluster" {
node_count = 1
vm_size = "Standard_D2_v2"
vnet_subnet_id = var.aks_subnet_id
enable_auto_scaling = var.enable_auto_scaling
min_count = var.min_node_count
max_count = var.max_node_count
}

identity {
@@ -15,3 +15,7 @@ output "cluster_id" {
output "cluster_name" {
value = azurerm_kubernetes_cluster.k8s_cluster.name
}

output "cluster_hostname" {
value = azurerm_kubernetes_cluster.k8s_cluster.kube_config[0].host
}
@@ -10,7 +10,7 @@ variable "resource_group" {
variable "kubernetes_version" {
type = string
description = "Version of kubernetes to deploy"
-default = "1.22.4"
+default = "1.22.6"
}

variable "gateway_subnet_id" {
@@ -40,3 +40,21 @@ variable "acr_id" {
description = "Id of the ACR so pull images from"
type = string
}

variable "min_node_count" {
type = number
default = 1
description = "The minimum number of machines in the default node pool"
}

variable "max_node_count" {
type = number
default = 4
description = "The maximum number of machines in the default node pool"
}

variable "enable_auto_scaling" {
type = bool
default = true
description = "If the default node pool will auto-scale"
}
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -2,8 +2,7 @@ resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
name = var.name
kubernetes_cluster_id = var.aks_cluster_id
vm_size = var.vm_size
-node_count = var.node_count
-enable_auto_scaling = var.enable_auto_scaling
+enable_auto_scaling = true
min_count = var.min_node_count
max_count = var.max_node_count
vnet_subnet_id = var.subnet_id
@@ -23,12 +23,6 @@ variable "vm_size" {
description = "The node pool machine type"
}

-variable "node_count" {
-type = number
-default = 1
-description = "The number of machines in this pool"
-}
-
variable "min_node_count" {
type = number
default = 1
@@ -41,12 +35,6 @@ variable "max_node_count" {
description = "The maximum number of machines in this pool"
}

-variable "enable_auto_scaling" {
-type = bool
-default = true
-description = "If this pool will auto-scale"
-}
-
variable "node_labels" {
type = map(any)
default = {}
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions infrastructure/outputs.tf → infrastructure/base/outputs.tf
@@ -8,6 +8,11 @@ output "k8s_cluster_name" {
description = "AKS cluster name"
}

output "k8s_cluster_hostname" {
value = module.kubernetes.cluster_hostname
description = "AKS cluster hostname"
}

output "kube_config" {
value = module.kubernetes.kube_config
sensitive = true
File renamed without changes.
6 changes: 6 additions & 0 deletions infrastructure/base/vars/terraform.tfvars
@@ -0,0 +1,6 @@
project_name = "marxan"
location = "West Europe"
bastion_ssh_public_keys = [
"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCsQgoIZQAVAMFnESCsYotosbp3N2n8onp8Xmn0DZJmCnBzkfvn2SJdTQRKcyzjcHBqrseq+8Id0JYdb1aJJT2497b7NVOWvVLgqD5pYoxwLO4m3VjppUjpOfgGk3aBpzQTGwPHMqk4X4yvHNAuQcCTxo6gNIsyJZFxdzdc2P+oDLdTwekzsQvsPscFDXDYvtLTkCnSfeZAKsbb45XiAsH0HRnwzJYPvPr69V6c1R3igc2aDZ+eI2sZPvsCXWnvJYfL0QLJp+NwqJuRzHygcxsByg9p/wTPko2vEQLGvefBqjMFHbDYRyVh1omfwt3w/l5R6Abb1Mc2sNDqhBKFEe7/"
]
domain = "marxan.vizzuality.com"
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions kubernetes/main.tf → infrastructure/kubernetes/main.tf
@@ -126,7 +126,7 @@ module "client_production" {
namespace = "production"
image = "marxan.azurecr.io/marxan-client:production"
deployment_name = "client"
-site_url = "http://${module.ingress_production.client_ip}"
+site_url = "http://${data.terraform_remote_state.core.outputs.dns_zone_name}"
}

module "webshot_production" {
@@ -236,7 +236,7 @@ module "client_staging" {
namespace = "staging"
image = "marxan.azurecr.io/marxan-client:staging"
deployment_name = "client"
-site_url = "http://${module.ingress_production.client_ip}"
+site_url = "http://staging.${data.terraform_remote_state.core.outputs.dns_zone_name}"
}

module "webshot_staging" {
@@ -38,6 +38,20 @@ resource "kubernetes_deployment" "api_deployment" {
}

spec {
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "type"
values = ["app"]
operator = "In"
}
}
}
}
}

container {
image = var.image
image_pull_policy = "Always"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -38,6 +38,20 @@ resource "kubernetes_deployment" "geoprocessing_deployment" {
}

spec {
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "type"
values = ["app"]
operator = "In"
}
}
}
}
}

container {
image = var.image
image_pull_policy = "Always"
File renamed without changes.
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.