This repository contains the reference architecture for the infrastructure needed to deploy dSPACE SIMPHERA to the Azure Public Cloud. It does not contain the Helm chart needed to deploy SIMPHERA itself, but only the base infrastructure such as Kubernetes, PostgreSQL, storage accounts, etc.
You can use the reference architecture as a starting point for your SIMPHERA installation if you plan to deploy SIMPHERA to Azure. You can use the reference architecture as is and only have to configure a few individual values. If you have special requirements, feel free to adapt the architecture to your needs. For example, the reference architecture does not contain any kind of VPN connection to a private, on-premises network, because this is highly specific. Instead, the reference architecture is configured in such a way that the ingress points are available on the public internet.
Using the reference architecture you can deploy a single instance or even multiple instances of SIMPHERA, e.g. one for production and one for testing.
This reference architecture is provided as a Terraform configuration. Terraform is an open-source command line tool to automatically create and manage cloud resources. A Terraform configuration consists of various `.tf` text files. These files contain the specifications of the resources to be created in the cloud infrastructure. That is the reason why this approach is called infrastructure-as-code. The main advantage of this approach is reproducibility, because the configuration can be maintained in a source control system such as Git.
Terraform uses variables to make the specification configurable. The concrete values for these variables are specified in `.tfvars` files. It is the task of the administrator to fill the `.tfvars` files with the correct values. This is explained in more detail in a later chapter.
Terraform has the concept of a state. On the one hand, there are the resource specifications in the `.tf` files. On the other hand, there are the resources in the cloud infrastructure that are created based on these files. Terraform needs to store a mapping between the elements of the specification and the corresponding resources in the cloud infrastructure. This mapping is called the state. In principle you could store the state on your local hard drive, but that is not a good idea, because then nobody else could change any settings and apply those changes. Therefore, the state itself should be stored in the cloud.
Therefore, you need to manually create a storage account in Azure before you can start using Terraform. This is explained in more detail in the Prerequisites section.
As mentioned before, the reference architecture is defined as a Terraform configuration. It has been tested with Terraform version v1.0.0.
The following figure shows the main resources of the architecture. The figure shows the use of Azure Database for PostgreSQL with Private Link, which is the configuration recommended by dSPACE. The General Purpose tier of the PostgreSQL server is required to use Private Link. The reference architecture also supports the Basic tier; in that case, instead of Private Link, firewall rules are used that only allow access from within the Kubernetes cluster.
Before you start you need an Azure subscription and the `Contributor` role to create the resources needed for SIMPHERA. Additionally, you need to create the following resources that are not part of this Terraform configuration:
- Storage Account: A storage account with performance set to `Standard` and account kind set to `StorageV2 (general purpose v2)` is needed to store the Terraform state. You also have to create a container for the state inside the storage account (see the example after this list).
- KeyVault: The credentials of the PostgreSQL servers and the keys used to encrypt the disks of the license server virtual machine must be stored in an Azure KeyVault. The KeyVault is not managed by Terraform and has to be created manually (see the Azure KeyVault section).
- Log Analytics Workspace (optional): In order to store the log data of the services you have to provide such a workspace inside your subscription.
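For example, the storage account and the state container can be created with the Azure CLI; the resource group, account, and container names below are placeholders that you should replace with your own values:

az group create --name "<StateResourceGroup>" --location "<Location>"
az storage account create --name "<StateStorageAccountName>" --resource-group "<StateResourceGroup>" --sku Standard_LRS --kind StorageV2
az storage container create --name "<StateContainerName>" --account-name "<StateStorageAccountName>"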
On your administration PC you need to install the Terraform command line tool, the Azure CLI, and `ssh-keygen`, which is typically available on most operating systems.
To log in to Azure, use:
az login
To switch to the correct subscription you can use the following command:
az account set --subscription "My Subscription"
If you have not already cloned this Git repository, please clone it now to your local administration PC.
In order to be able to connect to the Kubernetes nodes via SSH you need an SSH key pair. Create the keys by executing the following command in the root folder of the repository:
# bash
ssh-keygen -t rsa -b 2048 -f shared-ssh-key/ssh -q -N ""
# PowerShell
ssh-keygen -t rsa -b 2048 -f shared-ssh-key/ssh -q -N """"
To get a list of all PostgreSQL passwords, run the following PowerShell script:
$secretnames = terraform output -json secretnames | ConvertFrom-Json
$keyvaultname = terraform output -json key_vault_name | ConvertFrom-Json
$postgresql_passwords = @{}
foreach($prop in $secretnames.PsObject.Properties)
{
    # Read the secret from the KeyVault; its value is a JSON document containing the password
    $secret = az keyvault secret show --name $prop.value --vault-name $keyvaultname | ConvertFrom-Json
    $value = $secret.value | ConvertFrom-Json
    $postgresql_passwords[$prop.name] = ConvertTo-SecureString $value.postgresql_password -AsPlainText -Force
    Write-Host "The value of the $($prop.value) secret for the $($prop.name) instance is $($value.postgresql_password)"
    Remove-Variable secret
    Remove-Variable value
}
To get a list of all storage account keys, run the following PowerShell script:
$access_keys = @{}
$storageaccounts = terraform output -json minio_storage_usernames | ConvertFrom-Json
foreach($prop in $storageaccounts.PsObject.Properties)
{
$keys = az storage account keys list -n $prop.value | ConvertFrom-Json
$access_keys[$prop.name] = ConvertTo-SecureString $keys[0].value -AsPlainText -Force
Write-Host "The value of $($prop.value) key for $($prop.name) instance is $(ConvertFrom-SecureString $access_keys[$prop.name] -AsPlainText)"
Remove-Variable keys
}
As mentioned before, in order to store the log data of the services you have to provide a Log Analytics Workspace in your subscription.
To create a Log Analytics Workspace, use:
az monitor log-analytics workspace create --workspace-name "<LogAnalyticsWorkspaceName>" --resource-group "<LogAnalyticsWorkspaceResourceGroup>" --location "<Location>"
- LogAnalyticsWorkspaceName - Name of the Log Analytics Workspace
- LogAnalyticsWorkspaceResourceGroup - Name of the Log Analytics Workspace resource group
- Location - Location of the Log Analytics Workspace, e.g. westeurope
As mentioned before, Terraform stores the state of the resources it creates within a container of an Azure storage account. Therefore, you need to specify this location.
To do so, please make a copy of the file `state-backend-template`, name it `state-backend.tf`, and open it in a text editor. The values have to point to an existing storage account to be used to store the Terraform state:
- `resource_group_name`: The name of the resource group your storage account is located in.
- `storage_account_name`: The name of the storage account.
- `container_name`: The name of the container inside the storage account to be used to store the Terraform state. You need to create this container manually.
- `key`: The name of the file inside the container to be used for this Terraform state.
- `environment`: Use the value `public` for the general Azure cloud.
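As an illustration, a filled-in `state-backend.tf` could look like the following sketch; the resource group, storage account, container, and key names are placeholders for your own values:

terraform {
  backend "azurerm" {
    resource_group_name  = "my-terraform-state-rg"
    storage_account_name = "myterraformstate"
    container_name       = "tfstate"
    key                  = "simphera.tfstate"
    environment          = "public"
  }
}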
For your configuration, please make a copy of the file `terraform.tfvars.example`, name it `terraform.tfvars`, and open the file in a text editor. This file contains all configurable variables, including their documentation. Please adapt the values before you deploy the resources.
A list with descriptions of all mandatory and optional variables can be found in the Inputs section of this README file.
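As a minimal sketch, a `terraform.tfvars` file could start like this; all values are illustrative examples, and the required `simpheraInstances` map is omitted here because its exact object structure is documented in `terraform.tfvars.example`:

infrastructurename          = "simphera-infra"
location                    = "westeurope"
kubernetesVersion           = "1.28.3"
apiServerAuthorizedIpRanges = ["198.51.100.0/24"]
tags = {
  department = "my-department"
}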
It is recommended to restrict access to the Kubernetes API server to authorized IP address ranges by setting the variable `apiServerAuthorizedIpRanges`.
It is likewise recommended to restrict access to the Key Vault to authorized IP address ranges by setting the variable `keyVaultAuthorizedIpRanges`.
If you use AURELION with SIMPHERA, the AURELION pods are executed in the GPU node pool. AURELION uses a specific OptiX version and therefore needs specific NVIDIA drivers. NVIDIA provides the gpu-operator, a tool that makes it possible to run containerized drivers inside pods. This allows you to use the required driver version independently of the default NVIDIA driver installation on the GPU node pool, which can only be enabled or disabled as a whole; selecting a specific driver version there is not possible. See the NVIDIA gpu-operator documentation for further information.
Typically, you have autoscaling enabled for the GPU node pool so that VMs are scaled down when they are no longer needed. However, the AURELION container image is large, and downloading it to a Kubernetes node takes time; depending on your location this can take more than 30 minutes. To shorten these times, the scale-down mode of the GPU node pool should be set to Deallocate. That means a GPU VM is not deleted but only deallocated, so you no longer pay for the compute resources but only for the disk, which is not deleted in this mode.
You can enable and disable this mode using the variables `linuxExecutionNodeDeallocate` and `gpuNodeDeallocate`. That means you can configure this not only for the GPU node pool but also for the execution node pool. By default, Deallocate is used for both node pools.
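In `terraform.tfvars` this corresponds to the following settings, shown here with the default values from the Inputs section:

gpuNodeDeallocate            = true
linuxExecutionNodeDeallocate = true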
Before you can deploy the resources to Azure you have to initialize Terraform:
terraform init
Afterwards you can deploy the resources:
terraform apply
Terraform automatically loads the variables from your `terraform.tfvars` variable definition file.
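If you want to review the changes Terraform would make before actually applying them, you can run a plan first; this is the standard Terraform workflow and not specific to this repository:

terraform plan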
For each configured SIMPHERA instance an individual Azure storage account is created to store binary artifacts. The name of the storage account is a concatenation of the infrastructure name and the instance name, with hyphens removed, clipped to a maximum of 24 characters. Please open the Azure Portal and navigate to the storage account, which is located inside the resource group `<instancename>-storage`. Later, during the configuration of the SIMPHERA Helm chart, you need the name of this storage account and also an access key, which is also accessible from the portal.
This deployment contains a managed Kubernetes cluster (AKS). In order to use command line tools such as `kubectl` or `helm` you need a kubeconfig configuration file. This file is automatically exported by Terraform under the filename `<infrastructurename>.kubeconfig`.
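For example, you can pass this file to `kubectl` explicitly to list the cluster nodes (assuming the exported file name):

kubectl --kubeconfig <infrastructurename>.kubeconfig get nodes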
If you want to ssh into a Kubernetes worker node you can use a command like this:
ssh -i shared-ssh-key/ssh simphera@<name-or-ip-of-node>
But please keep in mind that the nodes themselves do not get public IPs. Therefore, you may need to create a Linux jumpbox VM within your virtual network to be able to connect to a node from there. In that case you have to copy the private key to that machine and set the correct file permissions: `chmod 600 shared-ssh-key/ssh`. As an alternative you can use the license server Windows VM as a jumpbox.
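A possible workflow, assuming a hypothetical jumpbox reachable at <jumpbox-ip> with user <user>, is to copy the private key to the jumpbox and connect to a node from there:

scp shared-ssh-key/ssh <user>@<jumpbox-ip>:~/ssh
ssh <user>@<jumpbox-ip>
chmod 600 ~/ssh
ssh -i ~/ssh simphera@<name-or-ip-of-node>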
This reference architecture deploys Azure Policy into the Kubernetes cluster. With Azure Policy, security policies can be defined and violations monitored. Azure provides various predefined policies. By default, no policies are assigned to the Kubernetes cluster by the reference architecture. Instead, an administrator must assign policies manually, which requires appropriate permissions. The Azure built-in roles Resource Policy Contributor and Owner have these permissions. dSPACE recommends using the predefined policy 'Kubernetes cluster containers should only use allowed images'. To assign it, use the PowerShell commands below:
$clustername = "<cluster name>"
$resourcegroup = "<cluster resource group>"
$cluster = az aks show --name $clustername --resource-group $resourcegroup | ConvertFrom-Json
$name = "K8sAzureContainerAllowedImages@${clustername}"
$description = "Kubernetes cluster containers should only use allowed images"
$scope = $cluster.id
# Definition ID of the built-in policy 'Kubernetes cluster containers should only use allowed images'
$policy = "febd0533-8e55-448f-b837-bd0e06f16469"
# Container registries from which images are allowed to be pulled
$allowedContainerImagesRegex = "^(docker\.io\/(groundnuty|jboss|eclipse-mosquitto|bitnami)|quay\.io\/oauth2-proxy|registry\.dspace\.cloud|registry\.k8s\.io)\/.+$"
$params_ = @"
{
"allowedContainerImagesRegex": {
"value": "$allowedContainerImagesRegex"
}
}
"@
# Remove whitespace and escape the quotes so the JSON parameter survives shell parsing
$params_ = $params_ -replace '\s',''
$params = $params_ -replace '([\\]*)"', '$1$1\"'
az policy assignment create `
--scope $scope `
--description $description `
--name $name `
--policy $policy `
--params $params
To delete all resources you have to execute the following command:
terraform destroy
Please keep in mind that this command also deletes all storage accounts, including your backups, so please be careful.
As a next step you have to deploy SIMPHERA to the Kubernetes cluster by using the SIMPHERA Quick Start helm chart. You will find detailed instructions in the README file inside the Helm chart itself.
Tool name | Version |
---|---|
Azure CLI | >=2.40.0 |
Helm | >=3.8.0 |
Terraform | >=1.2.9 |
kubectl | >=1.27.0 |
Name | Version |
---|---|
terraform | >= 1.0.0 |
Name | Version |
---|---|
azurerm | n/a |
local | n/a |
random | n/a |
Name | Source | Version |
---|---|---|
simphera_instance | ./modules/simphera_instance | n/a |
Name | Description | Type | Default | Required |
---|---|---|---|---|
apiServerAuthorizedIpRanges | List of authorized IP address ranges that are granted access to the Kubernetes API server, e.g. ["198.51.100.0/24"] | set(string) | null | no |
gpuNodeCountMax | The maximum number of nodes for gpu job execution | number | 12 | no |
gpuNodeCountMin | The minimum number of nodes for gpu job execution | number | 0 | no |
gpuNodeDeallocate | Configures whether the nodes for the gpu job execution are 'Deallocated (Stopped)' by the cluster auto scaler or 'Deleted'. | bool | true | no |
gpuNodePool | Specifies whether an additional node pool for gpu job execution is added to the kubernetes cluster | bool | false | no |
gpuNodeSize | The machine size of the nodes for the gpu job execution | string | "Standard_NC16as_T4_v3" | no |
infrastructurename | The name of the infrastructure, e.g. simphera-infra | string | n/a | yes |
keyVaultAuthorizedIpRanges | List of authorized IP address ranges that are granted access to the Key Vault, e.g. ["198.51.100.0/24"] | set(string) | [] | no |
keyVaultPurgeProtection | Specifies whether the Key Vault purge protection is enabled. | bool | true | no |
kubernetesVersion | The version of the AKS cluster. | string | "1.28.3" | no |
licenseServer | Specifies whether a VM for the dSPACE Installation Manager will be deployed. | bool | false | no |
licenseServerIaaSAntimalware | Specifies whether an IaaSAntimalware extension will be installed on the license server VM. Depends on the licenseServer variable. | bool | true | no |
licenseServerMicrosoftGuestConfiguration | Specifies whether a Microsoft Guest configuration extension will be installed on the license server VM. Depends on the licenseServer variable. | bool | true | no |
licenseServerMicrosoftMonitoringAgent | Specifies whether a MicrosoftMonitoringAgent extension will be installed on the license server VM. Depends on the licenseServer, logAnalyticsWorkspaceName, and logAnalyticsWorkspaceResourceGroupName variables. | bool | true | no |
linuxExecutionNodeCountMax | The maximum number of Linux nodes for the job execution | number | 10 | no |
linuxExecutionNodeCountMin | The minimum number of Linux nodes for the job execution | number | 0 | no |
linuxExecutionNodeDeallocate | Configures whether the Linux nodes for the job execution are 'Deallocated (Stopped)' by the cluster auto scaler or 'Deleted'. | bool | true | no |
linuxExecutionNodeSize | The machine size of the Linux nodes for the job execution | string | "Standard_D16s_v4" | no |
linuxNodeCountMax | The maximum number of Linux nodes for the regular services | number | 12 | no |
linuxNodeCountMin | The minimum number of Linux nodes for the regular services | number | 1 | no |
linuxNodeSize | The machine size of the Linux nodes for the regular services | string | "Standard_D4s_v4" | no |
location | The Azure location to be used. | string | n/a | yes |
logAnalyticsWorkspaceName | The name of the Log Analytics Workspace to be used. Use an empty string to disable usage of Log Analytics. | string | "" | no |
logAnalyticsWorkspaceResourceGroupName | The name of the resource group of the Log Analytics Workspace to be used. | string | "" | no |
simpheraInstances | A map containing the individual SIMPHERA instances, such as 'staging' and 'production'. | map(object({...})) | n/a | yes |
ssh_public_key_path | Path to the public SSH key to be used for the kubernetes nodes. | string | "shared-ssh-key/ssh.pub" | no |
tags | The tags to be added to all resources. | map(any) | {} | no |
Name | Description |
---|---|
key_vault_id | n/a |
key_vault_name | n/a |
key_vault_uri | n/a |
kube_config | n/a |
minio_storage_usernames | n/a |
postgresql_server_hostnames | n/a |
postgresql_server_usernames | n/a |
secretnames | n/a |