Automate Control Plane RG Setup #480

Merged on Sep 6, 2023 (23 commits)
Changes from 21 commits
@@ -0,0 +1,80 @@
# Resource Group Setup

These scripts are helpers for setting up a new resource group (RG) in Azure as the *control plane* for MLOS.
The *control plane RG* is a container for the *persistent* resources of MLOS (results/metrics storage, scheduler VM, notebook interface, etc.).

## Quickstart

1. Starting in this directory, make sure you are logged in to the Azure CLI.

```sh
az login
```

2. Make a copy of the control plane ARM parameters file.

```sh
cp rg-template.example.parameters.json rg-template.parameters.json
```

3. (Optional, but suggested) If you plan to provision a results DB, make a copy of the results DB parameters file.

```sh
cp results-db/mysql-template.parameters.example.json results-db/mysql-template.parameters.json
```

4. Modify the ARM parameters in the newly created files as needed, especially the `PLACEHOLDER` values.

5. Execute the main script with CLI args as follows:

```shell
# With PowerShell
# If provisioning the results DB, include '-resultsDbArmParamsFile'; otherwise omit it.
# (A trailing comment on a continued line would break the backtick line continuation.)
./setup-rg.ps1 `
-controlPlaneArmParamsFile $controlPlaneArmParamsFile `
-resultsDbArmParamsFile $resultsDbArmParamsFile `
-servicePrincipalName $servicePrincipalName `
-resourceGroupName $resourceGroupName `
-certName $certName
```

```sh
# With bash
# If provisioning the results DB, include '--resultsDbArmParamsFile'; otherwise omit it.
./setup-rg.sh \
--controlPlaneArmParamsFile $controlPlaneArmParamsFile \
--resultsDbArmParamsFile $resultsDbArmParamsFile \
--servicePrincipalName $servicePrincipalName \
--resourceGroupName $resourceGroupName \
--certName $certName
```

where each `$*ArmParamsFile` can be the corresponding `*.parameters.json` file created in the steps above. More generally, these arguments accept anything that `--parameters` accepts in [az deployment group create](https://learn.microsoft.com/en-us/cli/azure/deployment/group?view=azure-cli-latest#az-deployment-group-create-examples).
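
Before running the setup script, it can help to confirm that no `PLACEHOLDER` values from step 4 remain. The following is a minimal sketch, not part of the scripts in this PR; the function name `check_placeholders` is hypothetical.

```sh
# Report any parameters file that is missing or still contains PLACEHOLDER.
# Returns non-zero if a problem is found, so it can gate the setup script.
check_placeholders() {
    rc=0
    for params_file in "$@"; do
        if [ ! -f "$params_file" ]; then
            echo "Missing parameters file: $params_file" >&2
            rc=1
        elif grep -n 'PLACEHOLDER' "$params_file" >&2; then
            # grep printed the offending lines (with line numbers) to stderr.
            echo "Unresolved PLACEHOLDER values in: $params_file" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example usage with the files copied in steps 2 and 3:
# check_placeholders rg-template.parameters.json \
#     results-db/mysql-template.parameters.json && echo "Ready to run setup."
```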

## Workflow

The high-level flow for what this script automates is as follows:

1. Assign `Contributor` access to the Service Principal (SP) for write access over resources.
Ideally, experiment resources are placed in their own RG.
When that isn't possible, they can also be placed in the control plane RG, in which case the SP can optionally be given access to the control plane RG as well.

2. Provision control plane resources into the RG.
This includes:
- Control VM for running the `mlos_bench` scheduler.
- Control VM's networking (public IP, security group, vnet, subnet, network interface).
- Key Vault for storing the SP credentials.
- Storage (storage account, file share).

3. The results DB is then optionally provisioned, adding appropriate firewall rules.

4. Assign `Key Vault Administrator` access to the current user.
This allows the current user to retrieve secrets / certificates from the VM once it is set up.
Make sure to log in as the same user on the VM.

5. Check if the desired certificate name already exists in the key vault.

6. If the certificate does not exist yet, create or update the Service Principal (SP) with `Contributor` access, which grants write access over resources.
Ideally, experiment resources are placed in their own RG.
When that isn't possible, they can also be placed in the control plane RG, in which case the SP can optionally be given access to the control plane RG as well.

7. Otherwise, create or update the SP with access similar to the previous step, and also verify by thumbprint that the existing certificate in the key vault matches one already linked to the SP.
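
The thumbprint check in the last step amounts to comparing two hex strings that may be formatted differently (letter case, colon separators). Below is a minimal sketch of that comparison only. How the two thumbprints are fetched from the key vault and from the SP is left out, and the function names are hypothetical rather than taken from the scripts in this PR.

```sh
# Normalize a certificate thumbprint: drop colons and whitespace, upper-case the hex digits.
normalize_thumbprint() {
    printf '%s' "$1" | tr -d ': \n' | tr 'a-f' 'A-F'
}

# Succeed (exit status 0) only if both thumbprints are equal after normalization.
thumbprints_match() {
    [ "$(normalize_thumbprint "$1")" = "$(normalize_thumbprint "$2")" ]
}

# Example usage, with $kv_thumbprint and $sp_thumbprint obtained elsewhere:
# thumbprints_match "$kv_thumbprint" "$sp_thumbprint" \
#     || echo "Key vault certificate is not linked to the SP" >&2
```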
@@ -0,0 +1,92 @@
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"dbName": {
"type": "string"
},
"dbLocation": {
"type": "string",
"defaultValue": "[resourceGroup().location]"
},
"dbAdminUsername": {
"type": "string",
"defaultValue": "mlos"
},
"dbAdminPassword": {
"type": "securestring"
},
"dbFirewallRules": {
"type": "array",
"defaultValue": []
}
},
"functions": [],
"variables": {},
"resources": [
{
"type": "Microsoft.DBforMySQL/flexibleServers",
"apiVersion": "2021-12-01-preview",
"name": "[parameters('dbName')]",
"location": "[parameters('dbLocation')]",
"sku": {
"name": "Standard_B1s",
"tier": "Burstable"
},
"properties": {
"administratorLogin": "[parameters('dbAdminUsername')]",
"administratorLoginPassword": "[parameters('dbAdminPassword')]",
"storage": {
"storageSizeGB": 20,
"iops": 360,
"autoGrow": "Enabled"
},
"version": "8.0.21",
"network": {
"publicNetworkAccess": "Enabled"
},
"backup": {
"backupRetentionDays": 7,
"geoRedundantBackup": "Disabled"
},
"highAvailability": {
"mode": "Disabled"
}
}
},
{
"type": "Microsoft.DBforMySQL/flexibleServers/databases",
"apiVersion": "2021-12-01-preview",
"name": "[concat(parameters('dbName'), '/mlos')]",
"dependsOn": [
"[resourceId('Microsoft.DBforMySQL/flexibleServers', parameters('dbName'))]"
],
"properties": {
"charset": "utf8",
"collation": "utf8_general_ci"
}
},
{
"copy": {
"name": "firewallCopy",
"count": "[length(parameters('dbFirewallRules'))]"
},
"type": "Microsoft.DBforMySQL/flexibleServers/firewallRules",
"apiVersion": "2021-12-01-preview",
"name": "[concat(parameters('dbName'), '/', parameters('dbFirewallRules')[copyIndex('firewallCopy')].name)]",
"dependsOn": [
"[resourceId('Microsoft.DBforMySQL/flexibleServers', parameters('dbName'))]"
],
"properties": {
"startIpAddress": "[parameters('dbFirewallRules')[copyIndex('firewallCopy')].startIpAddress]",
"endIpAddress": "[parameters('dbFirewallRules')[copyIndex('firewallCopy')].endIpAddress]"
}
}
],
"outputs": {
"dbName": {
"type": "string",
"value": "[parameters('dbName')]"
}
}
}
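
A template like the one above is typically deployed with `az deployment group create`. The sketch below shows one way that call could look. The wrapper name `deploy_results_db` and the template path in the usage comment are assumptions (the file name is not visible in this diff), and the setup scripts in this PR may invoke the CLI differently.

```sh
# Deploy the results DB template into an existing resource group.
# $1: resource group name, $2: ARM template file, $3: ARM parameters file.
deploy_results_db() {
    resource_group="$1"
    template_file="$2"
    params_file="$3"
    # '@file' is the Azure CLI syntax for loading parameters from a file.
    az deployment group create \
        --resource-group "$resource_group" \
        --template-file "$template_file" \
        --parameters "@$params_file"
}

# Example usage (hypothetical template path):
# deploy_results_db "$resourceGroupName" \
#     results-db/mysql-template.json \
#     results-db/mysql-template.parameters.json
```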
@@ -0,0 +1,24 @@
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"dbName": {
"value": "mlos-autotune-db"
},
"dbAdminUsername": {
"value": "mlos"
},
"dbAdminPassword": {
"value": "PLACEHOLDER"
},
"dbFirewallRules": {
"value": [
{
"name": "PLACEHOLDER",
"startIpAddress": "192.168.0.0",
"endIpAddress": "192.168.255.255"
}
]
}
}
}
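
Each entry in `dbFirewallRules` describes an inclusive IPv4 range, so `startIpAddress` should not be greater than `endIpAddress`. The following sketch checks that for a single rule. The function names are hypothetical, and it assumes plain dotted-quad addresses without leading zeros.

```sh
# Convert a dotted-quad IPv4 address to its integer value.
ip_to_int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( o1 * 16777216 + o2 * 65536 + o3 * 256 + o4 ))
}

# Succeed (exit status 0) if the start address is not after the end address.
firewall_range_is_valid() {
    [ "$(ip_to_int "$1")" -le "$(ip_to_int "$2")" ]
}

# Example usage with the range from the example parameters file:
# firewall_range_is_valid 192.168.0.0 192.168.255.255 && echo "range ok"
```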
@@ -0,0 +1,26 @@
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"projectPrefix": {
"value": "mlos-autotune"
},
"vmSKU": {
"value": "Standard_D2s_v3"
},
"vmAdminUsername": {
"value": "PLACEHOLDER"
},
"sshPublicKeys": {
"value": [
"PLACEHOLDER"
]
},
"vmSshSourceAddressPrefix": {
"value": "PLACEHOLDER; e.g. 123.123.0.0/16"
},
"fileShareName": {
"value": "mlos-file-share"
}
}
}