Provisioning Azure Databricks workspace with a Hub & Spoke firewall for data exfiltration protection
This example uses the `adb-exfiltration-protection` module.
This template provides an example deployment of Hub-Spoke networking with an egress firewall that controls all outbound traffic from the Databricks subnets. Details are described in: https://databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html
With this setup, you can define firewall rules to block or allow egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage accounts and use private endpoint connections to bypass the firewall, so that only specific storage accounts remain reachable.
To find the IP addresses and FQDNs for your deployment, go to: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr
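As an illustration of the allow rules described above, here is a minimal sketch of one FQDN-based application rule collection and one IP-based network rule collection. The resource labels, rule names, priorities, and the `firewall_name` and `firewall_resource_group` variables are assumptions made for this example and are not necessarily what the module defines. `firewallfqdn`, `metastoreip`, and `spokecidr` are the inputs documented in this README and are assumed to be declared elsewhere. The sketch also assumes the metastore is reached on the MySQL port 3306.

```hcl
# Hypothetical inputs for this sketch only; the module may wire these differently.
variable "firewall_name" {
  type = string
}

variable "firewall_resource_group" {
  type = string
}

# Allow HTTPS egress from the spoke address space to the FQDNs in firewallfqdn.
resource "azurerm_firewall_application_rule_collection" "adb_fqdn" {
  name                = "adb-allowed-fqdns"
  azure_firewall_name = var.firewall_name
  resource_group_name = var.firewall_resource_group
  priority            = 200
  action              = "Allow"

  rule {
    name             = "databricks-fqdns"
    source_addresses = [var.spokecidr]
    target_fqdns     = var.firewallfqdn

    protocol {
      port = "443"
      type = "Https"
    }
  }
}

# Allow TCP egress from the spoke address space to the metastore IP.
resource "azurerm_firewall_network_rule_collection" "adb_network" {
  name                = "adb-allowed-ips"
  azure_firewall_name = var.firewall_name
  resource_group_name = var.firewall_resource_group
  priority            = 200
  action              = "Allow"

  rule {
    name                  = "databricks-metastore"
    source_addresses      = [var.spokecidr]
    destination_ports     = ["3306"] # assumption: MySQL-based metastore
    destination_addresses = [var.metastoreip]
    protocols             = ["TCP"]
  }
}
```

Any egress that matches neither collection is dropped by the firewall's default deny behavior, which is what gives the exfiltration protection.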
Resources to be created:
- Resource group with random prefix
- Tags, including `Owner`, which is taken from `az account show --query user`
- Hub-Spoke topology, with the hub firewall in a subnet of the hub VNet.
- Associated firewall rules, both FQDN-based rules and IP-based network rules.
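The `Owner` tag lookup can be sketched with the `external` provider, which appears in the provider list of this README. This is an assumption about how the value might be read, not a copy of the module's code; the data source label `me` and the local `tags` are hypothetical names.

```hcl
# Runs the Azure CLI. The external data source expects the program to print
# a flat JSON object whose values are all strings, which this query does.
data "external" "me" {
  program = ["az", "account", "show", "--query", "user"]
}

locals {
  tags = {
    # "name" is the signed-in account name reported by the Azure CLI.
    Owner = data.external.me.result.name
  }
}
```

This requires the Azure CLI to be installed and logged in on the machine that runs Terraform.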
How to use:
- Update the `terraform.tfvars` file and provide a value for each defined variable.
- (Optional) Configure your remote backend.
- Run `terraform init` to initialize Terraform and get the providers ready.
- Run `terraform apply` to create the resources.
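For the optional remote backend step, a minimal sketch using the `azurerm` backend could look like the following. All four values are hypothetical placeholders; the storage account and container must already exist.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"      # hypothetical resource group
    storage_account_name = "tfstateexample"  # hypothetical storage account
    container_name       = "tfstate"         # hypothetical blob container
    key                  = "adb-exfiltration-protection.tfstate"
  }
}
```

After adding or changing a backend block, run `terraform init` again so that Terraform picks up the new state location.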
Some variables have no default value and must be set explicitly, e.g. `subscription_id`.
Most of the values can be found at: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr
In `terraform.tfvars`, set these variables:
metastoreip = "40.78.233.2" # find your metastore service ip
sccip = "52.230.27.216" # use nslookup on the domain name to find the ip
webappip = "52.187.145.107/32" # given at UDR page
firewallfqdn = ["dbartifactsprodseap.blob.core.windows.net","dbartifactsprodeap.blob.core.windows.net","dblogprodseasia.blob.core.windows.net","prod-southeastasia-observabilityeventhubs.servicebus.windows.net","cdnjs.com"] # find these for your region, follow Databricks blog tutorial.
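As an alternative to running `nslookup` by hand, the SCC relay address can be resolved with the `dns` provider, which appears in the provider list of this README. This is a sketch under assumptions: `scc_relay_host` is a hypothetical variable that you would set to the relay hostname for your region from the UDR page, and the module itself may resolve the address differently or not at all.

```hcl
# Hypothetical variable: the SCC relay hostname for your region (see the UDR page).
variable "scc_relay_host" {
  type = string
}

# Resolves the A records for the hostname at plan time.
data "dns_a_record_set" "scc_relay" {
  host = var.scc_relay_host
}

# Prints the resolved IPv4 addresses so one can be copied into sccip.
output "scc_relay_ips" {
  value = data.dns_a_record_set.scc_relay.addrs
}
```

Resolved addresses can change over time, so a value copied into `sccip` may need to be refreshed.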
Requirements:

Name | Version |
---|---|
azurerm | >=4.0.0 |
databricks | >=1.52.0 |
Providers:

Name | Version |
---|---|
azurerm | 4.9.0 |
external | 1.58.0 |
random | 3.6.3 |
dns | 3.4.2 |
No modules.
Inputs:

Name | Description | Type | Default | Required |
---|---|---|---|---|
subscription_id | n/a | string | n/a | yes |
dbfs_prefix | n/a | string | "dbfs" | no |
firewallfqdn | n/a | list(any) | n/a | yes |
hubcidr | n/a | string | "10.178.0.0/20" | no |
metastoreip | n/a | string | n/a | yes |
private_subnet_endpoints | n/a | list | [] | no |
rglocation | n/a | string | "southeastasia" | no |
sccip | n/a | string | n/a | yes |
spokecidr | n/a | string | "10.179.0.0/20" | no |
webappip | n/a | string | n/a | yes |
workspace_prefix | n/a | string | "adb" | no |
Outputs:

Name | Description |
---|---|
azure_resource_group_id | n/a |
workspace_id | n/a |
workspace_url | n/a |
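If this configuration is consumed as a child module, the required inputs and the outputs listed above could be wired up roughly as follows. The `source` path and the module label are hypothetical, the subscription ID is a placeholder, and the remaining values are the Southeast Asia examples from this README; replace them with the values for your region.

```hcl
module "adb_exfiltration_protection" {
  # Hypothetical path; point this at wherever the module lives in your checkout.
  source = "./modules/adb-exfiltration-protection"

  subscription_id = "00000000-0000-0000-0000-000000000000" # placeholder
  metastoreip     = "40.78.233.2"
  sccip           = "52.230.27.216"
  webappip        = "52.187.145.107/32"
  firewallfqdn = [
    "dbartifactsprodseap.blob.core.windows.net",
    "dbartifactsprodeap.blob.core.windows.net",
    "dblogprodseasia.blob.core.windows.net",
    "prod-southeastasia-observabilityeventhubs.servicebus.windows.net",
    "cdnjs.com",
  ]
}

# Re-export the workspace URL so it is printed after apply.
output "workspace_url" {
  value = module.adb_exfiltration_protection.workspace_url
}
```

Inputs with defaults, such as `hubcidr` and `spokecidr`, only need to be set when the default address ranges overlap with existing networks.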