Provisioning Azure Databricks workspace with a Hub & Spoke firewall for data exfiltration protection

This example uses the adb-exfiltration-protection module.

This template provides an example deployment of Hub-Spoke networking with an egress firewall that controls all outbound traffic from the Databricks subnets. Details are described in: https://databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html

With this setup, you can define firewall rules to block or allow egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage accounts and use private endpoint connections to bypass the firewall, so that access is allowed only to specific storage accounts.

To find the IP addresses and FQDNs for your deployment, go to: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

Overall Architecture

[Figure: overall architecture diagram]

Resources to be created:

  • Resource group with random prefix
  • Tags, including Owner, which is taken from az account show --query user
  • Hub-Spoke topology, with the hub firewall deployed in a subnet of the hub VNet.
  • Associated firewall rules: FQDN-based application rules and IP-based network rules.
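
As a rough illustration of the FQDN rules listed above, an allow rule on the hub firewall might look like the sketch below. The resource addresses (azurerm_firewall.hubfw, azurerm_resource_group.this, azurerm_firewall_application_rule_collection.adbfqdn) and the variables firewallfqdn and spokecidr are taken from the Resources and Inputs tables in this README. The collection name, rule name, priority, and port are assumptions for illustration and may differ from what the example actually defines.

```hcl
# Sketch only: names, priority, and port are illustrative assumptions.
resource "azurerm_firewall_application_rule_collection" "adbfqdn" {
  name                = "adb-allowed-fqdns" # hypothetical name
  azure_firewall_name = azurerm_firewall.hubfw.name
  resource_group_name = azurerm_resource_group.this.name
  priority            = 200 # assumed value
  action              = "Allow"

  rule {
    name             = "databricks-control-plane-fqdns" # hypothetical name
    source_addresses = [var.spokecidr]                  # traffic from the spoke (Databricks) VNet
    target_fqdns     = var.firewallfqdn                 # FQDNs supplied in the tfvars file

    protocol {
      port = "443"
      type = "Https"
    }
  }
}
```

Any FQDN that is not matched by an allow rule such as this one is dropped by the firewall's default deny behavior, which is what restricts egress from the clusters.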

How to use

  1. Update the terraform.tfvars file and provide a value for each defined variable.
  2. (Optional) Configure your remote backend.
  3. Run terraform init to initialize Terraform and download the required providers.
  4. Run terraform apply to create the resources.
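
For the optional remote backend in step 2, one possibility is to store state in an Azure Storage account using the azurerm backend. The block below is a minimal sketch; the resource group, storage account, container, and key names are placeholders that do not come from this example, and the storage account must already exist before terraform init is run.

```hcl
# Hypothetical remote backend configuration: every value below is a placeholder.
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"        # existing resource group (assumed)
    storage_account_name = "tfstatestorage001" # existing storage account (assumed)
    container_name       = "tfstate"           # existing blob container (assumed)
    key                  = "adb-exfiltration-protection.tfstate"
  }
}
```

If no backend block is configured, Terraform keeps state in a local terraform.tfstate file.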

How to fill in variable values

Some variables have no default value and must be set explicitly, e.g. subscription_id.
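
To make the distinction concrete, a variable declared without a default looks roughly like the first block below, and its value is then supplied in the tfvars file as in the second block. The declaration is a sketch based on the Inputs table (type string, no default), and the subscription ID shown is a dummy placeholder.

```hcl
# Declaration sketch: no default, so Terraform requires a value at plan/apply time.
variable "subscription_id" {
  type = string
}
```

```hcl
# Value supplied in the tfvars file (placeholder GUID, replace with your own).
subscription_id = "00000000-0000-0000-0000-000000000000"
```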

Most of the values can be found at: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

In terraform.tfvars, set these variables:

metastoreip = "40.78.233.2" # find your metastore service ip

sccip = "52.230.27.216" # use nslookup on the domain name to find the ip

webappip = "52.187.145.107/32" # given at UDR page

firewallfqdn = ["dbartifactsprodseap.blob.core.windows.net","dbartifactsprodeap.blob.core.windows.net","dblogprodseasia.blob.core.windows.net","prod-southeastasia-observabilityeventhubs.servicebus.windows.net","cdnjs.com"] # find these for your region, follow Databricks blog tutorial.

Requirements

Name Version
azurerm >=4.0.0
databricks >=1.52.0

Providers

Name Version
azurerm 4.9.0
external 1.58.0
random 3.6.3
dns 3.4.2

Modules

No modules.

Resources

Name Type
azurerm_databricks_workspace.this resource
azurerm_firewall.hubfw resource
azurerm_firewall_application_rule_collection.adbfqdn resource
azurerm_firewall_network_rule_collection.adbfnetwork resource
azurerm_network_security_group.this resource
azurerm_public_ip.fwpublicip resource
azurerm_resource_group.this resource
azurerm_route_table.adbroute resource
azurerm_storage_account.allowedstorage resource
azurerm_storage_account.deniedstorage resource
azurerm_subnet.hubfw resource
azurerm_subnet.private resource
azurerm_subnet.public resource
azurerm_subnet_network_security_group_association.private resource
azurerm_subnet_network_security_group_association.public resource
azurerm_subnet_route_table_association.privateudr resource
azurerm_subnet_route_table_association.publicudr resource
azurerm_virtual_network.hubvnet resource
azurerm_virtual_network.this resource
azurerm_virtual_network_peering.hubvnet resource
azurerm_virtual_network_peering.spokevnet resource
random_string.naming resource
azurerm_client_config.current data source
external_external.me data source

Inputs

Name                      Description  Type       Default          Required
subscription_id           n/a          string     n/a              yes
dbfs_prefix               n/a          string     "dbfs"           no
firewallfqdn              n/a          list(any)  n/a              yes
hubcidr                   n/a          string     "10.178.0.0/20"  no
metastoreip               n/a          string     n/a              yes
private_subnet_endpoints  n/a          list       []               no
rglocation                n/a          string     "southeastasia"  no
sccip                     n/a          string     n/a              yes
spokecidr                 n/a          string     "10.179.0.0/20"  no
webappip                  n/a          string     n/a              yes
workspace_prefix          n/a          string     "adb"            no

Outputs

Name Description
azure_resource_group_id n/a
workspace_id n/a
workspace_url n/a