---
layout: "databricks"
page_title: "Provider: Databricks"
sidebar_current: "docs-databricks-index"
description: |-
  Terraform provider databricks.
---
Use the Databricks Terraform provider to interact with almost all Databricks resources. If you're new to Databricks, please follow the guide to create a workspace on Azure or AWS, and then this workspace management tutorial. If you're migrating from version 0.2.x, please follow this guide. The changelog is available on GitHub.
## Compute resources
- Deploy databricks_cluster on selected databricks_node_type
- Schedule automated databricks_job
- Control cost and data access with databricks_cluster_policy
- Speed up job & cluster startup with databricks_instance_pool
- Customize clusters with databricks_global_init_script
- Manage a few databricks_notebook resources, and even list them
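For instance, the first item above can be sketched as follows; the cluster name, autoscaling bounds, and auto-termination value are illustrative, and the two data sources are the same ones used in the end-to-end example further down this page:

```hcl
# A minimal sketch: an autoscaling cluster on the smallest node type that
# supports local disks, auto-terminating after 20 minutes of inactivity.
resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = "Shared Autoscaling" # illustrative name
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  autoscale {
    min_workers = 1
    max_workers = 2
  }
}
```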
## Storage
- Manage JAR, Wheel & Egg libraries through databricks_dbfs_file
- List entries on DBFS with databricks_dbfs_file_paths data source
- Get contents of small files with databricks_dbfs_file data source
- Mount your AWS storage using databricks_aws_s3_mount
- Mount your Azure storage using databricks_azure_adls_gen1_mount, databricks_azure_adls_gen2_mount, databricks_azure_blob_mount
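For example, a small local file can be uploaded to DBFS with a sketch like the one below; both paths are purely illustrative:

```hcl
# A minimal sketch: copy a local file onto DBFS (e.g. a library or script).
resource "databricks_dbfs_file" "this" {
  source = "${path.module}/main.tf" # illustrative local path
  path   = "/tmp/main.tf"           # illustrative DBFS path
}
```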
## Security
- Organize databricks_user into databricks_group through databricks_group_member, also reading metadata
- Manage data access with databricks_instance_profile, which can be assigned through databricks_group_instance_profile and databricks_user_instance_profile
- Control which networks can access workspace with databricks_ip_access_list
- Generically manage databricks_permissions
- Keep sensitive elements like passwords in databricks_secret, grouped into databricks_secret_scope and controlled by databricks_secret_acl
- Create workspaces in your VPC with DBFS using cross-account IAM roles, having your notebooks encrypted with CMK.
- Use predefined AWS IAM Policy Templates: databricks_aws_assume_role_policy, databricks_aws_crossaccount_policy, databricks_aws_bucket_policy
- Configure billing and audit databricks_mws_log_delivery
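As a sketch of the first security item above, a user can be placed into a group like this; all names are illustrative:

```hcl
# A minimal sketch: create a group and a user, then add the user to the group.
resource "databricks_group" "spectators" {
  display_name = "Spectators" # illustrative group name
}

resource "databricks_user" "me" {
  user_name = "me@example.com" # illustrative user
}

resource "databricks_group_member" "membership" {
  group_id  = databricks_group.spectators.id
  member_id = databricks_user.me.id
}
```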
## SQL Analytics
- Create databricks_sql_endpoint controlled by databricks_permissions.
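A minimal sketch of such an endpoint is shown below; the sizing and auto-stop values are illustrative, and it reuses the `databricks_current_user` data source from the end-to-end example that follows, which wires together the provider, data sources, a notebook, and a job:

```hcl
# A minimal sketch: a small SQL endpoint that stops itself after 30 idle minutes.
resource "databricks_sql_endpoint" "this" {
  name             = "Endpoint of ${data.databricks_current_user.me.alphanumeric}"
  cluster_size     = "Small"
  max_num_clusters = 1
  auto_stop_mins   = 30 # illustrative
}
```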
provider "databricks" {
}
data "databricks_current_user" "me" {}
data "databricks_spark_version" "latest" {}
data "databricks_node_type" "smallest" {
local_disk = true
}
resource "databricks_notebook" "this" {
path = "${data.databricks_current_user.me.home}/Terraform"
language = "PYTHON"
content_base64 = base64encode(<<-EOT
# created from ${abspath(path.module)}
display(spark.range(10))
EOT
)
}
resource "databricks_job" "this" {
name = "Terraform Demo (${data.databricks_current_user.me.alphanumeric})"
new_cluster {
num_workers = 1
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
notebook_task {
notebook_path = databricks_notebook.this.path
}
email_notifications {}
}
output "notebook_url" {
value = databricks_notebook.this.url
}
output "job_url" {
value = databricks_job.this.url
}
!> Warning Hard-coding any credentials in plain text is not recommended. We strongly recommend using a Terraform backend that supports encryption. Please use environment variables, the `~/.databrickscfg` file, encrypted `.tfvars` files, or a secret store of your choice (HashiCorp Vault, AWS Secrets Manager, AWS Param Store, Azure Key Vault).
There are currently three supported methods to authenticate into the Databricks platform to create resources:
- PAT Tokens
- Username and password pair
- Azure Active Directory Tokens via Azure CLI or Service Principals
If no configuration options are given to your provider, it will look up configured credentials in the `~/.databrickscfg` file. That file is created by the `databricks configure --token` command. Check this page for more details. The provider uses config file credentials only when the `host`/`token` or `azure_auth` options are not specified. This is the recommended way to use the Databricks Terraform provider, in case you're already using the same approach with the AWS Shared Credentials File or Azure CLI authentication.
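Assuming `databricks configure --token` has already been run, the credentials file typically looks like the sketch below; host and token values are placeholders, and the `ML_WORKSPACE` profile is purely illustrative (it matches the `profile` example later on this page):

```ini
# A sketch of ~/.databrickscfg
[DEFAULT]
host  = https://abc-cdef-ghi.cloud.databricks.com
token = dapitokenhere

[ML_WORKSPACE]
host  = https://def-ghij-klm.cloud.databricks.com
token = dapitokenhere
```

With such a file in place, a zero-argument provider block is enough: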
provider "databricks" {
}
You can specify a non-standard location of the configuration file through the `config_file` parameter or the `DATABRICKS_CONFIG_FILE` environment variable:
provider "databricks" {
config_file = "/opt/databricks/cli-config"
}
You can specify a CLI connection profile through the `profile` parameter or the `DATABRICKS_CONFIG_PROFILE` environment variable:
provider "databricks" {
profile = "ML_WORKSPACE"
}
You can use the `host` and `token` parameters to supply credentials to the workspace. If environment variables are preferred, you can specify `DATABRICKS_HOST` and `DATABRICKS_TOKEN` instead. Environment variables are the second most recommended way of configuring this provider.
provider "databricks" {
host = "http://abc-cdef-ghi.cloud.databricks.com"
token = "dapitokenhere"
}
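To avoid hard-coding the token as in the block above (see the warning earlier on this page), one option is to pass it in through a Terraform input variable; this is only a sketch, and marking variables as `sensitive` requires Terraform 0.14 or later:

```hcl
variable "databricks_token" {
  type      = string
  sensitive = true # keeps the value out of plan output (Terraform 0.14+)
}

provider "databricks" {
  host  = "https://abc-cdef-ghi.cloud.databricks.com"
  token = var.databricks_token
}
```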
!> Warning This approach is currently recommended only for provisioning AWS workspaces and should be avoided for regular use.
You can use the `username` and `password` attributes to authenticate the provider for E2 workspace setup. The respective `DATABRICKS_USERNAME` and `DATABRICKS_PASSWORD` environment variables are applicable as well.
provider "databricks" {
host = "http://accounts.cloud.databricks.com"
username = var.user
password = var.password
}
The provider block supports the following arguments:
- `host` - (optional) This is the host of the Databricks workspace. It is a URL that you use to log in to your workspace. Alternatively, you can provide this value as the environment variable `DATABRICKS_HOST`.
- `token` - (optional) This is the API token to authenticate into the workspace. Alternatively, you can provide this value as the environment variable `DATABRICKS_TOKEN`.
- `username` - (optional) This is the username of the user that can log into the workspace. Alternatively, you can provide this value as the environment variable `DATABRICKS_USERNAME`. Recommended only for creating workspaces in AWS.
- `password` - (optional) This is the user's password that can log into the workspace. Alternatively, you can provide this value as the environment variable `DATABRICKS_PASSWORD`. Recommended only for creating workspaces in AWS.
- `config_file` - (optional) Location of the Databricks CLI credentials file created by the `databricks configure --token` command (`~/.databrickscfg` by default). Check the Databricks CLI documentation for more details. The provider uses configuration file credentials when you don't specify host/token/username/password/azure attributes. Alternatively, you can provide this value as the environment variable `DATABRICKS_CONFIG_FILE`. This field defaults to `~/.databrickscfg`.
- `profile` - (optional) Connection profile specified within `~/.databrickscfg`. Please check the connection profiles section for more details. This field defaults to `DEFAULT`.
To work with an Azure Databricks workspace, the provider must know its `azure_workspace_resource_id` (or construct it from `azure_subscription_id`, `azure_resource_group` and `azure_workspace_name`). The provider works with Azure CLI authentication to facilitate local development workflows, though for automated scenarios service principal authentication is necessary (along with the `azure_client_id`, `azure_client_secret` and `azure_tenant_id` parameters).
!> Warning Please note that Azure service principal authentication currently uses a generated Databricks PAT token, not an AAD token, because Azure Databricks does not yet support AAD tokens for secret scopes. The Databricks Labs team will refactor it transparently once that support is available. The only impacted field is `pat_token_duration_seconds`, which will be deprecated once full AAD support is in place.
provider "azurerm" {
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
}
resource "azurerm_databricks_workspace" "this" {
location = "centralus"
name = "my-workspace-name"
resource_group_name = var.resource_group
sku = "premium"
}
provider "databricks" {
azure_workspace_resource_id = azurerm_databricks_workspace.this.id
azure_client_id = var.client_id
azure_client_secret = var.client_secret
azure_tenant_id = var.tenant_id
}
resource "databricks_user" "my-user" {
user_name = "test-user@databricks.com"
}
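Alternatively, the workspace can be identified by its name and resource group rather than by resource ID; this is only a sketch, assuming the service principal credentials are supplied through the `ARM_CLIENT_ID`, `ARM_CLIENT_SECRET` and `ARM_TENANT_ID` environment variables listed further below:

```hcl
provider "databricks" {
  # equivalent to azure_workspace_resource_id, assembled from its parts
  azure_workspace_name  = "my-workspace-name"
  azure_resource_group  = var.resource_group
  azure_subscription_id = var.subscription_id
}
```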
It's possible to use experimental Azure CLI authentication, where the provider relies on the access token cached by the `az login` command, so that local development scenarios are possible. Technically, the provider will call `az account get-access-token` each time before an access token is about to expire. It is verified to work with all APIs. It can be turned off by setting `azure_use_pat_for_cli` to `true` in the provider configuration.
provider "azurerm" {
features {}
}
resource "azurerm_databricks_workspace" "this" {
location = "centralus"
name = "my-workspace-name"
resource_group_name = var.resource_group
sku = "premium"
}
provider "databricks" {
azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}
resource "databricks_user" "my-user" {
user_name = "test-user@databricks.com"
display_name = "Test User"
}
- `azure_workspace_resource_id` - (optional) `id` attribute of the azurerm_databricks_workspace resource. Combination of subscription id, resource group name, and workspace name.
- `azure_workspace_name` - (optional) This is the name of your Azure Databricks Workspace. Alternatively, you can provide this value as the environment variable `DATABRICKS_AZURE_WORKSPACE_NAME`. Not needed when `azure_workspace_resource_id` is set.
- `azure_resource_group` - (optional) This is the resource group in which your Azure Databricks Workspace resides. Alternatively, you can provide this value as the environment variable `DATABRICKS_AZURE_RESOURCE_GROUP`. Not needed when `azure_workspace_resource_id` is set.
- `azure_subscription_id` - (optional) This is the Azure Subscription id in which your Azure Databricks Workspace resides. Alternatively, you can provide this value as the environment variable `DATABRICKS_AZURE_SUBSCRIPTION_ID` or `ARM_SUBSCRIPTION_ID`. Not needed when `azure_workspace_resource_id` is set.
- `azure_client_secret` - (optional) This is the Azure Enterprise Application (Service principal) client secret. This service principal requires contributor access to your Azure Databricks deployment. Alternatively, you can provide this value as the environment variable `DATABRICKS_AZURE_CLIENT_SECRET` or `ARM_CLIENT_SECRET`.
- `azure_client_id` - (optional) This is the Azure Enterprise Application (Service principal) client id. This service principal requires contributor access to your Azure Databricks deployment. Alternatively, you can provide this value as the environment variable `DATABRICKS_AZURE_CLIENT_ID` or `ARM_CLIENT_ID`.
- `azure_tenant_id` - (optional) This is the Azure Active Directory Tenant id in which the Enterprise Application (Service Principal) resides. Alternatively, you can provide this value as the environment variable `DATABRICKS_AZURE_TENANT_ID` or `ARM_TENANT_ID`.
- `azure_environment` - (optional) This is the Azure Environment, which defaults to the `public` cloud. Other options are `german`, `china` and `usgovernment`. Alternatively, you can provide this value as the environment variable `ARM_ENVIRONMENT`.
- `pat_token_duration_seconds` - The current implementation of Azure auth via service principal requires the provider to create a temporary personal access token within Databricks, because the current AAD implementation does not cover all the APIs used for authentication. This field determines the duration for which that temporary PAT token is alive. It is measured in seconds and defaults to `3600` seconds.
- `debug_truncate_bytes` - Applicable only when `TF_LOG=DEBUG` is set. Truncate JSON fields in HTTP requests and responses above this limit. Default is 96.
- `debug_headers` - Applicable only when `TF_LOG=DEBUG` is set. Debug HTTP headers of requests made by the provider. Default is false. We recommend turning this flag on only under exceptional circumstances, when troubleshooting authentication issues. Turning this flag on will log the first `debug_truncate_bytes` of any HTTP header value in cleartext.
There are multiple environment variable options: the `DATABRICKS_AZURE_*` environment variables take precedence, and the `ARM_*` environment variables provide a way to share authentication configuration when using the `databricks` provider alongside the `azurerm` provider.
The following configuration attributes can be passed via environment variables:
| Argument | Environment variable |
| --- | --- |
| `host` | `DATABRICKS_HOST` |
| `token` | `DATABRICKS_TOKEN` |
| `username` | `DATABRICKS_USERNAME` |
| `password` | `DATABRICKS_PASSWORD` |
| `config_file` | `DATABRICKS_CONFIG_FILE` |
| `profile` | `DATABRICKS_CONFIG_PROFILE` |
| `azure_workspace_resource_id` | `DATABRICKS_AZURE_WORKSPACE_RESOURCE_ID` |
| `azure_workspace_name` | `DATABRICKS_AZURE_WORKSPACE_NAME` |
| `azure_resource_group` | `DATABRICKS_AZURE_RESOURCE_GROUP` |
| `azure_subscription_id` | `DATABRICKS_AZURE_SUBSCRIPTION_ID` or `ARM_SUBSCRIPTION_ID` |
| `azure_client_secret` | `DATABRICKS_AZURE_CLIENT_SECRET` or `ARM_CLIENT_SECRET` |
| `azure_client_id` | `DATABRICKS_AZURE_CLIENT_ID` or `ARM_CLIENT_ID` |
| `azure_tenant_id` | `DATABRICKS_AZURE_TENANT_ID` or `ARM_TENANT_ID` |
| `azure_environment` | `ARM_ENVIRONMENT` |
| `debug_truncate_bytes` | `DATABRICKS_DEBUG_TRUNCATE_BYTES` |
| `debug_headers` | `DATABRICKS_DEBUG_HEADERS` |
For example, with the following zero-argument configuration:
provider "databricks" {}
- The provider will check all of the supported environment variables and set the values of the relevant arguments.
- In case any conflicting arguments are present, the plan will end with an error.
- It will check for the presence of a `host` + `token` pair, and continue trying otherwise.
- It will check for `host` + `username` + `password` presence, and continue trying otherwise.
- It will check for Azure workspace ID and `azure_client_secret` + `azure_client_id` + `azure_tenant_id` presence, and continue trying otherwise.
- It will check for Azure workspace ID presence and whether the `az` CLI returns an access token, and continue trying otherwise.
- It will check for the `~/.databrickscfg` file in the home directory, and will fail otherwise.
- It will check for `profile` presence and try picking from that file, and will fail otherwise.
- It will check for a `host` and `token` or `username` + `password` combination, and will fail if none of these exist.
In Terraform 0.13 and later, data resources have the same dependency resolution behavior as defined for managed resources. Most data resources make an API call to a workspace. If a workspace doesn't exist yet, an `Authentication is not configured for provider` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, you should add `depends_on = [azurerm_databricks_workspace.this]` or `depends_on = [databricks_mws_workspaces.this]` to the body of the data resource. This issue doesn't occur if the workspace is created in one module and the resources within the workspace are created in another. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources.
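For example, a data resource living in the same module as the workspace could declare the dependency explicitly; this sketch reuses the `azurerm_databricks_workspace` resource from the examples above:

```hcl
data "databricks_current_user" "me" {
  # force the workspace to exist before this data source is read
  depends_on = [azurerm_databricks_workspace.this]
}
```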
Important: Projects in the databrickslabs
GitHub account, including the Databricks Terraform Provider, are not formally supported by Databricks. They are maintained by Databricks Field teams and provided as-is. There is no service level agreement (SLA). Databricks makes no guarantees of any kind. If you discover an issue with the provider, please file a GitHub Issue on the repo, and it will be reviewed by project maintainers as time permits.