Various doc updates (#1782)
* various doc updates
* fix uris
nkvuong authored Nov 24, 2022
1 parent 8280502 commit 67a8760
Showing 4 changed files with 24 additions and 17 deletions.
4 changes: 2 additions & 2 deletions docs/resources/cluster.md
@@ -47,7 +47,7 @@ resource "databricks_cluster" "shared_autoscaling" {
* `idempotency_token` - (Optional) An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry on failure until the request succeeds; the Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. This token should have at most 64 characters.
* `ssh_public_keys` - (Optional) SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name `ubuntu` on port 2200. You can specify up to 10 keys.
* `spark_env_vars` - (Optional) Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
* `custom_tags` - (Optional) Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to `default_tags`.
* `custom_tags` - (Optional) Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to `default_tags`. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an `x_` when it is propagated.
* `spark_conf` - (Optional) Map with key-value pairs to fine-tune Spark clusters, where you can provide custom [Spark configuration properties](https://spark.apache.org/docs/latest/configuration.html) in a cluster configuration (a combined sketch follows this list).
* `is_pinned` - (Optional) boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is [limited to 70](https://docs.databricks.com/clusters/clusters-manage.html#pin-a-cluster), so `apply` may fail if you have more than that.
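As a minimal sketch (not taken from the upstream examples) of how `spark_conf`, `spark_env_vars`, and `custom_tags` combine on a single cluster; the Spark version, node type, environment variable, and tag values below are placeholders:

```hcl
resource "databricks_cluster" "tagged_example" {
  cluster_name            = "tagged-example"
  spark_version           = "11.3.x-scala2.12" # placeholder runtime version
  node_type_id            = "i3.xlarge"        # placeholder node type
  num_workers             = 1
  autotermination_minutes = 20

  # Custom Spark configuration properties applied when the cluster starts.
  spark_conf = {
    "spark.databricks.io.cache.enabled" = "true"
  }

  # Exported as X='Y' while launching the driver and workers.
  spark_env_vars = {
    MY_ENVIRONMENT = "dev"
  }

  # Tagged onto the underlying cloud resources in addition to default_tags.
  custom_tags = {
    "Team" = "data-platform"
  }
}
```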

@@ -442,7 +442,7 @@ resource "databricks_cluster" "this" {
In addition to all arguments above, the following attributes are exported:

* `id` - Canonical unique identifier for the cluster.
* `default_tags` - (map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: <Databricks internal use>
* `default_tags` - (map) Tags that are added by Databricks by default, regardless of any `custom_tags` that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: <Databricks internal use>, and any workspace and pool tags (see the sketch after this list).
* `state` - (string) State of the cluster.
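A small usage sketch, assuming the `shared_autoscaling` cluster defined earlier on this page: exported attributes can be referenced like any other resource attribute, for example to surface the default tags as a Terraform output.

```hcl
output "shared_autoscaling_default_tags" {
  description = "Tags applied by Databricks to the cluster's cloud resources"
  value       = databricks_cluster.shared_autoscaling.default_tags
}
```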

## Access Control
2 changes: 1 addition & 1 deletion docs/resources/job.md
@@ -158,7 +158,7 @@ You can invoke Spark submit tasks only on new clusters. **In the `new_cluster` s

### spark_python_task Configuration Block

* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path) and S3 paths are supported. This field is required.
* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path), cloud file URIs (e.g. `s3:/`, `abfss:/`, `gs:/`) and workspace paths are supported. For Python files stored in the Databricks workspace, the path must be absolute and begin with `/Repos`. This field is required. A job sketch using a workspace path follows this list.
* `parameters` - (Optional) (List) Command line parameters passed to the Python file.
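As a sketch of how `spark_python_task` fits into a single-task job (the workspace path, Spark version, and node type below are placeholders, not values taken from this page):

```hcl
resource "databricks_job" "python_example" {
  name = "python-task-example"

  new_cluster {
    num_workers   = 1
    spark_version = "11.3.x-scala2.12" # placeholder runtime version
    node_type_id  = "i3.xlarge"        # placeholder node type
  }

  spark_python_task {
    # Workspace paths must be absolute and begin with /Repos.
    python_file = "/Repos/someone@example.com/project/main.py"
    parameters  = ["--env", "dev"]
  }
}
```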

### notebook_task Configuration Block
25 changes: 15 additions & 10 deletions docs/resources/mount.md
@@ -3,17 +3,21 @@ subcategory: "Storage"
---
# databricks_mount Resource

This resource will [mount your cloud storage](https://docs.databricks.com/data/databricks-file-system.html#mount-object-storage-to-dbfs) on `dbfs:/mnt/name`. Right now it supports mounting AWS S3, Azure (Blob Storage, ADLS Gen1 & Gen2), Google Cloud Storage. It is important to understand that this will start up the [cluster](cluster.md) if the cluster is terminated. The read and refresh terraform command will require a cluster and may take some time to validate the mount. If `cluster_id` is not specified, it will create the smallest possible cluster with name equal to or starting with `terraform-mount` for the shortest possible amount of time.
This resource will [mount your cloud storage](https://docs.databricks.com/data/databricks-file-system.html#mount-object-storage-to-dbfs) on `dbfs:/mnt/name`. Right now it supports mounting AWS S3, Azure (Blob Storage, ADLS Gen1 & Gen2) and Google Cloud Storage. It is important to understand that this will start up the [cluster](cluster.md) if the cluster is terminated. The read and refresh terraform commands require a cluster and may take some time to validate the mount.

**Note** When `cluster_id` is not specified, it will create the smallest possible cluster in the default availability zone with name equal to or starting with `terraform-mount` for the shortest possible amount of time. To avoid mount failure due to potential quota or capacity issues with the default cluster, we recommend specifying a cluster to use for mounting.

This resource provides two ways of mounting a storage account:

1. Use a storage-specific configuration block - this can be used in most cases, as it will fill in most of the necessary details. Currently we support the following configuration blocks:
* `s3` - to [mount AWS S3](https://docs.databricks.com/data/data-sources/aws/amazon-s3.html)
* `gs` - to [mount Google Cloud Storage](https://docs.gcp.databricks.com/data/data-sources/google/gcs.html)
* `abfs` - to [mount ADLS Gen2](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/) using Azure Blob Filesystem (ABFS) driver
* `adl` - to [mount ADLS Gen1](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake) using Azure Data Lake (ADL) driver
* `wasb` - to [mount Azure Blob Storage](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage) using Windows Azure Storage Blob (WASB) driver
1. Use generic arguments - you have a responsibility for providing all necessary parameters that are required to mount specific storage. This is most flexible option

* `s3` - to [mount AWS S3](https://docs.databricks.com/data/data-sources/aws/amazon-s3.html)
* `gs` - to [mount Google Cloud Storage](https://docs.gcp.databricks.com/data/data-sources/google/gcs.html)
* `abfs` - to [mount ADLS Gen2](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/) using Azure Blob Filesystem (ABFS) driver
* `adl` - to [mount ADLS Gen1](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake) using Azure Data Lake (ADL) driver
* `wasb` - to [mount Azure Blob Storage](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage) using Windows Azure Storage Blob (WASB) driver

1. Use generic arguments - you are responsible for providing all parameters required to mount the specific storage. This is the most flexible option; a sketch follows this list.
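The sketch below illustrates the generic-arguments style for an ADLS Gen2 container; it is an assumed example rather than one documented on this page, and the cluster id, storage account, container, client, tenant, and secret-scope values are placeholders. The `extra_configs` keys are the standard Hadoop ABFS OAuth settings.

```hcl
resource "databricks_mount" "generic_example" {
  # Specifying an existing cluster is recommended; see the note above.
  cluster_id = "0000-000000-abcdefgh" # placeholder cluster id
  name       = "generic-example"
  uri        = "abfss://container@examplestorage.dfs.core.windows.net"

  extra_configs = {
    "fs.azure.account.auth.type"              = "OAuth"
    "fs.azure.account.oauth.provider.type"    = "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
    "fs.azure.account.oauth2.client.id"       = "00000000-0000-0000-0000-000000000000" # placeholder client id
    "fs.azure.account.oauth2.client.secret"   = "{{secrets/example-scope/client-secret}}"
    "fs.azure.account.oauth2.client.endpoint" = "https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/oauth2/token" # placeholder tenant id
  }
}
```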

## Common arguments

@@ -157,7 +161,7 @@ resource "databricks_mount" "this" {

This block allows specifying parameters for mounting ADLS Gen2. The following arguments are required inside the `abfs` block:

* `client_id` - (Required) (String) This is the client_id (Application Object ID) for the enterprise application for the service principal.
* `tenant_id` - (Optional) (String) This is your Azure Active Directory tenant id. It is required for creating the mount. (Could be omitted if Azure authentication is used, and we can extract `tenant_id` from it.)
* `client_secret_key` - (Required) (String) This is the secret key in which your service principal/enterprise app client secret will be stored.
* `client_secret_scope` - (Required) (String) This is the secret scope in which your service principal/enterprise app client secret will be stored.
@@ -220,7 +224,7 @@ resource "databricks_mount" "marketing" {

This block allows specifying parameters for mounting Google Cloud Storage. The following arguments are required inside the `gs` block:

* `service_account` - (Optional) (String) email of registered [Google Service Account](https://docs.gcp.databricks.com/data/data-sources/google/gcs.html#step-1-set-up-google-cloud-service-account-using-google-cloud-console) for data access. If it's not specified, then the `cluster_id` should be provided, and the cluster should have a Google service account attached to it.
* `bucket_name` - (Required) (String) GCS bucket name to be mounted.

### Example mounting Google Cloud Storage
@@ -239,7 +243,7 @@ resource "databricks_mount" "this_gs" {

This block allows specifying parameters for mounting ADLS Gen1. The following arguments are required inside the `adl` block:

* `client_id` - (Required) (String) This is the client_id for the enterprise application for the service principal.
* `tenant_id` - (Optional) (String) This is your Azure Active Directory tenant id. It is required for creating the mount. (Could be omitted if Azure authentication is used, and we can extract `tenant_id` from it.)
* `client_secret_key` - (Required) (String) This is the secret key in which your service principal/enterprise app client secret will be stored.
* `client_secret_scope` - (Required) (String) This is the secret scope in which your service principal/enterprise app client secret will be stored.
@@ -319,6 +323,7 @@ resource "databricks_mount" "marketing" {
## Migration from other mount resources

Migration from the specific mount resource is straightforward:

* rename `mount_name` to `name`
* wrap storage-specific settings (`container_name`, ...) into the corresponding block (`adl`, `abfs`, `s3`, `wasb`)
* for S3 mounts, rename `s3_bucket_name` to `bucket_name`
10 changes: 6 additions & 4 deletions docs/resources/service_principal.md
@@ -47,6 +47,7 @@ resource "databricks_service_principal" "sp" {
```

Creating a service principal in an AWS Databricks account:

```hcl
// initialize provider at account-level
provider "databricks" {
@@ -64,6 +65,7 @@ resource "databricks_service_principal" "sp" {
```

Creating a service principal in an Azure Databricks account:

```hcl
// initialize provider at Azure account-level
provider "databricks" {
@@ -85,7 +87,7 @@ resource "databricks_service_principal" "sp" {

The following arguments are available:

* `application_id` - This is the application id of the given service principal and will be their form of access and identity. On other clouds than Azure this value is auto-generated.
* `application_id` - This is the Azure Application ID of the given Azure service principal and will be their form of access and identity. On clouds other than Azure, this value is auto-generated.
* `display_name` - (Required) This is an alias for the service principal and can be the full name of the service principal.
* `external_id` - (Optional) ID of the service principal in an external identity provider.
* `allow_cluster_create` - (Optional) Allow the service principal to have [cluster](cluster.md) create privileges. Defaults to false. More fine-grained permissions could be assigned with [databricks_permissions](permissions.md#Cluster-usage) and the `cluster_id` argument. Anyone without the `allow_cluster_create` argument set, but with [permission to use](permissions.md#Cluster-Policy-usage) a Cluster Policy, would still be able to create clusters, but only within the boundaries of that specific policy.
@@ -105,10 +107,10 @@ In addition to all arguments above, the following attributes are exported:

## Import

The resource scim service principal can be imported using id:
The resource scim service principal can be imported using its id, for example `2345678901234567`. To get the service principal ID, call [Get service principals](https://docs.databricks.com/dev-tools/api/latest/scim/scim-sp.html#get-service-principals).

```bash
$ terraform import databricks_service_principal.me <service-principal-id>
terraform import databricks_service_principal.me <service-principal-id>
```

## Related Resources
@@ -120,4 +122,4 @@ The following resources are often used in the same context:
* [databricks_group](../data-sources/group.md) data to retrieve information about [databricks_group](group.md) members, entitlements and instance profiles.
* [databricks_group_member](group_member.md) to attach [users](user.md) and [groups](group.md) as group members.
* [databricks_permissions](permissions.md) to manage [access control](https://docs.databricks.com/security/access-control/index.html) in Databricks workspace.
* [databricks_sql_permissions](sql_permissions.md) to manage data object access control lists in Databricks workspaces for things like tables, views, databases, and [more](https://docs.databricks.
* [databricks_sql_permissions](sql_permissions.md) to manage data object access control lists in Databricks workspaces for things like tables, views, databases, and [more](<https://docs.databricks>.
