misc doc updates
nkvuong committed Jul 26, 2023
1 parent b1afb08 commit d471811
Showing 8 changed files with 64 additions and 36 deletions.
14 changes: 9 additions & 5 deletions docs/data-sources/aws_bucket_policy.md
@@ -3,7 +3,7 @@ subcategory: "Deployment"
---
# databricks_aws_bucket_policy Data Source

This data source configures a simple access policy for AWS S3 buckets, so that Databricks can access data in them.

## Example Usage

@@ -30,15 +30,19 @@ Bucket policy with full access:
resource "aws_s3_bucket" "ds" {
bucket = "${var.prefix}-ds"
acl = "private"
versioning {
enabled = false
}
force_destroy = true
tags = merge(var.tags, {
Name = "${var.prefix}-ds"
})
}
resource "aws_s3_bucket_versioning" "ds_versioning" {
bucket = aws_s3_bucket.ds.id
versioning_configuration {
status = "Disabled"
}
}
data "aws_iam_policy_document" "assume_role_for_ec2" {
statement {
effect = "Allow"
@@ -74,7 +78,7 @@ resource "aws_s3_bucket_policy" "ds" {

* `bucket` - (Required) AWS S3 Bucket name for which to generate the policy document.
* `full_access_role` - (Optional) Data access role that can have full access for this bucket.
* `databricks_e2_account_id` - (Optional) Your Databricks E2 account ID. Used to generate restrictive IAM policies that will increase the security of your root bucket.
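
For illustration only, a minimal configuration combining these arguments might look like the following sketch; the role ARN and account ID are placeholders, and the bucket reference follows the `aws_s3_bucket.ds` example above:

```hcl
# Sketch only: the role ARN and account ID below are hypothetical placeholders.
data "databricks_aws_bucket_policy" "ds" {
  bucket                   = aws_s3_bucket.ds.bucket
  full_access_role         = "arn:aws:iam::123456789012:role/data-access" # hypothetical role
  databricks_e2_account_id = "00000000-0000-0000-0000-000000000000"       # hypothetical account ID
}
```

The generated policy document can then be attached to the bucket with `aws_s3_bucket_policy`, as in the full-access example above.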

## Attribute Reference

29 changes: 16 additions & 13 deletions docs/guides/aws-workspace.md
@@ -41,11 +41,12 @@ locals {
```

Before [managing workspace](workspace-management.md), you have to create:

- [VPC](#vpc)
- [Root bucket](#root-bucket)
- [Cross-account role](#cross-account-iam-role)
- [Databricks E2 workspace](#databricks-e2-workspace)
- [Host and Token outputs](#provider-configuration)

> Initialize provider with `alias = "mws"` and use `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within its own dedicated terraform module of your environment. Usually this module creates VPC and IAM roles as well.
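
As a rough sketch (the host is the standard account console endpoint for AWS; authentication details depend on your setup and are not shown), the aliased account-level provider block might look like:

```hcl
# Sketch of the account-level (MWS) provider alias; credentials are illustrative
# and would typically come from environment variables or another auth method.
provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
}
```
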
@@ -203,9 +204,6 @@ Once [VPC](#vpc) is ready, create AWS S3 bucket for DBFS workspace storage, whic
resource "aws_s3_bucket" "root_storage_bucket" {
bucket = "${local.prefix}-rootbucket"
acl = "private"
versioning {
enabled = false
}
force_destroy = true
tags = merge(var.tags, {
Name = "${local.prefix}-rootbucket"
@@ -241,6 +239,13 @@ resource "aws_s3_bucket_policy" "root_bucket_policy" {
depends_on = [aws_s3_bucket_public_access_block.root_storage_bucket]
}
resource "aws_s3_bucket_versioning" "root_bucket_versioning" {
bucket = aws_s3_bucket.root_storage_bucket.id
versioning_configuration {
status = "Disabled"
}
}
resource "databricks_mws_storage_configurations" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
@@ -303,14 +308,14 @@ provider "databricks" {
token = module.e2.token_value
}
```
We assume that you have a Terraform module in your project that creates a workspace (using the [Databricks E2 Workspace](#databricks-e2-workspace) section) and that you named it `e2` when calling it from the **main.tf** file of your Terraform project. `workspace_url` and `token_value` are output attributes of that module. This provider configuration lets you use the token generated during workspace creation to authenticate to the newly created workspace.
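
For reference, a minimal sketch of what such a module's outputs might look like (names are illustrative and assume the module wraps the resources from the [Databricks E2 Workspace](#databricks-e2-workspace) section):

```hcl
# Hypothetical outputs inside the `e2` module.
output "workspace_url" {
  value = databricks_mws_workspaces.this.workspace_url
}

output "token_value" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
```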

### Credentials validation checks errors

Due to a bug in the Terraform AWS provider (spotted in v3.28), creating the Databricks AWS cross-account policy and attaching it to the IAM role takes longer than the AWS confirmation returned to Terraform. As Terraform continues creating the workspace, the credentials validation checks fail because the policy has not yet been applied, showing the error:

```sh
Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Create Placement Group, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Placement Group, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances
(400 on /api/2.0/accounts/{UUID}/workspaces)
```
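
The guide works around this with an explicit delay between the IAM role and the workspace credentials; a self-contained sketch of that pattern (the role name is illustrative) looks like this:

```hcl
# Give AWS time to propagate the cross-account policy before Databricks validates it.
resource "time_sleep" "wait" {
  depends_on      = [aws_iam_role.cross_account_role] # illustrative role name
  create_duration = "10s"
}
```
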
@@ -329,16 +334,14 @@ resource "time_sleep" "wait" {

If you notice the error below:

```sh
Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Create Placement Group, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Placement Group, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances
```

- Try creating the workspace from the UI:

![create_workspace_error](https://github.com/databricks/terraform-provider-databricks/raw/master/docs/images/create_workspace_error.png)


- Verify that the role and policy exist (the assume role policy should allow the external ID):

![iam_role_trust_error](https://github.com/databricks/terraform-provider-databricks/raw/master/docs/images/iam_role_trust_error.png)

22 changes: 15 additions & 7 deletions docs/guides/unity-catalog.md
@@ -132,15 +132,12 @@ The first step is to create the required AWS objects:

- An S3 bucket, which is the default storage location for managed tables in Unity Catalog. Please use a dedicated bucket for each metastore.
- An IAM policy that provides Unity Catalog permissions to access and manage data in the bucket. Note that `<KMS_KEY>` is *optional*. If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. *If encryption is disabled, remove the entire KMS section of the IAM policy.*
- An IAM role that is associated with the IAM policy and will be assumed by Unity Catalog.

```hcl
resource "aws_s3_bucket" "metastore" {
bucket = "${local.prefix}-metastore"
acl = "private"
versioning {
enabled = false
}
force_destroy = true
tags = merge(local.tags, {
Name = "${local.prefix}-metastore"
@@ -156,6 +153,13 @@ resource "aws_s3_bucket_public_access_block" "metastore" {
depends_on = [aws_s3_bucket.metastore]
}
resource "aws_s3_bucket_versioning" "metastore_versioning" {
bucket = aws_s3_bucket.metastore.id
versioning_configuration {
status = "Disabled"
}
}
data "aws_iam_policy_document" "passrole_for_uc" {
statement {
effect = "Allow"
@@ -391,16 +395,20 @@ First, create the required objects in AWS.
resource "aws_s3_bucket" "external" {
bucket = "${local.prefix}-external"
acl = "private"
versioning {
enabled = false
}
// destroy all objects with bucket destroy
force_destroy = true
tags = merge(local.tags, {
Name = "${local.prefix}-external"
})
}
resource "aws_s3_bucket_versioning" "external_versioning" {
bucket = aws_s3_bucket.external.id
versioning_configuration {
status = "Disabled"
}
}
resource "aws_s3_bucket_public_access_block" "external" {
bucket = aws_s3_bucket.external.id
ignore_public_acls = true
10 changes: 7 additions & 3 deletions docs/resources/mws_log_delivery.md
@@ -23,9 +23,6 @@ variable "databricks_account_id" {
resource "aws_s3_bucket" "logdelivery" {
bucket = "${var.prefix}-logdelivery"
acl = "private"
versioning {
enabled = false
}
force_destroy = true
tags = merge(var.tags, {
Name = "${var.prefix}-logdelivery"
@@ -42,6 +39,13 @@ data "databricks_aws_assume_role_policy" "logdelivery" {
for_log_delivery = true
}
resource "aws_s3_bucket_versioning" "logdelivery_versioning" {
bucket = aws_s3_bucket.logdelivery.id
versioning_configuration {
status = "Disabled"
}
}
resource "aws_iam_role" "logdelivery" {
name = "${var.prefix}-logdelivery"
description = "(${var.prefix}) UsageDelivery role"
4 changes: 1 addition & 3 deletions docs/resources/mws_networks.md
@@ -104,8 +104,6 @@ resource "databricks_mws_networks" "this" {

### Creating a Databricks on GCP workspace

-> **Public Preview** This feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html) on GCP.

```hcl
variable "databricks_account_id" {
description = "Account Id that could be found in the bottom left corner of https://accounts.cloud.databricks.com/"
@@ -231,5 +229,5 @@ The following resources are used in the same context:
* [Provisioning Databricks on GCP](../guides/gcp-workspace.md) guide.
* [Provisioning Databricks workspaces on GCP with Private Service Connect](../guides/gcp-private-service-connect-workspace.md) guide.
* [databricks_mws_vpc_endpoint](mws_vpc_endpoint.md) to register [aws_vpc_endpoint](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_endpoint) resources with Databricks such that they can be used as part of a [databricks_mws_networks](mws_networks.md) configuration.
* [databricks_mws_private_access_settings](mws_private_access_settings.md) to create a Private Access Setting that can be used as part of a [databricks_mws_workspaces](mws_workspaces.md) resource to create a [Databricks Workspace that leverages AWS PrivateLink](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html) or [GCP Private Service Connect](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html).
* [databricks_mws_workspaces](mws_workspaces.md) to set up [workspaces in E2 architecture on AWS](https://docs.databricks.com/getting-started/overview.html#e2-architecture-1).
8 changes: 6 additions & 2 deletions docs/resources/mws_storage_configurations.md
@@ -23,8 +23,12 @@ variable "databricks_account_id" {
resource "aws_s3_bucket" "root_storage_bucket" {
bucket = "${var.prefix}-rootbucket"
acl = "private"
versioning {
enabled = false
}
resource "aws_s3_bucket_versioning" "root_versioning" {
bucket = aws_s3_bucket.root_storage_bucket.id
versioning_configuration {
status = "Disabled"
}
}
10 changes: 7 additions & 3 deletions docs/resources/mws_workspaces.md
@@ -137,13 +137,17 @@ resource "databricks_mws_credentials" "this" {
resource "aws_s3_bucket" "root_storage_bucket" {
bucket = "${local.prefix}-rootbucket"
acl = "private"
versioning {
enabled = false
}
force_destroy = true
tags = var.tags
}
resource "aws_s3_bucket_versioning" "root_versioning" {
bucket = aws_s3_bucket.root_storage_bucket.id
versioning_configuration {
status = "Disabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "root_storage_bucket" {
bucket = aws_s3_bucket.root_storage_bucket.bucket
3 changes: 3 additions & 0 deletions docs/resources/volume.md
@@ -3,6 +3,8 @@ subcategory: "Unity Catalog"
---
# databricks_volume (Resource)

-> **Public Preview** This feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html).

Volumes are Unity Catalog objects representing a logical volume of storage in a cloud object storage location. Volumes provide capabilities for accessing, storing, governing, and organizing files. While tables provide governance over tabular datasets, volumes add governance over non-tabular datasets. You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data.

A volume resides in the third layer of Unity Catalog’s three-level namespace. Volumes are siblings to tables, views, and other objects organized under a schema in Unity Catalog.
@@ -14,6 +16,7 @@ A **managed volume** is a Unity Catalog-governed storage volume created within t
An **external volume** is a Unity Catalog-governed storage volume registered against a directory within an external location.

A volume can be referenced using its identifier: ```<catalogName>.<schemaName>.<volumeName>```, where:

* ```<catalogName>```: The name of the catalog containing the Volume.
* ```<schemaName>```: The name of the schema containing the Volume.
* ```<volumeName>```: The name of the Volume. It identifies the volume object.
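
For illustration, a managed volume could be declared and then addressed by that three-part identifier roughly as in the sketch below (catalog, schema, and volume names are placeholders):

```hcl
resource "databricks_volume" "example" {
  name         = "my_volume"  # placeholder volume name
  catalog_name = "main"       # placeholder catalog
  schema_name  = "default"    # placeholder schema
  volume_type  = "MANAGED"
  comment      = "example managed volume"
}

# Referenced as main.default.my_volume; its files live under /Volumes/main/default/my_volume/.
```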
