
[PLAT-76309] Add doc guide to create a GCP Private Service Connect workspace. #2091

Merged
merged 2 commits into master on Mar 13, 2023

Conversation

jessiedu-db
Contributor

@jessiedu-db jessiedu-db commented Mar 8, 2023

[PLAT-76309] Add doc guide to create a GCP Private Service Connect workspace.

@codecov-commenter

Codecov Report

Merging #2091 (90fe7e6) into master (bd05be2) will decrease coverage by 0.19%.
The diff coverage is 81.39%.

Additional details and impacted files

Impacted file tree graph

```diff
@@            Coverage Diff             @@
##           master    #2091      +/-   ##
==========================================
- Coverage   89.78%   89.60%   -0.19%
==========================================
  Files         136      136
  Lines       11077    11055      -22
==========================================
- Hits         9946     9906      -40
- Misses        732      743      +11
- Partials      399      406       +7
```
| Impacted Files | Coverage Δ |
|---|---|
| catalog/resource_share.go | 95.00% <ø> (-0.13%) ⬇️ |
| repos/resource_git_credential.go | 88.57% <78.94%> (-11.43%) ⬇️ |
| catalog/data_schemas.go | 100.00% <100.00%> (ø) |
| catalog/data_shares.go | 100.00% <100.00%> (ø) |
| catalog/data_tables.go | 100.00% <100.00%> (ø) |
| clusters/data_node_type.go | 76.19% <0.00%> (-15.77%) ⬇️ |
| common/resource.go | 75.97% <0.00%> (-1.73%) ⬇️ |
| sql/api/query.go | 75.40% <0.00%> (ø) |
| storage/mounts.go | 96.47% <0.00%> (ø) |
| catalog/data_views.go | 100.00% <0.00%> (ø) |
| ... and 7 more | |

Comment on lines 9 to 147
```hcl
  service_account_id = google_service_account.sa2.name
  policy_data        = data.google_iam_policy.this.policy_data
}

resource "google_project_iam_custom_role" "workspace_creator" {
  role_id = "${var.prefix}_workspace_creator"
  title   = "Databricks Workspace Creator"
  permissions = [
    "iam.serviceAccounts.getIamPolicy",
    "iam.serviceAccounts.setIamPolicy",
    "iam.roles.create",
    "iam.roles.delete",
    "iam.roles.get",
    "iam.roles.update",
    "resourcemanager.projects.get",
    "resourcemanager.projects.getIamPolicy",
    "resourcemanager.projects.setIamPolicy",
    "serviceusage.services.get",
    "serviceusage.services.list",
    "serviceusage.services.enable",
    "compute.networks.get",
    "compute.projects.get",
    "compute.subnetworks.get",
  ]
}

data "google_client_config" "current" {}

output "custom_role_url" {
  value = "https://console.cloud.google.com/iam-admin/roles/details/projects%3C${data.google_client_config.current.project}%3Croles%3C${google_project_iam_custom_role.workspace_creator.role_id}"
}

resource "google_project_iam_member" "sa2_can_create_workspaces" {
  project = var.project
  role    = google_project_iam_custom_role.workspace_creator.id
  member  = "serviceAccount:${google_service_account.sa2.email}"
}
```

After you’ve added the service account to the Databricks account console, copy its name into the `databricks_google_service_account` variable. If you prefer environment variables, use `DATABRICKS_GOOGLE_SERVICE_ACCOUNT` instead. Also copy the account ID into the `databricks_account_id` variable.

## Authenticate with Databricks account API

Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to do this is via the [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). The `gcloud` command is available after installing the SDK. Then run the following commands:

* `gcloud auth application-default login` to authorise your user with Google Cloud Platform. (If you want to use your [service account's credentials instead](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-key), set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of the JSON file that contains your service account key)
* `terraform init` to load Google and Databricks Terraform providers.
* `terraform apply` to apply the configuration changes. Terraform will use your credential to impersonate the service account specified in `databricks_google_service_account` to call the Databricks account-level API.

Alternatively, if you cannot use impersonation and [Application Default Credentials](https://cloud.google.com/docs/authentication/production) as configured by `gcloud`, consider using the service account key directly by passing it to the `google_credentials` parameter (or the `GOOGLE_CREDENTIALS` environment variable) to avoid `gcloud`, impersonation, and ADC altogether. The content of this parameter must be either the path to a `.json` file or the full JSON content of the Google service account key.
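As a sketch, a provider block that passes the key file directly might look like the following (the key path is a placeholder, not a value from this guide):

```hcl
provider "databricks" {
  alias = "accounts"
  host  = "https://accounts.gcp.databricks.com"

  // Either a path to the key file or the full JSON content of the key.
  // The path below is a placeholder for illustration only.
  google_credentials = file("/path/to/service-account-key.json")

  account_id = var.databricks_account_id
}
```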

## Provider initialization

```hcl
variable "databricks_account_id" {}
variable "databricks_google_service_account" {}
variable "google_project" {}
variable "google_region" {}
variable "google_zone" {}

terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
    google = {
      source  = "hashicorp/google"
      version = "4.47.0"
    }
  }
}

provider "google" {
  project = var.google_project
  region  = var.google_region
  zone    = var.google_zone
}

// initialize provider in "accounts" mode to provision new workspace
provider "databricks" {
  alias                  = "accounts"
  host                   = "https://accounts.gcp.databricks.com"
  google_service_account = var.databricks_google_service_account
  account_id             = var.databricks_account_id
}

data "google_client_openid_userinfo" "me" {}

data "google_client_config" "current" {}

resource "random_string" "suffix" {
  special = false
  upper   = false
  length  = 6
}
```
Contributor Author

Done.

The very first step is VPC creation with the necessary resources. Please consult the [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as a [databricks_mws_networks](../resources/mws_networks.md) resource.

To enable [back-end Private Service Connect (data plane to control plane)](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html#two-private-service-connect-options), configure the network with the two back-end VPC endpoints:
- Back-end VPC endpoint for SCC relay
- Back-end VPC endpoint for REST APIs
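Registering the endpoints and wiring them into the network might look like the following sketch (the resource names, PSC endpoint names, and secondary IP range names are placeholder assumptions, not values from this guide):

```hcl
// Sketch only: PSC endpoint names below are placeholders for endpoints
// already created in your VPC.
resource "databricks_mws_vpc_endpoint" "relay" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "relay-endpoint"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = "relay-psc-endpoint" // placeholder
    endpoint_region   = var.google_region
  }
}

resource "databricks_mws_vpc_endpoint" "rest" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "rest-endpoint"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = "rest-psc-endpoint" // placeholder
    endpoint_region   = var.google_region
  }
}

resource "databricks_mws_networks" "this" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "psc-network"
  gcp_network_info {
    network_project_id    = var.google_project
    vpc_id                = google_compute_network.this.name
    subnet_id             = google_compute_subnetwork.this.name
    subnet_region         = var.google_region
    pod_ip_range_name     = "pods"     // placeholder secondary range
    service_ip_range_name = "services" // placeholder secondary range
  }
  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.rest.vpc_endpoint_id]
  }
}
```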
Contributor

SCC is not mentioned on the doc. Can we add a link for details?

Contributor Author

Done.

Comment on lines 290 to 311
### Data resources and Authentication is not configured errors

*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, a `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, add `depends_on = [databricks_mws_workspaces.this]` to the resource body. This issue doesn't occur if the workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources.

```hcl
data "databricks_current_user" "me" {
  depends_on = [databricks_mws_workspaces.this]
}
```

## Provider configuration

In [the next step](workspace-management.md), please use the following configuration for the provider:

```hcl
provider "databricks" {
  host  = module.dbx_gcp.workspace_url
  token = module.dbx_gcp.token_value
}
```

We assume that you have a Terraform module in your project that creates a workspace (using the [Databricks Workspace](#creating-a-databricks-workspace) section) and that you named it `dbx_gcp` when calling it in the **main.tf** file of your Terraform project, with `workspace_url` and `token_value` as output attributes of that module. This provider configuration lets you use the token generated during workspace creation to authenticate to the new workspace.
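Such a module call might look like the following sketch (the `./modules/dbx_gcp` path, input names, and output wiring are assumptions for illustration):

```hcl
// Hypothetical call site in main.tf; the module source path is an assumption.
module "dbx_gcp" {
  source = "./modules/dbx_gcp" // assumed local path to the workspace module

  databricks_account_id             = var.databricks_account_id
  databricks_google_service_account = var.databricks_google_service_account
  google_project                    = var.google_project
  google_region                     = var.google_region
}

// The module is assumed to export these outputs from its workspace resource:
// output "workspace_url" { value = databricks_mws_workspaces.this.workspace_url }
// output "token_value"   { value = databricks_mws_workspaces.this.token[0].token_value }
```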
Contributor

Unrelated to PSC. Do we need them?

Contributor Author

I copied it from gcp_workspace.md. I don't know what they are. Removed.

@nfx nfx merged commit 9d9892c into databricks:master Mar 13, 2023
@nfx nfx mentioned this pull request Mar 14, 2023