…890)
## Changes
Due to the eventually consistent nature of worker environment creation
as a part of workspace setup, certain API requests can fail when made
right after workspace creation. We have some exception handling for
these in the SDKs, but a new case has appeared: "worker env
WorkerEnvId(workerenv-XXXXX) not found" (see
databricks/terraform-provider-databricks#3452).
This PR addresses this issue. Furthermore, it moves the transient error
messages into autogeneration so that there is a single source of truth
that applies to all SDKs.
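To make the pattern concrete, here is a minimal, hypothetical Go sketch of the general idea: treat an error as transient when its message matches one of a generated list of substrings, and retry the operation until it succeeds or a deadline passes. All identifiers below (`transientErrorSubstrings`, `isTransient`, `retryTransient`) are made up for illustration and are not the actual databricks-sdk-go API.

```go
// Hypothetical sketch only: illustrates retrying on known transient error
// messages after workspace creation, not the real SDK implementation.
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// transientErrorSubstrings stands in for a generated, single-source-of-truth
// list of messages that indicate the workspace is not yet fully provisioned.
var transientErrorSubstrings = []string{
	"worker env", // e.g. "worker env WorkerEnvId(workerenv-XXXXX) not found"
}

// isTransient reports whether err looks like a transient provisioning error.
func isTransient(err error) bool {
	if err == nil {
		return false
	}
	for _, s := range transientErrorSubstrings {
		if strings.Contains(err.Error(), s) {
			return true
		}
	}
	return false
}

// retryTransient retries op while it returns a transient error, up to timeout.
func retryTransient(op func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := op()
		if err == nil || !isTransient(err) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("still failing after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	err := retryTransient(func() error {
		attempts++
		if attempts < 3 {
			// Simulate the error seen right after workspace creation.
			return errors.New("worker env WorkerEnvId(workerenv-XXXXX) not found")
		}
		return nil // workspace information has propagated; the call succeeds
	}, time.Minute, 2*time.Second)
	fmt.Println(attempts, err)
}
```

Keeping the list of transient message substrings in generated code is what gives all SDKs a single source of truth for which errors should be retried.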
One other small change: I removed the unit test run from the
autogeneration flow. This removes a small amount of convenience (users
need to run `make test` after regenerating the SDK), but it speeds up
the dev loop when iterating on code generation, and it allows the release
flow to continue and open a PR before failing. That PR can then be modified
as needed to fix any test or compilation failures, as usual.
## Tests
Added a unit test to cover this.
- [ ] `make test` passing
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
Configuration
# Copy-paste your Terraform configuration here
module "workspace" {
source = "./modules/workspace"
databricks_account_id = var.databricks_account_id
region = var.region
}
module "catalogs_binding" {
source = "./modules/catalog_bin"
workspace_id = module.workspace.workspace_id
depends_on = [ module.workspace ]
}
module "proxy" {
source = "./modules/proxy_cluster"
depends_on = [ module.workspace ]
}
and the cluster resource within `proxy_cluster` is:
resource "databricks_cluster" "git_proxy" {
autotermination_minutes = 0
aws_attributes {
ebs_volume_count = 1
ebs_volume_size = 32
first_on_demand = 1
}
cluster_name = var.git_proxy_name
custom_tags = {
"ResourceClass" = "SingleNode"
}
provider = databricks.workspace
spark_version = data.databricks_spark_version.latest_lts.id
node_type_id = data.databricks_node_type.smallest.id
num_workers = 0
spark_conf = {
"spark.databricks.cluster.profile" : "singleNode",
"spark.master" : "local[*]",
}
spark_env_vars = {
"GIT_PROXY_ENABLE_SSL_VERIFICATION" : "False"
"GIT_PROXY_HTTP_PROXY" : "git_URL"
}
timeouts {
create = "30m"
update = "30m"
delete = "30m"
}
}
Expected Behavior
The cluster should be created, or a more verbose error should be displayed to explain that even though the API reports the workspace as RUNNING, it is not yet fully operational.
Actual Behavior
The workspace was reported as running:
2024-04-10T19:35:44.640-0600 [DEBUG] provider.terraform-provider-databricks_v1.39.0: GET /api/2.0/accounts/XXX/workspaces/XXX
< HTTP/2.0 200 OK
< {
...
< "workspace_status": "RUNNING",
but the worker environment is not recognized, so cluster creation fails:
2024-04-10T19:36:17.232-0600 [ERROR] provider.terraform-provider-databricks_v1.39.0: Response contains error diagnostic: @module=sdk.proto diagnostic_detail="" diagnostic_severity=ERROR tf_proto_version=5.4 tf_req_id=XXX tf_rpc=ApplyResourceChange @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_summary="cannot create cluster: XXX is not able to transition from TERMINATED to RUNNING: worker env WorkerEnvId(workerenv-XXXXX) not found
*Note that I replaced the actual values in the above logs with XXXXX.
Steps to Reproduce
Terraform and provider versions
1.39
Is it a regression?
Debug Output
Important Factoids
Would you like to implement a fix?
The dependency flow below works around this behavior. It seems this small additional delay is enough for the workspace information to fully propagate:
module "proxy" {
source = "./modules/proxy_cluster"
depends_on = [ module.workspace, module.catalogs_binding ] # fixes issue
}