[Fix] Remove single-node validation from interactive clusters #4222
Conversation
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:
Trigger:
Inputs:
Checks will be approved automatically on success.
Test Details: go/deco-tests/11818584690
@tanmay-db what is your opinion, given that you're porting clusters to the plugin framework?
@shreyas-goenka The PR you reference talks about job clusters. Am I missing the ref to interactive clusters?
OK from the code perspective. Let's wait for Tanmay's opinion.
@pietern, you are right. The issue only discusses job clusters. There's no reference to interactive clusters, but now that we support interactive clusters via DABs, we should solve this issue here as well and keep them consistent. Updated the PR description to say as much.
Hi @alexott, it looks good. I will update the plugin framework implementation to not have this validation.
@tanmay-db We'll be adding this validation to DABs as a warning instead of an error. Maybe that's something we should do in the plugin framework as well, if supported.
### New Features and Improvements
* Add `databricks_mws_network_connectivity_config` and `databricks_mws_network_connectivity_configs` data source ([#3665](#3665)).
* Add support partitions in policy data sources ([#4181](#4181)).
* Added `databricks_registered_model_versions` data source ([#4100](#4100)).
* Update databricks_permissions resource to support vector-search-endpoints ([#4209](#4209)).
* add `databricks_serving_endpoints` data source ([#4226](#4226)).

### Bug Fixes
* Add validation for `run_as_mode` in `databricks_query` ([#4233](#4233)).
* Correct handling of updates for empty comments and `force_destroy` in UC catalog, schema, registered models and volumes ([#4244](#4244)).
* Fix deletion of dashboard if it was trashed out of band ([#4235](#4235)).
* Fix waiting for `databricks_vector_search_index` readiness ([#4243](#4243)).
* Remove single-node validation from interactive clusters ([#4222](#4222)).
* Remove single-node validation from jobs clusters ([#4216](#4216)).
* Use cluster list API to determine pinned cluster status ([#4203](#4203)).
* fix issue cased by setting pause_status in update monitor ([#4242](#4242)).

### Documentation
* Clarify workspace provider config ([#4208](#4208)).
* Update "Databricks Workspace Creator" permissions on gcp-workspace.md ([#4201](#4201)).
* Update `grants.md` references ([#4246](#4246)).
* Update description of `group_id` in `databricks_mws_ncc_private_endpoint_rule` ([#4238](#4238)).
* remove subnet sharing limitation in AWS ([#4239](#4239)).

### Internal Changes
* Bump Go SDK to latest and generate TF structs ([#4249](#4249)).
* Mark TestUcAccModelServingProvisionedThroughput as flaky. to be rever… ([#4232](#4232)).
* Rename resources directory to products in pluginframework ([#4139](#4139)).
* Revert "mark TestUcAccModelServingProvisionedThroughput as flaky. to … ([#4240](#4240)).
* Set user agent in some resources implemented in plugin framework ([#4187](#4187)).
* make `ApplyAndExpectData` work with nested set ([#4237](#4237)).

### Dependency Updates
* Bump dependencies for Plugin Framework and SDK v2 ([#4215](#4215)).
* Bump github.com/hashicorp/hcl/v2 from 2.22.0 to 2.23.0 ([#4236](#4236)).
* Bump github.com/hashicorp/terraform-plugin-testing from 1.10.0 to 1.11.0 ([#4247](#4247)).

### Exporter
* Add `List` operation for `users` service ([#4204](#4204)).
* Fix interactive selection of services ([#4245](#4245)).
## Changes
This PR adds a warning validating that the configuration for a single node cluster is valid for interactive, job, job-task, and pipeline clusters.

Note: We skip the validation if a cluster policy is configured, because the policy is likely to configure `spark_conf` / `custom_tags` itself.

Note: Terraform originally only had validation for interactive, job, and job-task clusters. Adding the validation for pipeline clusters in this PR is new.

This PR follows the same logic as we used to have in Terraform. The validation was removed from Terraform because we had no way to demote the error to a warning: databricks/terraform-provider-databricks#4222

### Background
Single-node clusters require `spark_conf` and `custom_tags` to be correctly set in the cluster definition for them to function optimally. The cluster will be created even if incorrectly configured, but its performance will not be great. For example, if both `spark_conf` and `custom_tags` are not set and `num_workers` is 0, then only the driver process is launched on the cluster compute instance, leading to sub-optimal utilization of the available compute resources and no parallelization across worker processes when processing a Spark query.

### Issue
This PR addresses some issues reported in #1546

## Tests
Unit tests and manually. Example output of the warning:
```
➜  bundle-playground git:(master) ✗ cli bundle validate
Warning: Single node cluster is not correctly configured
  at resources.pipelines.bar.clusters[0]
  in databricks.yml:29:11

num_workers should be 0 only for single-node clusters. To create a
valid single node cluster please ensure that the following properties
are correctly set in the cluster specification:

  spark_conf:
    spark.databricks.cluster.profile: singleNode
    spark.master: local[*]

  custom_tags:
    ResourceClass: SingleNode

Name: foobar
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Workspace/Users/shreyas.goenka@databricks.com/.bundle/foobar/default

Found 1 warning
```
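For reference, here is a minimal Terraform sketch of an interactive cluster that satisfies the single-node checks described above; the resource name, node type, and runtime version are illustrative assumptions, not taken from this PR:

```hcl
# Hypothetical example: a single-node interactive cluster with the
# spark_conf and custom_tags values the warning above asks for.
resource "databricks_cluster" "single_node" {
  cluster_name            = "single-node-example" # illustrative name
  spark_version           = "15.4.x-scala2.12"    # assumed runtime version
  node_type_id            = "i3.xlarge"           # assumed node type
  autotermination_minutes = 20
  num_workers             = 0

  # Marks the cluster as single-node so the driver also executes tasks.
  spark_conf = {
    "spark.databricks.cluster.profile" = "singleNode"
    "spark.master"                     = "local[*]"
  }

  custom_tags = {
    "ResourceClass" = "SingleNode"
  }
}
```

The `spark_conf` and `custom_tags` values mirror the ones listed in the warning output above.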
## Changes
There were reports on databricks/cli#1546 about customers being unable to use cluster policies for single-node job clusters because Terraform was performing this validation. #4216 removed this validation for job clusters. This PR removes the validation for interactive clusters as well, to keep them consistent and to unblock the use of policies with interactive clusters.
Also, the clusters team is improving the API interface to simplify creating single-node clusters, significantly reducing the chances of users misconfiguring them.
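The following hedged Terraform sketch illustrates the previously blocked setup, where the single-node settings come from a cluster policy rather than from the cluster resource itself; the policy name, node type, and runtime version are assumptions for illustration:

```hcl
# Hypothetical example: a cluster policy fixes the single-node settings,
# and an interactive cluster relies on it instead of declaring
# spark_conf / custom_tags directly.
resource "databricks_cluster_policy" "single_node" {
  name = "single-node-policy" # illustrative name
  definition = jsonencode({
    "spark_conf.spark.databricks.cluster.profile" = {
      type  = "fixed"
      value = "singleNode"
    }
    "spark_conf.spark.master" = {
      type  = "fixed"
      value = "local[*]"
    }
    "custom_tags.ResourceClass" = {
      type  = "fixed"
      value = "SingleNode"
    }
  })
}

resource "databricks_cluster" "interactive" {
  cluster_name            = "policy-managed-single-node" # illustrative name
  spark_version           = "15.4.x-scala2.12"           # assumed runtime version
  node_type_id            = "i3.xlarge"                  # assumed node type
  autotermination_minutes = 20
  num_workers             = 0
  policy_id               = databricks_cluster_policy.single_node.id

  # Assumed here to pull the policy's fixed/default values into the cluster spec.
  apply_policy_default_values = true
}
```

Because `spark_conf` and `custom_tags` are fixed by the policy, the cluster resource no longer declares them itself, which is exactly the configuration the removed validation used to reject.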
## Tests
Unit tests and manual testing. Clusters are now successfully created and updated with `num_workers = 0`.