-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] DAB: Support for YAML Dictionary variables #1298
Comments
@stevenayers-bge is YAML anchors helpful in your case? |
Thanks for the reply 🙂 so it does, but it also feels messy to be honest. Maintaining your basic type variables in one way and your complex types in another. My understanding of YAML anchors is that you also can't access nested properties either. |
@andrewnester it's also worth mentioning that YAML anchors are only valid in the file they are defined in, so you can't define them where you define your variables, and then re-use them elsewhere. |
Indeed, you're correct on the points above. YAML anchors can be a solution in certain cases. Other than that, we've received multiple similar requests and planning to introduce some functionality which should improve the experience |
@andrewnester ok thanks that would be great. Is there a PR for this work already? If not I can look at kicking it off? |
There's no PR for this yet, we still need to figure out certain details about how we can surface this functionality in DABs configuration |
@andrewnester I saw that you said you've gotten similar requests but just wanted to add one more data point to it. I was actually just going to make a similar feature request regarding job cluster definition when I found this thread. My teams running into the exact problem @stevenayers-bge mentions with anchors. We have been using a single bundle file with anchors for defining the job cluster definitions which we then reference in each of the jobs. However, since the file is getting really long — with 5 jobs so far, each with many tasks, and more to come — we wanted to try breaking the jobs out into their own files, but this throws an error right away because the anchor pointers don't have the anchor definitions in these sub-files, and it doesn't make sense to copy the anchor definitions to each sub-file as then we have to make updates to all of those files anytime we change the cluster. So first idea that popped into my head was what if the job clusters were defined as a resource or similar top level item. I also like the dictionary idea suggested here. |
@kenmyers-8451 👍 we have 9 jobs at the moment, and at the moment we're defining the YAML anchors in each our environment yml files and then defining them at target level (target > resources > jobs > job_name) but it's already starting to look really messy. @andrewnester I'm keen to help contribute this feature if I get time. Do you know in which submodules there would need to be changes? Is it all in |
@stevenayers-bge @kenmyers-8451 Thanks everyone one for the interest in the feature and additional context! To be honest, the feature and its implementation is a bit a tricky as we need to take care of a few things: how do we do dynamic value normalisation, how do we validate that the variable follows correct structure (like cluster definition), how do we make it backward compatible with current variable definitions and etc. These are not very trivial questions but we as a team will pick up designing and implementing this feature in the nearest future. Thanks for your patience :) |
I would like to tack on to this issue the need for a dynamic way to switch between existing and job clusters, so you can re-use the same resource / job manifests both for development and in production. This is not actually possible today without some sort of templating, as the Yaml is different and this prevents you from simply overriding at the target level. As mentioned elsewhere, Yaml anchors do not help here as the resource manifests live separate from the targets, and cannot anchor between. |
@pietern @andrewnester does #1398 add this feature? |
@stevenayers-bge no, the PR you mentioned is to address schema issue when primitive non string variables are used. |
Hey everyone! We have a PR in progress adding the support for complex variables, feel free to follow it for any updates |
Nice work @andrewnester , can't wait to be able to use this. Does the PR also support when variables are not at top-level (so for instance inside a target)? An example that would be useful for us would be to have some default cluster settings per environment. bundle:
name: example
resources:
jobs:
SOME_JOB:
name: some_job
job_clusters:
- job_cluster_key: some_job_cluster
new_cluster: ${var.cluster}
tasks:
- task_key: test
job_cluster_key: key
targets:
dev:
mode: development
workspace:
host: https://some-dev-host
variables:
cluster:
description: "Common cluster definition for all jobs"
type: complex
default:
spark_version: 15.1.x-scala2.12
spark_env_vars:
ENVIRONMENT: DEV
prod:
mode: production
workspace:
host: https://some-prod-host
variables:
cluster:
description: "Common cluster definition for all jobs"
type: complex
default:
spark_version: 15.1.x-scala2.12
spark_env_vars:
ENVIRONMENT: PROD |
@wradstok thank you! yes, with this change it will be possible to use variables in target overrides as well |
## Changes Added support for complex variables Now it's possible to add and use complex variables as shown below ``` bundle: name: complex-variables resources: jobs: my_job: job_clusters: - job_cluster_key: key new_cluster: ${var.cluster} tasks: - task_key: test job_cluster_key: key variables: cluster: description: "A cluster definition" type: complex default: spark_version: "13.2.x-scala2.11" node_type_id: "Standard_DS3_v2" num_workers: 2 spark_conf: spark.speculation: true spark.databricks.delta.retentionDurationCheck.enabled: false ``` Fixes #1298 - [x] Support for complex variables - [x] Allow variable overrides (with shortcut) in targets - [x] Don't allow to provide complex variables via flag or env variable - [x] Fail validation if complex value is used but not `type: complex` provided - [x] Support using variables inside complex variables ## Tests Added unit tests --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
When writing Databricks Asset Bundles, it would be great if we could define things like job cluster configuration as one config block:
It would also be nice to be able to group variables using this method:
Please let me know if this is in progress, and either way I'll see if I can help contribute :)
The text was updated successfully, but these errors were encountered: