Generate dbt yml files using the CUE language
I personally don't love YAML and find it error prone. I use the CUE language to generate and validate all YAML files for just about everything I do at this point (Kubernetes, config, etc.) This repo demonstrates using CUE to generate dbt YAML files. There are a lot more checks to be added and you can easily clone and add your own validations as well, but this is a start.
The schema.cue file defines the types and values that are used to build the dbt yaml files. For now, I've only worked through building the Source yaml file. I will add more later. The schema.cue
file is for shared type and value definitions only and should not be edited.
User input goes into a src_
file (though the name doesn't matter), where the various objects are built.
sources: #sourceList & [_jaffleShop]
_jaffleShop: #Source & {
name: "jaffle_shop"
description: "Jaffle Shop is the testing source for dbt"
database: "dbt-tutorial"
schema: "jaffle_shop"
meta: [{"contains_pii": true, "owner": "@alice"}]
tags: ["ecom", "pii"]
quoting: #Quoting & {
database: false
schema: false
identifier: false
}
tables: [_jaffleOrderTable, _jaffleCustomerTable]
}
_jaffleOrderTable: #Table & {
name: "orders"
identifier: "Orders_"
loaded_at_field: "updated_at"
columns: [
#Column & {
name: "id"
tests: ["unique", "not_null"]
},
#Column & {
name: "status"
tests: [#AcceptedValues & {
accepted_values: {
values: ["placed", "shipped", "completed", "returned"]
}
}]
},
#Column & {
name: "price_in_usd"
tests: ["not_null"]
},
#Column & {
name: "customer_id"
tests: [
#Relationship & {
relationships: {
to: "ref('customers')"
field: "id"
}
},
]
},
]
}
_jaffleCustomerTable: #Table & {
name: "customers"
quoting: #Quoting & {
identifier: true
}
}
Creating CUE values reads well as it is a superset of JSON. Here we are creating a _jaffleShop value, which is a dbt Source (see
schema.cue
) for the #Source definition. We can add whatever key value pairs we like as meta, any list of strings for tags, etc. For more complex structures like atable
orquoting
, we use CUE definitions, there are examples here of creating them inline as well as separate entities.
The CUE cli is used to vet (validate the data against the schema), fmt (format - which nicely standardizes the input files) and finally export the data to the named file. CUE works with the concept of packages, so you can merge multiple files together. As noted before, I put my general types and values in the schema.cue
file and implement those types and values in a separate file (in this case src_jaffleshop.cue
).
cue vet schema.cue src_jaffleshop.cue
cue fmt schema.cue src_jaffleshop.cue
cue export schema.cue src_jaffleshop.cue --force --out yaml --outfile _jaffle_shop__sources.yml
For each command (vet, fmt, export) I am passing the list of files to be used when generating the output. --out is what type of output (yaml, json, etc.) The --outfile is what we want to name our output file.
I provided 2 different sample bash scripts (run_jaffleshop.sh
and run_stripe.sh
) that you can use to generate output or, if you prefer, you can input the cli commands yourself.
The below output is from CUE, you'll find it basically a match (minus spaces) of the Source example. I have added some additional examples beyond the example to demonstrate things like using accepted_values
tests, etc.
version: 2
sources:
- name: jaffle_shop
description: Jaffle Shop is the testing source for dbt
database: dbt-tutorial
schema: jaffle_shop
meta:
- contains_pii: true
owner: '@alice'
tags:
- ecom
- pii
quoting:
database: false
schema: false
identifier: false
tables:
- name: orders
identifier: Orders_
loaded_at_field: updated_at
columns:
- name: id
tests:
- unique
- not_null
- name: status
tests:
- accepted_values:
values:
- placed
- shipped
- completed
- returned
- name: price_in_usd
tests:
- not_null
- name: customer_id
tests:
- relationships:
to: ref('customers')
field: id
- name: customers
quoting:
identifier: true