From 8fa8192c77d2fff1246080c5dc3112207ea6a953 Mon Sep 17 00:00:00 2001 From: Sung Won Chung Date: Sun, 19 Feb 2023 15:09:20 -0800 Subject: [PATCH 01/16] dbt-constraints-docs (#2601) * placeholder outline * add version blocks * add to sidebar * remove version blocks * add 2 sections * postgres section example * add more docs * correct DDL * correct config name * fix data type * add redshift docs * Update constraints docs for Snowflake * add model config links * update warehouse to spark * update not null syntax * add more config examples * fix ordering * update docs based on new parsing * add a note * add example error messages * Update config name on Redshift * Update description for Spark * add explainers * add check * remove fluff --------- Co-authored-by: Dave Connors Co-authored-by: Benoit Perigaud <8754100+b-per@users.noreply.github.com> --- website/docs/reference/model-configs.md | 7 +- .../resource-configs/constraints_enabled.md | 548 ++++++++++++++++++ website/sidebars.js | 1 + 3 files changed, 554 insertions(+), 2 deletions(-) create mode 100644 website/docs/reference/resource-configs/constraints_enabled.md diff --git a/website/docs/reference/model-configs.md b/website/docs/reference/model-configs.md index 668856b6bdb..3f4702e4c01 100644 --- a/website/docs/reference/model-configs.md +++ b/website/docs/reference/model-configs.md @@ -105,6 +105,7 @@ models: [+](plus-prefix)[full_refresh](full_refresh): [+](plus-prefix)[meta](meta): {} [+](plus-prefix)[grants](grants): {} + [+](plus-prefix)[constraints_enabled](constraints_enabled): true | false ``` @@ -134,6 +135,7 @@ models: [full_refresh](full_refresh): [meta](meta): {} [grants](grants): {} + [constraints_enabled](constraints_enabled): true | false ``` @@ -157,8 +159,9 @@ models: [schema](resource-configs/schema)="", [alias](resource-configs/alias)="", [persist_docs](persist_docs)={}, - [meta](meta)={} - [grants](grants)={} + [meta](meta)={}, + [grants](grants)={}, + 
[constraints_enabled](constraints_enabled)=true | false ) }} ``` diff --git a/website/docs/reference/resource-configs/constraints_enabled.md b/website/docs/reference/resource-configs/constraints_enabled.md new file mode 100644 index 00000000000..c7741eddbc7 --- /dev/null +++ b/website/docs/reference/resource-configs/constraints_enabled.md @@ -0,0 +1,548 @@ +--- +resource_types: [models] +datatype: "{}" +default_value: {constraints_enabled: false} +id: "constraints_enabled" +--- + + +# Definition + +You can manage data type constraints on your models using the `constraints_enabled` configuration. This configuration is available on all models, and is disabled by default. When enabled, dbt will automatically add constraints to your models based on the data types of the columns in your model's schema. This is a great way to ensure your data is always in the correct format. For example, if you have a column in your model that is defined as a `date` data type, dbt will automatically add a data type constraint to that column to ensure the data in that column is always a valid date. If you want to add a `not null` constraint to a column in a preventative manner rather than as a test, you can add the `not null` value to the column definition in your model's schema: `constraints: ['not null']`. + +## When to use constraints vs. tests + +Constraints serve as a **preventative** measure against bad data quality **before** the dbt model is (re)built. They are limited only by the respective database's functionality and the data types that are supported. Examples of constraints: `not null`, `unique`, `primary key`, `foreign key`, `check` + +Tests serve as a **detective** measure against bad data quality **after** the dbt model is (re)built. + +Constraints are great when you define `constraints: ['not null']` for a column in your model's schema because it'll prevent `null` values being inserted into that column at dbt model creation time. 
AND it'll prevent other unintended values from being inserted into that column without dbt's intervention as it relies on the database to enforce the constraint. This can **replace** the `not_null` test. However, performance issues may arise depending on your database. + +Tests should be used in addition to and instead of constraints when you want to test things like `accepted_values` and `relationships`. These are usually not enforced with built-in database functionality and are not possible with constraints. Also, custom tests will allow more flexibility and address nuanced data quality issues that may not be possible with constraints. + +## Configuring Constraints + +You can configure `constraints_enabled` in `schema.yml` files to apply constraints one-by-one for specific dbt models in `yml` config blocks. You'll receive dynamic error messages if you do not configure constraints based on the criteria below. + +- Constraints must be defined in a `yml` schema configuration file like `schema.yml`. + +- Only the `SQL` **table** materialization is supported for constraints. + +```txt +Parsing Error + Original File Path: (models/constraints_examples/constraints_example.sql) + Constraints must be defined in a `yml` schema configuration file like `schema.yml`. + Only the SQL table materialization is supported for constraints. + `data_type` values must be defined for all columns and NOT be null or blank. + Materialization Error: {'materialization': 'snapshot'} +``` + +- `data_type` values must be defined for all columns and NOT be null or blank. + +```txt +Parsing Error + Original File Path: (models/constraints_examples/constraints_example.sql) + Constraints must be defined in a `yml` schema configuration file like `schema.yml`. + Only the SQL table materialization is supported for constraints. + `data_type` values must be defined for all columns and NOT be null or blank. 
+ Columns with `data_type` Blank/Null Errors: {'id'} +``` + +- `constraints_enabled=true` can only be configured within `schema.yml` files NOT within a model file(ex: .sql, .py) or `dbt_project.yml`. *(Note: Current parsing mechanics require all constraints configs be written in schema files to be implemented exactly as configured. This may change in the future.)* + +```txt +Parsing Error + Original File Path: (models/constraints_examples/constraints_example.sql) + Constraints must be defined in a `yml` schema configuration file like `schema.yml`. + Only the SQL table materialization is supported for constraints. + `data_type` values must be defined for all columns and NOT be null or blank. + `constraints_enabled=true` can only be configured within `schema.yml` files + NOT within a model file(ex: .sql, .py) or `dbt_project.yml`. +``` + +- Please ensure the name, order, and number of columns in your `yml` file match the columns in your SQL file. + +```txt +# example error message +Compilation Error in model constraints_example (models/constraints_examples/constraints_example.sql) + Please ensure the name, order, and number of columns in your `yml` file match the columns in your SQL file. + Schema File Columns: ['ID', 'COLOR', 'DATE_DAY'] + SQL File Columns: ['ERROR', 'COLOR', 'DATE_DAY'] +``` + +> Note: Constraints and data type inheritance across downstream tables depends on database-specific functionality. We recommend defining constraints for all tables in scope where desired. +> Constraints can be defined as a list or in bullet form. Both are valid. 
+ + + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + constraints_check: (id > 0) + tests: + - unique + - name: color + constraints: + - not null + - primary key + data_type: string + - name: date_day + data_type: date +``` + + + +The `constraints_enabled` config can also be defined: + +- under the `models` config block in `dbt_project.yml` only for `false` configs + + + +```yml +models: + tpch: + staging: + +materialized: view + +docs: + node_color: "#cd7f32" + + marts: + core: + +constraints_enabled: false # enforce data type constraints across all models in the "/marts/core" subfolder + materialized: table +``` + + + +- in a `config()` Jinja macro within a model's SQL file only for `false` configs + + + +```sql +{{ + config( + materialized = "table", + constraints_enabled = false + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + +See [configs and properties](configs-and-properties) for details. + + + + +### Database-specific Examples and Notes + + + +
+ +On BigQuery, "privileges" are called "roles," and they take the form `roles/service.roleName`. For instance, instead of granting `select` on a model, you would grant `roles/bigquery.dataViewer`. + +Grantees can be users, groups, service accounts, domains—and each needs to be clearly demarcated as such with a prefix. For instance, to grant access on a model to `someone@yourcompany.com`, you need to specify them as `user:someone@yourcompany.com`. + +We encourage you to read Google's documentation for more context: +- [Understanding GCP roles](https://cloud.google.com/iam/docs/understanding-roles) +- [How to format grantees](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-control-language#user_list) + + + +### BigQuery examples + +Granting permission using SQL and BigQuery: + +```sql +{{ config(grants = {'roles/bigquery.dataViewer': ['user:someone@yourcompany.com']}) }} +``` + +Granting permission in a model schema using BigQuery: + + + +```yml +models: + - name: specific_model + config: + grants: + roles/bigquery.dataViewer: ['user:someone@yourcompany.com'] +``` + + + +
+ +
+ +Spark allows you to define: + +- a `not null` constraint +- and/or additional constraint checks on your columns + +Because Spark neither supports transactions nor allows `create or replace table` with a schema, the table is first created without a schema and `alter` statements are then executed to add the different constraints. + +This means that: + +- the names and order of columns are checked, but not their types +- if a `constraints` or `constraints_check` condition fails, the table containing the failing data will still exist in the warehouse + +See [this page](https://docs.databricks.com/tables/constraints.html) for more details about constraint support on Spark. + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null'] + constraints_check: "(id > 0)" + tests: + - unique + - name: color + data_type: text + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql + create or replace table schema_name.my_model + using delta + as + select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + +Followed by the statements + +```sql +alter table schema_name.my_model change column id set not null; +alter table schema_name.my_model add constraint 472394792387497234 check (id > 0); +``` + +
+ +
+ +Redshift currently only enforces `not null` constraints; all other constraints are metadata only. Additionally, Redshift does not allow column checks at the time of table creation. See more in the Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html). + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + constraints_check: (id > 0) + tests: + - unique + - name: color + data_type: varchar + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql + +create table + "database_name"."schema_name"."my_model__dbt_tmp" + + ( + id integer not null, + color varchar, + date_day date, + primary key(id) + ) + + + + ; + insert into "database_name"."schema_name"."my_model__dbt_tmp" + ( + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day + ) + ; + +``` + + + + +
+ +
+ +- Snowflake constraints documentation: [here](https://docs.snowflake.com/en/sql-reference/constraints-overview.html) +- Snowflake data types: [here](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types.html) + +Snowflake supports four types of constraints: `unique`, `not null`, `primary key` and `foreign key`. + +It is important to note that only `not null` (and the `not null` property of `primary key`) is actually enforced today. +The rest of the constraints are purely metadata and are not verified when data is inserted. + +Currently, Snowflake doesn't support the `check` syntax and dbt will skip the `check` config and raise a warning message if it is set on some models in the dbt project. + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + tests: + - unique + - name: color + data_type: text + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql + create or replace transient table AD_HOC.dbt_bperigaud.constraints_model + + + ( + + id integer not null primary key , + color text , + date_day date + ) + + + as + ( + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day + ); +``` + + + +
+ +
+ +* PostgreSQL constraints documentation: [here](https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6) + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + constraints_check: (id > 0) + tests: + - unique + - name: color + data_type: text + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql + create table "database_name"."schema_name"."constraints_example__dbt_tmp" + + + ( + + + + + id integer not null primary key check (id > 0) , + + + + + color text , + + + + + date_day date + + ) + ; + insert into "database_name"."schema_name"."constraints_example__dbt_tmp" + ( + + + id , + + + color , + + + date_day + + ) + + + ( + + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day + ); +``` + + + +
+ +
+ + \ No newline at end of file diff --git a/website/sidebars.js b/website/sidebars.js index 28172be026a..898d20d8fc1 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -447,6 +447,7 @@ const sidebarSettings = { "reference/resource-configs/database", "reference/resource-configs/enabled", "reference/resource-configs/full_refresh", + "reference/resource-configs/constraints_enabled", "reference/resource-configs/grants", "reference/resource-configs/docs", "reference/resource-configs/persist_docs", From 33ccf31ce2627f917a7eb48d89b539f0a3113b38 Mon Sep 17 00:00:00 2001 From: Jeremy Cohen Date: Thu, 9 Feb 2023 14:03:24 +0100 Subject: [PATCH 02/16] Initialize v1.5 --- website/dbt-versions.js | 5 ++ .../versions/02-upgrading-to-v1.5.md | 50 +++++++++++++++++++ 2 files changed, 55 insertions(+) create mode 100644 website/docs/guides/migration/versions/02-upgrading-to-v1.5.md diff --git a/website/dbt-versions.js b/website/dbt-versions.js index ae2d49e0759..87d892491b6 100644 --- a/website/dbt-versions.js +++ b/website/dbt-versions.js @@ -1,4 +1,9 @@ exports.versions = [ + { + version: "1.5", + EOLDate: "2024-04-26", + isPrerelease: true, + }, { version: "1.4", EOLDate: "2024-01-25", diff --git a/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md b/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md new file mode 100644 index 00000000000..528e0f4a898 --- /dev/null +++ b/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md @@ -0,0 +1,50 @@ +--- +title: "Trying v1.5 (prerelease)" +description: New features and changes in dbt Core v1.5 +--- +### Resources + +- [Changelog](https://github.com/dbt-labs/dbt-core/blob/main/CHANGELOG.md) +- [CLI Installation guide](/docs/get-started/installation) +- [Cloud upgrade guide](/docs/dbt-versions/upgrade-core-in-cloud) +- [Release schedule](https://github.com/dbt-labs/dbt-core/issues/6715) + +**Planned final release:** April 26, 2023 + +dbt Core v1.5 is a feature release, with two big additions 
planned: +1. **"Models as APIs,"** the first phase of [multi-project deployments](https://github.com/dbt-labs/dbt-core/discussions/6725) +2. An initial **Python API for dbt-core,** supporting programmatic invocations at parity with the CLI + +## What to know before upgrading + +dbt Labs is committed to providing backward compatibility for all versions 1.x, with the exception of any changes explicitly mentioned below. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new). + +### Breaking changes + +As part of our refactor of `dbt-core` internals, we need to make some **very precise** changes to runtime configuration. The net result of these changes is more sensible configuration options, clearer documentation, cleaner APIs, and a more legible codebase. + +Wherever possible, we will aim to provide backwards compatibility and deprecation warnings for at least one minor version, before actually removing the old functionality. In those cases, we still reserve the right to fully remove the backward-compatible functionality in a future v1.x minor version of `dbt-core`. + +Changes planned for v1.5: +- Renaming ["global configs"](global-configs) for consistency ([dbt-core#6903](https://github.com/dbt-labs/dbt-core/issues/6903)) +- Moving `log-path` and `target-path` out of `dbt_project.yml`, for consistency with other global configs ([dbt-core#6882](https://github.com/dbt-labs/dbt-core/issues/6882)) + +### For consumers of dbt artifacts (metadata) + +The manifest schema version will be updated to `v9`. Specific changes to be noted here. + +### For maintainers of adapter plugins + +Forthcoming: GH discussion detailing interface changes, and offering a forum for Q&A + +## New and changed documentation + +Forthcoming! 
+ +### "Models as APIs" +- Model contracts ([#2839](https://github.com/dbt-labs/docs.getdbt.com/issues/2839)) +- Model groups & access ([#2840](https://github.com/dbt-labs/docs.getdbt.com/issues/2840)) +- Model versions ([#2841](https://github.com/dbt-labs/docs.getdbt.com/issues/2841)) + +### dbt-core Python API +- Auto-generated documentation ([#2674](https://github.com/dbt-labs/docs.getdbt.com/issues/2674)) for dbt-core CLI & Python API for programmatic invocations From 79c4cfab873bb66d6a45b35319e27644bbaf43e0 Mon Sep 17 00:00:00 2001 From: Jeremy Cohen Date: Mon, 20 Feb 2023 02:02:00 +0100 Subject: [PATCH 03/16] Initial skeleton drafts --- website/dbt-versions.js | 36 ++++++-- .../docs/collaborate/publish/model-access.md | 61 ++++++++++++++ .../collaborate/publish/model-contracts.md | 84 +++++++++++++++++++ .../collaborate/publish/model-versions.md | 28 +++++++ .../{constraints_enabled.md => contract.md} | 14 +++- .../resource-properties/constraints.md | 6 ++ website/sidebars.js | 11 ++- 7 files changed, 228 insertions(+), 12 deletions(-) create mode 100644 website/docs/docs/collaborate/publish/model-access.md create mode 100644 website/docs/docs/collaborate/publish/model-contracts.md create mode 100644 website/docs/docs/collaborate/publish/model-versions.md rename website/docs/reference/resource-configs/{constraints_enabled.md => contract.md} (97%) create mode 100644 website/docs/reference/resource-properties/constraints.md diff --git a/website/dbt-versions.js b/website/dbt-versions.js index 87d892491b6..f820deeace4 100644 --- a/website/dbt-versions.js +++ b/website/dbt-versions.js @@ -27,6 +27,34 @@ exports.versions = [ ] exports.versionedPages = [ + { + "page": "docs/collaborate/publish/model-contracts", + "firstVersion": "1.5", + }, + { + "page": "docs/collaborate/publish/model-access", + "firstVersion": "1.5", + }, + { + "page": "docs/collaborate/publish/model-versions", + "firstVersion": "1.5", + }, + { + "page": "reference/resource-configs/contract", + 
"firstVersion": "1.5", + }, + { + "page": "reference/resource-properties/constraints", + "firstVersion": "1.5", + }, + { + "page": "reference/dbt-jinja-functions/local-md5", + "firstVersion": "1.4", + }, + { + "page": "reference/warehouse-setups/fal-setup", + "firstVersion": "1.3", + }, { "page": "reference/dbt-jinja-functions/set", "firstVersion": "1.2", @@ -55,12 +83,4 @@ exports.versionedPages = [ "page": "reference/dbt-jinja-functions/print", "firstVersion": "1.1", }, - { - "page": "reference/dbt-jinja-functions/local-md5", - "firstVersion": "1.4", - }, - { - "page": "reference/warehouse-setups/fal-setup", - "firstVersion": "1.3", - }, ] diff --git a/website/docs/docs/collaborate/publish/model-access.md b/website/docs/docs/collaborate/publish/model-access.md new file mode 100644 index 00000000000..4cfdd5eee4f --- /dev/null +++ b/website/docs/docs/collaborate/publish/model-access.md @@ -0,0 +1,61 @@ +--- +title: "Model access" +--- + +:::info Beta functionality +This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. + +For more details, and to leave your feedback, check out the GitHub discussion: +* ["Model groups & access" (dbt-core#6730)](https://github.com/dbt-labs/dbt-core/discussions/6730) +::: + +## Related documentation +* TK: `groups` +* TK: `access` modifiers + +### Groups + +Models can be grouped together under a common designation, with a shared owner. + +Why define model `groups`? +- It turns implicit relationships into an explicit grouping +- It enables you to mark certain models as "private," for use _only_ within that group + +### Access modifiers + +Some models (not all of them) are designed to be shared across groups. + +https://en.wikipedia.org/wiki/Access_modifiers + +| Keyword | Meaning | +|-----------|----------------------| +| private | same group | +| protected | same project/package | +| public | everybody* | + +By default, all models are "protected." 
This means that they can be referenced by other models in the same project. + +:::info Under construction 🚧 +More to come! The syntax below is suggestive only, it does not yet work. +::: + + + +```yaml +groups: + - name: cx + owner: + name: Customer Success Team + email: cx@jaffle.shop + +models: + - name: dim_customers + group: cx + access: public + # this is an intermediate transformation -- relevant to the CX team only + - name: int__customer_history_rollup + group: cx + access: private +``` + + diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md new file mode 100644 index 00000000000..61f7894cb8a --- /dev/null +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -0,0 +1,84 @@ +--- +title: "Model contracts" +--- + +:::info Beta functionality +This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. + +For more details, and to leave your feedback, check out the GitHub discussion: +* ["Model contracts" (dbt-core#6726)](https://github.com/dbt-labs/dbt-core/discussions/6726) +::: + +## Related documentation +* [`contract`](resource-configs/contract) +* [`columns`](resource-properties/columns) +* [`constraints`](resource-properties/constraints) + +## Why define a contract? + +Defining a dbt model is as easy as writing a SQL `select` statement, or a Python Data Frame transformation. Your query naturally produces a dataset with columns of names and types, based on the columns you're selecting and the transformations you're applying. + +While this is great for quick & iterative development, for some models, constantly changing the shape of the model's returned dataset poses a risk, when there are other people and processes querying that model. It's better to define a set of **upfront guarantees** about the shape of your model. We call this set of guarantees a "contract." 
While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract—or it will fail to build. + +## How to define a contract + +Let's say you have a model with a query like: + + + +```sql +-- lots of SQL + +final as ( + + select + -- lots of columns + from ... + +) + +select * from final +``` + + +Your contract **must** include every column's `name` and `data_type` (where `data_type` matches the type understood by your data platform). If your model is being materialized as `table` or `incremental`, you may optionally specify that certain columns must be `not_null` (i.e. contain zero null values). Depending on your data platform, you may also be able to define additional `constraints` that are enforced while the model is being built. + +Finally, you configure your model with `contract: true`. + + + +```yaml +models: + - name: dim_customers + config: + contract: true + columns: + - name: customer_id + data_type: int + not_null: true + - name: customer_name + data_type: string + ... +``` + + + +When building a model with a defined contract, dbt will do two things differently: +1. dbt will run a "pre-flight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. +2. dbt will pass the column names, types, `not_null` and other constraints into the DDL statements it submits to the data platform, where they will be enforced while building the table. + +## FAQs + +### Which models should have contracts? + +Any model can define a contract. It's especially important to define contracts for "public" models that are being shared with other groups, teams, and (soon) dbt projects. + +### How are contracts different from tests? + +A model's contract defines the **shape** of the returned dataset. + +[Tests](tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. 
In blue/green deployments (docs link TK), ... + +In the parallel for software APIs: +- The structure of the API response is the contract +- Quality and reliability ("uptime") are also **crucial**, but not part of the contract per se. diff --git a/website/docs/docs/collaborate/publish/model-versions.md b/website/docs/docs/collaborate/publish/model-versions.md new file mode 100644 index 00000000000..aa97d90eaf6 --- /dev/null +++ b/website/docs/docs/collaborate/publish/model-versions.md @@ -0,0 +1,28 @@ +--- +title: "Model versions" +--- + +:::info Beta functionality +This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. + +For more details, and to leave your feedback, check out the GitHub discussion: +* ["Model versions" (dbt-core#6736)](https://github.com/dbt-labs/dbt-core/discussions/6736) +::: + +API versioning is **a hard problem** in software engineering. It's also very important. Our goal is to _make a hard thing possible_. + +## Related documentation +* TK: `version` & `latest` (_not_ [this one](project-configs/version)) +* TK: `deprecation_date` + +## Why version a model? + +If a model defines a ["contract"](model-contracts) (a set of guarantees for its structure), it's also possible to change that model's contract in a way that "breaks" the previous set of guarantees. + +One approach is to force every consumer of the model to immediately handle the breaking change, as soon as it's deployed to production. While this may work at smaller organizations, or while iterating on an immature set of data models, it doesn't scale much beyond that. + +Instead, the owner of the model can create a **new version**, and provide a **deprecation window**, during which consumers can migrate from the old version to the new. + +In the meantime, anywhere that model is used downstream, it can be referenced at a specific version. 
+ +When a model is reaching its deprecation date, consumers of that model will hear about it. When the date is reached, it goes away. diff --git a/website/docs/reference/resource-configs/constraints_enabled.md b/website/docs/reference/resource-configs/contract.md similarity index 97% rename from website/docs/reference/resource-configs/constraints_enabled.md rename to website/docs/reference/resource-configs/contract.md index c7741eddbc7..e439a644bf0 100644 --- a/website/docs/reference/resource-configs/constraints_enabled.md +++ b/website/docs/reference/resource-configs/contract.md @@ -1,10 +1,18 @@ --- resource_types: [models] datatype: "{}" -default_value: {constraints_enabled: false} -id: "constraints_enabled" +default_value: {contract: false} +id: "contract" --- - + +:::info Beta functionality +This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. +::: + +## Related documentation +- [Model contracts](publish/model-contracts) + + # Definition diff --git a/website/docs/reference/resource-properties/constraints.md b/website/docs/reference/resource-properties/constraints.md new file mode 100644 index 00000000000..6f1f66b5524 --- /dev/null +++ b/website/docs/reference/resource-properties/constraints.md @@ -0,0 +1,6 @@ +--- +resource_types: [models] +datatype: "{dictionary}" +--- + +https://github.com/dbt-labs/dbt-core/issues/6750 diff --git a/website/sidebars.js b/website/sidebars.js index 898d20d8fc1..6de0086090a 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -317,6 +317,15 @@ const sidebarSettings = { "docs/collaborate/manage-access/audit-log", ], }, // Manage access + { + type: "category", + label: "Publishing models", + items: [ + "docs/collaborate/publish/model-contracts", + "docs/collaborate/publish/model-access", + "docs/collaborate/publish/model-versions", + ], + }, // publishing models ], }, { @@ -447,7 +456,7 @@ const sidebarSettings = { 
"reference/resource-configs/database", "reference/resource-configs/enabled", "reference/resource-configs/full_refresh", - "reference/resource-configs/constraints_enabled", + "reference/resource-configs/contract", "reference/resource-configs/grants", "reference/resource-configs/docs", "reference/resource-configs/persist_docs", From 6fe3771db0efab6766f85f72f9d6743377498441 Mon Sep 17 00:00:00 2001 From: Jeremy Cohen Date: Mon, 20 Feb 2023 11:25:33 +0100 Subject: [PATCH 04/16] Write a bit more --- .../collaborate/publish/model-contracts.md | 4 +- .../versions/02-upgrading-to-v1.5.md | 17 +- .../reference/resource-configs/contract.md | 530 +----------------- .../resource-properties/constraints.md | 399 ++++++++++++- website/sidebars.js | 1 + 5 files changed, 445 insertions(+), 506 deletions(-) diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md index 61f7894cb8a..f2c2529b14e 100644 --- a/website/docs/docs/collaborate/publish/model-contracts.md +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -77,7 +77,9 @@ Any model can define a contract. It's especially important to define contracts f A model's contract defines the **shape** of the returned dataset. -[Tests](tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. In blue/green deployments (docs link TK), ... +[Tests](tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. Tests are also more configurable, via `severity` and custom thresholds, and easier to debug after finding failures, because the model has already built, and the relevant records can be materialized in the data warehouse by [storing failures](resource-configs/store_failures). + +In blue/green deployments (docs link TK), ... 
In the parallel for software APIs: - The structure of the API response is the contract diff --git a/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md b/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md index 528e0f4a898..c3ca745d509 100644 --- a/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md +++ b/website/docs/guides/migration/versions/02-upgrading-to-v1.5.md @@ -2,6 +2,11 @@ title: "Trying v1.5 (prerelease)" description: New features and changes in dbt Core v1.5 --- + +:::info +v1.5 is currently available as a **beta prerelease.** Availability in dbt Cloud coming soon! +::: + ### Resources - [Changelog](https://github.com/dbt-labs/dbt-core/blob/main/CHANGELOG.md) @@ -39,12 +44,14 @@ Forthcoming: GH discussion detailing interface changes, and offering a forum for ## New and changed documentation -Forthcoming! +:::caution Under construction 🚧 +More to come! +::: -### "Models as APIs" -- Model contracts ([#2839](https://github.com/dbt-labs/docs.getdbt.com/issues/2839)) -- Model groups & access ([#2840](https://github.com/dbt-labs/docs.getdbt.com/issues/2840)) -- Model versions ([#2841](https://github.com/dbt-labs/docs.getdbt.com/issues/2841)) +### Publishing models as APIs +- [Model contracts](model-contracts) ([#2839](https://github.com/dbt-labs/docs.getdbt.com/issues/2839)) +- [Model access](model-access) ([#2840](https://github.com/dbt-labs/docs.getdbt.com/issues/2840)) +- [Model versions](model-versions) ([#2841](https://github.com/dbt-labs/docs.getdbt.com/issues/2841)) ### dbt-core Python API - Auto-generated documentation ([#2674](https://github.com/dbt-labs/docs.getdbt.com/issues/2674)) for dbt-core CLI & Python API for programmatic invocations diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index e439a644bf0..023406a3b8a 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ 
-5,17 +5,32 @@ default_value: {contract: false} id: "contract" --- -:::info Beta functionality -This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. -::: ## Related documentation -- [Model contracts](publish/model-contracts) +- [What is a model contract?](publish/model-contracts) +- [Defining `columns`](resource-properties/columns) +- [Defining `constraints`](resource-properties/constraints) +:::info Beta functionality +This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. + +In particular: +- The current name of the `contract` config is `constraints_enabled`. +- "Pre flight" check includes column `name` only, and is order-sensitive. We aim to add `data_type` and make it insensitive to column order. +::: + # Definition +When the `contract` configuration is enabled, dbt will ensure that your model's returned dataset exactly matches the attributes you have defined in yaml: +- `name` and `data_type` for every column +- additional [`constraints`](resource-properties/constraints), as supported for this materialization + data platform + +:::caution Under construction 🚧 +More to come! +::: + You can manage data type constraints on your models using the `constraints_enabled` configuration. This configuration is available on all models, and is disabled by default. When enabled, dbt will automatically add constraints to your models based on the data types of the columns in your model's schema. This is a great way to ensure your data is always in the correct format. For example, if you have a column in your model that is defined as a `date` data type, dbt will automatically add a data type constraint to that column to ensure the data in that column is always a valid date. 
If you want to add a `not null` constraint to a column in a preventative manner rather than as a test, you can add the `not null` value to the column definition in your model's schema: `constraints: ['not null']`. ## When to use constraints vs. tests @@ -28,87 +43,41 @@ Constraints are great when you define `constraints: ['not null']` for a column i Tests should be used in addition to and instead of constraints when you want to test things like `accepted_values` and `relationships`. These are usually not enforced with built-in database functionality and are not possible with constraints. Also, custom tests will allow more flexibility and address nuanced data quality issues that may not be possible with constraints. -## Configuring Constraints +## Current Limitations -You can configure `constraints_enabled` in `schema.yml` files to apply constraints one-by-one for specific dbt models in `yml` config blocks. You'll receive dynamic error messages if you do not configure constraints based on the criteria below. - -- Constraints must be defined in a `yml` schema configuration file like `schema.yml`. - -- Only the `SQL` **table** materialization is supported for constraints. - -```txt -Parsing Error - Original File Path: (models/constraints_examples/constraints_example.sql) - Constraints must be defined in a `yml` schema configuration file like `schema.yml`. - Only the SQL table materialization is supported for constraints. - `data_type` values must be defined for all columns and NOT be null or blank. - Materialization Error: {'materialization': 'snapshot'} -``` - -- `data_type` values must be defined for all columns and NOT be null or blank. - -```txt -Parsing Error - Original File Path: (models/constraints_examples/constraints_example.sql) - Constraints must be defined in a `yml` schema configuration file like `schema.yml`. - Only the SQL table materialization is supported for constraints. - `data_type` values must be defined for all columns and NOT be null or blank. 
- Columns with `data_type` Blank/Null Errors: {'id'} -``` - -- `constraints_enabled=true` can only be configured within `schema.yml` files NOT within a model file(ex: .sql, .py) or `dbt_project.yml`. *(Note: Current parsing mechanics require all constraints configs be written in schema files to be implemented exactly as configured. This may change in the future.)* - -```txt -Parsing Error - Original File Path: (models/constraints_examples/constraints_example.sql) - Constraints must be defined in a `yml` schema configuration file like `schema.yml`. - Only the SQL table materialization is supported for constraints. - `data_type` values must be defined for all columns and NOT be null or blank. - `constraints_enabled=true` can only be configured within `schema.yml` files - NOT within a model file(ex: .sql, .py) or `dbt_project.yml`. -``` - -- Please ensure the name, order, and number of columns in your `yml` file match the columns in your SQL file. +- `contract` (a.k.a. `constraints_enabled`) must be configured in the yaml [`config`] property _only_. Setting this configuration via in-file config or in `dbt_project.yml` is not supported. +- `contract` (a.k.a. `constraints_enabled`) is supported only for a SQL model materialized as `table`. +- "Pre flight" checks include the column `name`, but not yet their `data_type`. It is our intent to support `data_type` verification in a forthcoming beta prerelease. +- The order of columns in your `yml` file must match exactly the order of columns as returned by your model's SQL query. +- While most data platforms support `not_null` checks, support for [additional `constraints`](resource-properties/constraints) varies by data platform. ```txt # example error message Compilation Error in model constraints_example (models/constraints_examples/constraints_example.sql) Please ensure the name, order, and number of columns in your `yml` file match the columns in your SQL file. 
- Schema File Columns: ['ID', 'COLOR', 'DATE_DAY'] - SQL File Columns: ['ERROR', 'COLOR', 'DATE_DAY'] + Schema File Columns: ['id', 'date_day', 'color'] + SQL File Columns: ['id', 'color', 'date_day'] ``` -> Note: Constraints and data type inheritance across downstream tables depends on database-specific functionality. We recommend defining constraints for all tables in scope where desired. -> Constraints can be defined as a list or in bullet form. Both are valid. - - - - +## Example ```yml models: - name: constraints_example - docs: - node_color: black config: constraints_enabled: true columns: - name: id data_type: integer description: hello - constraints: ['not null','primary key'] + constraints: ['not null', 'primary key'] constraints_check: (id > 0) tests: - unique - name: color - constraints: + constraints: - not null - primary key data_type: string @@ -116,441 +85,4 @@ models: data_type: date ``` - - -The `constraints_enabled` config can also be defined: - -- under the `models` config block in `dbt_project.yml` only for `false` configs - - - -```yml -models: - tpch: - staging: - +materialized: view - +docs: - node_color: "#cd7f32" - - marts: - core: - +constraints_enabled: false # enforce data type constraints across all models in the "/marts/core" subfolder - materialized: table -``` - - - -- in a `config()` Jinja macro within a model's SQL file only for `false` configs - - - -```sql -{{ - config( - materialized = "table", - constraints_enabled = false - ) -}} - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day -``` - - - -See [configs and properties](configs-and-properties) for details. - - - - -### Database-specific Examples and Notes - - - -
- -On BigQuery, "privileges" are called "roles," and they take the form `roles/service.roleName`. For instance, instead of granting `select` on a model, you would grant `roles/bigquery.dataViewer`. - -Grantees can be users, groups, service accounts, domains—and each needs to be clearly demarcated as such with a prefix. For instance, to grant access on a model to `someone@yourcompany.com`, you need to specify them as `user:someone@yourcompany.com`. - -We encourage you to read Google's documentation for more context: -- [Understanding GCP roles](https://cloud.google.com/iam/docs/understanding-roles) -- [How to format grantees](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-control-language#user_list) - - - -### BigQuery examples - -Granting permission using SQL and BigQuery: - -```sql -{{ config(grants = {'roles/bigquery.dataViewer': ['user:someone@yourcompany.com']}) }} -``` - -Granting permission in a model schema using BigQuery: - - - -```yml -models: - - name: specific_model - config: - grants: - roles/bigquery.dataViewer: ['user:someone@yourcompany.com'] -``` - - - -
- -
- -Spark allows you to define: - -- a `not null` constraint -- and/or additional constraint checks on your columns - -As Spark does not support transactions nor allows using `create or replace table` with a schema, the table is first created without a schema and `alter` statements are then executed to add the different constraints. - -This means that: - -- the names and order of columns is checked but not their type -- if the `constraints` and/or `constraint_check` fails, the table with the failing data will still exist in the Warehouse - -See [this page](https://docs.databricks.com/tables/constraints.html) with more details about the support of constraints on Spark. - - - -```sql -{{ - config( - materialized = "table" - ) -}} - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day -``` - - - - - -```yml -models: - - name: constraints_example - docs: - node_color: black - config: - constraints_enabled: true - columns: - - name: id - data_type: integer - description: hello - constraints: ['not null'] - constraints_check: "(id > 0)" - tests: - - unique - - name: color - data_type: text - - name: date_day - data_type: date -``` - - - -Expected DDL to enforce constraints: - - -```sql - create or replace table schema_name.my_model - using delta - as - select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day -``` - - - -Followed by the statements - -```sql -alter table schema_name.my_model change column id set not null; -alter table schema_name.my_model add constraint 472394792387497234 check (id > 0); -``` - -
- -
- -Redshift currently only enforces `not null` constraints; all other constraints are metadata only. Additionally, Redshift does not allow column checks at the time of table creation. See more in the Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html). - - - -```sql -{{ - config( - materialized = "table" - ) -}} - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day -``` - - - - - -```yml -models: - - name: constraints_example - docs: - node_color: black - config: - constraints_enabled: true - columns: - - name: id - data_type: integer - description: hello - constraints: ['not null','primary key'] - constraints_check: (id > 0) - tests: - - unique - - name: color - data_type: varchar - - name: date_day - data_type: date -``` - - - -Expected DDL to enforce constraints: - - -```sql - -create table - "database_name"."schema_name"."my_model__dbt_tmp" - - ( - id integer not null, - color varchar, - date_day date, - primary key(id) - ) - - - - ; - insert into "database_name"."schema_name"."my_model__dbt_tmp" - ( - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day - ) - ; - -``` - - - - -
- -
- -- Snowflake constraints documentation: [here](https://docs.snowflake.com/en/sql-reference/constraints-overview.html) -- Snowflake data types: [here](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types.html) - -Snowflake suppports four types of constraints: `unique`, `not null`, `primary key` and `foreign key`. - -It is important to note that only the `not null` (and the `not null` property of `primary key`) are actually checked today. -There rest of the constraints are purely metadata, not verified when inserting data. - -Currently, Snowflake doesn't support the `check` syntax and dbt will skip the `check` config and raise a warning message if it is set on some models in the dbt project. - - - -```sql -{{ - config( - materialized = "table" - ) -}} - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day -``` - - - - - -```yml -models: - - name: constraints_example - docs: - node_color: black - config: - constraints_enabled: true - columns: - - name: id - data_type: integer - description: hello - constraints: ['not null','primary key'] - tests: - - unique - - name: color - data_type: text - - name: date_day - data_type: date -``` - - - -Expected DDL to enforce constraints: - - -```sql - create or replace transient table AD_HOC.dbt_bperigaud.constraints_model - - - ( - - id integer not null primary key , - color text , - date_day date - ) - - - as - ( - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day - ); -``` - - - -
- -
- -* PostgreSQL constraints documentation: [here](https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6) - - - -```sql -{{ - config( - materialized = "table" - ) -}} - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day -``` - - - - - -```yml -models: - - name: constraints_example - docs: - node_color: black - config: - constraints_enabled: true - columns: - - name: id - data_type: integer - description: hello - constraints: ['not null','primary key'] - constraints_check: (id > 0) - tests: - - unique - - name: color - data_type: text - - name: date_day - data_type: date -``` - - - -Expected DDL to enforce constraints: - - -```sql - create table "database_name"."schema_name"."constraints_example__dbt_tmp" - - - ( - - - - - id integer not null primary key check (id > 0) , - - - - - color text , - - - - - date_day date - - ) - ; - insert into "database_name"."schema_name"."constraints_example__dbt_tmp" - ( - - - id , - - - color , - - - date_day - - ) - - - ( - - -select - 1 as id, - 'blue' as color, - cast('2019-01-01' as date) as date_day - ); -``` - - - -
- -
- - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/reference/resource-properties/constraints.md b/website/docs/reference/resource-properties/constraints.md index 6f1f66b5524..4e0d09fc1e9 100644 --- a/website/docs/reference/resource-properties/constraints.md +++ b/website/docs/reference/resource-properties/constraints.md @@ -3,4 +3,401 @@ resource_types: [models] datatype: "{dictionary}" --- -https://github.com/dbt-labs/dbt-core/issues/6750 +:::caution Under construction 🚧 +These docs are liable to change! +::: + +:::info Beta functionality +This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. + +For more details, and to leave your feedback, check out this GitHub issue: +* ["Unify constraints and constraints_check configs" (dbt-core#6750)](https://github.com/dbt-labs/dbt-core/issues/6750) +::: + +In transactional databases, it is possible to define "constraints" on the allowed values of certain columns, stricter than just the data type of those values. Because Postgres is a transactional database, it supports and enforces all the constraints in the ANSI SQL standard (`not null`, `unique`, `primary key`, `foreign key`), plus a flexible row-level `check` constraint that evaluates to a boolean expression. + +Most analytical data platforms support and enforce a `not null` constraint, but they either do not support or do not enforce the rest. It is sometimes still desirable to add an "informational" constraint, knowing it is _not_ enforced, for the purpose of integrating with legacy data catalog or entity-relation diagram tools ([dbt-core#3295](https://github.com/dbt-labs/dbt-core/issues/3295)). + + + +
+ +* PostgreSQL constraints documentation: [here](https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6) + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + constraints_check: (id > 0) + tests: + - unique + - name: color + data_type: text + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql +create table "database_name"."schema_name"."constraints_example__dbt_tmp" +( + id integer not null primary key check (id > 0), + color text, + date_day date +) +; +insert into "database_name"."schema_name"."constraints_example__dbt_tmp" +( + id, + color, + date_day +) +( +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +); +``` + + + +
+ +
+ +Redshift currently only enforces `not null` constraints; all other constraints are metadata only. Additionally, Redshift does not allow column checks at the time of table creation. See more in the Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html). + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + constraints_check: (id > 0) + tests: + - unique + - name: color + data_type: varchar + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql + +create table "database_name"."schema_name"."constraints_example__dbt_tmp" + +( + id integer not null, + color varchar, + date_day date, + primary key(id) +) +; + +insert into "database_name"."schema_name"."constraints_example__dbt_tmp" +( +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +); +``` + + + + +
+ +
+ +- Snowflake constraints documentation: [here](https://docs.snowflake.com/en/sql-reference/constraints-overview.html) +- Snowflake data types: [here](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types.html) + +Snowflake supports four types of constraints: `unique`, `not null`, `primary key` and `foreign key`. + +It is important to note that only the `not null` (and the `not null` property of `primary key`) are actually checked today. +The rest of the constraints are purely metadata, not verified when inserting data. + +Currently, Snowflake doesn't support the `check` syntax and dbt will skip the `check` config and raise a warning message if it is set on some models in the dbt project. + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null','primary key'] + tests: + - unique + - name: color + data_type: text + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql +create or replace transient table ..constraints_model +( + id integer not null primary key, + color text, + date_day date +) +as +( +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +); +``` + + + +
+ +
+ +BigQuery allows defining `not null` constraints. However, it does _not_ support or enforce the definition of unenforced constraints, such as `primary key`. + +Documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language + +Data types: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null'] # 'primary key' is not supported + tests: + - unique + - name: color + data_type: string + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql +create or replace table ``.``.`constraints_model` +( + id integer not null, + color string, + date_day date +) +as +( +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +); +``` + + + +
+ +
+ +Databricks allows you to define: + +- a `not null` constraint +- and/or additional `check` constraints, with conditional expressions including one or more columns + +As Databricks neither supports transactions nor allows using `create or replace table` with a column schema, the table is first created without a schema and `alter` statements are then executed to add the different constraints. + +This means that: + +- The names and order of columns are checked, but not their types +- If the `constraints` and/or `constraints_check` fail, the table with the failing data will still exist in the Warehouse + +See [this page](https://docs.databricks.com/tables/constraints.html) for more details about the support of constraints on Databricks. + + + +```sql +{{ + config( + materialized = "table" + ) +}} + +select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + + + +```yml +models: + - name: constraints_example + docs: + node_color: black + config: + constraints_enabled: true + columns: + - name: id + data_type: integer + description: hello + constraints: ['not null'] + constraints_check: "(id > 0)" + tests: + - unique + - name: color + data_type: text + - name: date_day + data_type: date +``` + + + +Expected DDL to enforce constraints: + + +```sql + create or replace table schema_name.my_model + using delta + as + select + 1 as id, + 'blue' as color, + cast('2019-01-01' as date) as date_day +``` + + + +Followed by the statements: + +```sql +alter table schema_name.my_model change column id set not null; +alter table schema_name.my_model add constraint 472394792387497234 check (id > 0); +``` + +
+ +
\ No newline at end of file diff --git a/website/sidebars.js b/website/sidebars.js index 6de0086090a..36108a209a7 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -443,6 +443,7 @@ const sidebarSettings = { items: [ "reference/resource-properties/columns", "reference/resource-properties/config", + "reference/resource-properties/constraints", "reference/resource-properties/description", "reference/resource-properties/quote", "reference/resource-properties/tests", From 8411c758efc82301b0bb9184f8195dec8cc9378b Mon Sep 17 00:00:00 2001 From: Jeremy Cohen Date: Mon, 20 Feb 2023 11:58:13 +0100 Subject: [PATCH 05/16] Fix broken links --- website/docs/reference/model-configs.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/reference/model-configs.md b/website/docs/reference/model-configs.md index 3f4702e4c01..36dcf129da1 100644 --- a/website/docs/reference/model-configs.md +++ b/website/docs/reference/model-configs.md @@ -105,7 +105,7 @@ models: [+](plus-prefix)[full_refresh](full_refresh): [+](plus-prefix)[meta](meta): {} [+](plus-prefix)[grants](grants): {} - [+](plus-prefix)[constraints_enabled](constraints_enabled): true | false + [+](plus-prefix)[contract](contract): true | false ``` @@ -135,7 +135,7 @@ models: [full_refresh](full_refresh): [meta](meta): {} [grants](grants): {} - [constraints_enabled](constraints_enabled): true | false + [contract](contract): true | false ``` @@ -161,7 +161,7 @@ models: [persist_docs](persist_docs)={}, [meta](meta)={}, [grants](grants)={}, - [constraints_enabled](constraints_enabled)=true | false + [contract](contract)=true | false ) }} ``` From 38b042714774f20598344d09f21b15263a18c40f Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 11:48:43 -0500 Subject: [PATCH 06/16] Update model-access.md --- .../docs/collaborate/publish/model-access.md | 25 +++++++++++-------- 1 file changed, 14 insertions(+), 11 deletions(-) 
diff --git a/website/docs/docs/collaborate/publish/model-access.md b/website/docs/docs/collaborate/publish/model-access.md index 4cfdd5eee4f..89900b32b9b 100644 --- a/website/docs/docs/collaborate/publish/model-access.md +++ b/website/docs/docs/collaborate/publish/model-access.md @@ -1,29 +1,32 @@ --- title: "Model access" +id: model-access +sidebar_label: "Model access" +description: "Define model access with group capabilities" --- :::info Beta functionality -This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. +This functionality is new in v1.5. These docs exist to provide a high-level overview of what's to come. The specific syntax is liable to change. -For more details, and to leave your feedback, check out the GitHub discussion: -* ["Model groups & access" (dbt-core#6730)](https://github.com/dbt-labs/dbt-core/discussions/6730) +For more details and to leave your feedback, join the GitHub discussion: +["Model groups & access" (dbt-core#6730)](https://github.com/dbt-labs/dbt-core/discussions/6730) ::: ## Related documentation -* TK: `groups` -* TK: `access` modifiers +* Coming soon: `groups` +* Coming soon: `access` modifiers ### Groups -Models can be grouped together under a common designation, with a shared owner. +Models can be grouped under a common designation with a shared owner. Why define model `groups`? - It turns implicit relationships into an explicit grouping -- It enables you to mark certain models as "private," for use _only_ within that group +- It enables you to mark specific models as "private" for use _only_ within that group ### Access modifiers -Some models (not all of them) are designed to be shared across groups. +Some models (not all) are designed to be shared across groups. 
https://en.wikipedia.org/wiki/Access_modifiers @@ -31,12 +34,12 @@ https://en.wikipedia.org/wiki/Access_modifiers |-----------|----------------------| | private | same group | | protected | same project/package | -| public | everybody* | +| public | everybody | -By default, all models are "protected." This means that they can be referenced by other models in the same project. +By default, all models are "protected." This means that other models in the same project can reference them. :::info Under construction 🚧 -More to come! The syntax below is suggestive only, it does not yet work. +The following syntax is currently under review and does not work. ::: From 32e3ebee59158324d26e7b13608107b53e05e164 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 13:36:51 -0500 Subject: [PATCH 07/16] Update model-contracts.md --- .../collaborate/publish/model-contracts.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md index f2c2529b14e..7635f0f8bcf 100644 --- a/website/docs/docs/collaborate/publish/model-contracts.md +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -3,9 +3,9 @@ title: "Model contracts" --- :::info Beta functionality -This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. +This functionality is new in v1.5. These docs provide a high-level overview of what's to come. The specific syntax is liable to change. 
-For more details, and to leave your feedback, check out the GitHub discussion: +For more details and to leave your feedback, join the GitHub discussion: * ["Model contracts" (dbt-core#6726)](https://github.com/dbt-labs/dbt-core/discussions/6726) ::: @@ -16,9 +16,9 @@ For more details, and to leave your feedback, check out the GitHub discussion: ## Why define a contract? -Defining a dbt model is as easy as writing a SQL `select` statement, or a Python Data Frame transformation. Your query naturally produces a dataset with columns of names and types, based on the columns you're selecting and the transformations you're applying. +Defining a dbt model is as easy as writing a SQL `select` statement or a Python Data Frame transformation. Your query naturally produces a dataset with columns of names and types based on the columns you select and the transformations you apply. -While this is great for quick & iterative development, for some models, constantly changing the shape of the model's returned dataset poses a risk, when there are other people and processes querying that model. It's better to define a set of **upfront guarantees** about the shape of your model. We call this set of guarantees a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract—or it will fail to build. +While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront parameters about the shape of your model. We call this set of parameters a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build. 
## How to define a contract @@ -41,7 +41,7 @@ select * from final ``` -Your contract **must** include every column's `name` and `data_type` (where `data_type` matches the type understood by your data platform). If your model is being materialized as `table` or `incremental`, you may optionally specify that certain columns must be `not_null` (i.e. contain zero null values). Depending on your data platform, you may also be able to define additional `constraints` that are enforced while the model is being built. +Your contract _must_ include every column's `name` and `data_type` (where `data_type` matches the type your data platform understands). If your model is materialized as `table` or `incremental`, you may optionally specify that certain columns must be `not_null` (containing zero null values). Depending on your data platform, you may also be able to define additional `constraints` enforced while the model is being built. Finally, you configure your model with `contract: true`. @@ -64,23 +64,23 @@ models: When building a model with a defined contract, dbt will do two things differently: -1. dbt will run a "pre-flight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. -2. dbt will pass the column names, types, `not_null` and other constraints into the DDL statements it submits to the data platform, where they will be enforced while building the table. +1. dbt will run a prerequisite check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. +2. dbt will pass the column names, types, `not_null`, and other constraints into the DDL statements it submits to the data platform, which will be enforced while building the table. ## FAQs ### Which models should have contracts? -Any model can define a contract. 
It's especially important to define contracts for "public" models that are being shared with other groups, teams, and (soon) dbt projects. +Any model can define a contract. Defining contracts for “public” models that are being shared with other groups, teams, and (soon) dbt projects is especially important. ### How are contracts different from tests? A model's contract defines the **shape** of the returned dataset. -[Tests](tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. Tests are also more configurable, via `severity` and custom thresholds, and easier to debug after finding failures, because the model has already built, and the relevant records can be materialized in the data warehouse by [storing failures](resource-configs/store_failures). +[Tests](tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. Tests are also more configurable via `severity` and custom thresholds and are easier to debug after finding failures. The model has already been built, and the relevant records can be materialized in the data warehouse by [storing failures](resource-configs/store_failures). In blue/green deployments (docs link TK), ... -In the parallel for software APIs: +In parallel for software APIs: - The structure of the API response is the contract - Quality and reliability ("uptime") are also **crucial**, but not part of the contract per se. 
From d52d1ff68d2dc9a2f671e08d206dfd2eaecca3bb Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 13:50:13 -0500 Subject: [PATCH 08/16] Update model-versions.md --- .../docs/collaborate/publish/model-versions.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/website/docs/docs/collaborate/publish/model-versions.md b/website/docs/docs/collaborate/publish/model-versions.md index aa97d90eaf6..afe2f160649 100644 --- a/website/docs/docs/collaborate/publish/model-versions.md +++ b/website/docs/docs/collaborate/publish/model-versions.md @@ -3,26 +3,26 @@ title: "Model versions" --- :::info Beta functionality -This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. +This functionality is new in v1.5. These docs exist to provide a high-level overview of what's to come. The specific syntax is liable to change. -For more details, and to leave your feedback, check out the GitHub discussion: +For more details and to leave your feedback, check out the GitHub discussion: * ["Model versions" (dbt-core#6736)](https://github.com/dbt-labs/dbt-core/discussions/6736) ::: -API versioning is **a hard problem** in software engineering. It's also very important. Our goal is to _make a hard thing possible_. +API versioning is a _complex_ problem in software engineering. It's also essential. Our goal is to _overcome obstacles to transform a complex problem into a reality_. ## Related documentation -* TK: `version` & `latest` (_not_ [this one](project-configs/version)) -* TK: `deprecation_date` +* Coming soon: `version` & `latest` (_not_ [this one](project-configs/version)) +* Coming soon: `deprecation_date` ## Why version a model? 
-If a model defines a ["contract"](model-contracts) (a set of guarantees for its structure), it's also possible to change that model's contract in a way that "breaks" the previous set of guarantees. +If a model defines a ["contract"](model-contracts) (a set of guarantees for its structure), it's also possible to change that model's contract in a way that "breaks" the previous set of parameters. -One approach is to force every consumer of the model to immediately handle the breaking change, as soon as it's deployed to production. While this may work at smaller organizations, or while iterating on an immature set of data models, it doesn't scale much beyond that. +One approach is to force every model consumer to immediately handle the breaking change when it's deployed to production. While this may work at smaller organizations or while iterating on an immature set of data models, it doesn’t scale well beyond that. -Instead, the owner of the model can create a **new version**, and provide a **deprecation window**, during which consumers can migrate from the old version to the new. +Instead, the model owner can create a **new version** and provide a **deprecation window**, during which consumers can migrate from the old version to the new. In the meantime, anywhere that model is used downstream, it can be referenced at a specific version. -When a model is reaching its deprecation date, consumers of that model will hear about it. When the date is reached, it goes away. +When a model approaches its deprecation date, consumers of that model will be notified about it. When the date is reached, it goes away. 
From 33aebe3074bbd1edfa454fe4efd1af3a5d29f931 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 13:51:37 -0500 Subject: [PATCH 09/16] Update model-contracts.md --- website/docs/docs/collaborate/publish/model-contracts.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md index 7635f0f8bcf..64f083e895f 100644 --- a/website/docs/docs/collaborate/publish/model-contracts.md +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -1,5 +1,8 @@ --- title: "Model contracts" +id: model-contracts +sidebar_label: "Model contracts" +description: "Model contracts define a set of parameters validated during transformation" --- :::info Beta functionality From 47ef10de17591a34677b16a0a58ad7db740175af Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 13:52:36 -0500 Subject: [PATCH 10/16] Update model-versions.md --- website/docs/docs/collaborate/publish/model-versions.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/website/docs/docs/collaborate/publish/model-versions.md b/website/docs/docs/collaborate/publish/model-versions.md index afe2f160649..99196940381 100644 --- a/website/docs/docs/collaborate/publish/model-versions.md +++ b/website/docs/docs/collaborate/publish/model-versions.md @@ -1,5 +1,8 @@ --- title: "Model versions" +id: model-versions +sidebar_label: "Model versions" +description: "Version models to help with lifecycle management" --- :::info Beta functionality From 55313179fadedde6191347668c7e82c7e5c35fdc Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 13:54:44 -0500 Subject: [PATCH 11/16] Update website/docs/reference/resource-configs/contract.md --- website/docs/reference/resource-configs/contract.md | 2 +- 1 file changed, 1 
insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index 023406a3b8a..9d013eec3aa 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -18,7 +18,7 @@ This functionality is new in v1.5! These docs exist to provide a high-level over In particular: - The current name of the `contract` config is `constraints_enabled`. -- "Pre flight" check includes column `name` only, and is order-sensitive. We aim to add `data_type` and make it insensitive to column order. +- "Prerequisite check includes column `name` only and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. ::: # Definition From 32b4ccd4833b5feeb7c9748b1dc2d5a9176a0705 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 13:55:02 -0500 Subject: [PATCH 12/16] Update website/docs/reference/resource-configs/contract.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/resource-configs/contract.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index 9d013eec3aa..c6120965458 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -35,7 +35,7 @@ You can manage data type constraints on your models using the `constraints_enabl ## When to use constraints vs. tests -Constraints serve as a **preventative** measure against bad data quality **before** the dbt model is (re)built. It is only limited by the respective database's funcionality and the data types that are supported. 
Examples of a constraint: `not null`, `unique`, `primary key`, `foreign key`, `check` +Constraints serve as a **preventative** measure against bad data quality **before** the dbt model is (re)built. It is only limited by the respective database's functionality and the data types that are supported. Examples of a constraint: `not null`, `unique`, `primary key`, `foreign key`, `check` Tests serve as a **detective** measure against bad data quality **after** the dbt model is (re)built. From 5be63409315b2b9a75c12d73685d9ed486dc0cee Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Mon, 20 Feb 2023 14:05:47 -0500 Subject: [PATCH 13/16] Update contract.md --- .../reference/resource-configs/contract.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index c6120965458..3de4dd8786a 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -14,11 +14,11 @@ id: "contract" :::info Beta functionality -This functionality is new in v1.5! These docs exist to provide a high-level overview of what's to come. Specific syntax is liable to change. +This functionality is new in v1.5. These docs exist to provide a high-level overview of what's to come. The specific syntax is liable to change. In particular: - The current name of the `contract` config is `constraints_enabled`. -- "Prerequisite check includes column `name` only and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. +- The prerequisite check includes column `name` only and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. ::: # Definition @@ -31,15 +31,15 @@ When the `contract` configuration is enabled, dbt will ensure that your model's More to come! 
::: -You can manage data type constraints on your models using the `constraints_enabled` configuration. This configuration is available on all models, and is disabled by default. When enabled, dbt will automatically add constraints to your models based on the data types of the columns in your model's schema. This is a great way to ensure your data is always in the correct format. For example, if you have a column in your model that is defined as a `date` data type, dbt will automatically add a data type constraint to that column to ensure the data in that column is always a valid date. If you want to add a `not null` constraint to a column in a preventative manner rather than as a test, you can add the `not null` value to the column definition in your model's schema: `constraints: ['not null']`. +You can manage data type constraints on your models using the `constraints_enabled` configuration. This configuration is available on all models and is disabled by default. When enabled, dbt will automatically add constraints to your models based on the data types of the columns in your model's schema. This is a great way to ensure your data is always in the correct format. For example, if you have a column in your model defined as a `date` data type, dbt will automatically add a data type constraint to that column to ensure the data in that column is always a valid date. If you want to add a `not null` condition to a column in a preventative manner rather than as a test, you can add the `not null` value to the column definition in your model's schema: `constraints: ['not null']`. ## When to use constraints vs. tests -Constraints serve as a **preventative** measure against bad data quality **before** the dbt model is (re)built. It is only limited by the respective database's functionality and the data types that are supported. 
Examples of a constraint: `not null`, `unique`, `primary key`, `foreign key`, `check` +Constraints serve as a **preventative** measure against bad data quality **before** the dbt model is (re)built. It is only limited by the respective database's functionality and the supported data types. Examples of constraints: `not null`, `unique`, `primary key`, `foreign key`, `check` -Tests serve as a **detective** measure against bad data quality **after** the dbt model is (re)built. +Tests serve as a **detective** measure against bad data quality _after_ the dbt model is (re)built. -Constraints are great when you define `constraints: ['not null']` for a column in your model's schema because it'll prevent `null` values being inserted into that column at dbt model creation time. AND it'll prevent other unintended values from being inserted into that column without dbt's intervention as it relies on the database to enforce the constraint. This can **replace** the `not_null` test. However, performance issues may arise depending on your database. +Constraints are great when you define `constraints: ['not null']` for a column in your model's schema because it'll prevent `null` values from being inserted into that column at dbt model creation time and prevent other unintended values from being inserted into that column without dbt's intervention as it relies on the database to enforce the constraint. This can **replace** the `not_null` test. However, performance issues may arise depending on your database. Tests should be used in addition to and instead of constraints when you want to test things like `accepted_values` and `relationships`. These are usually not enforced with built-in database functionality and are not possible with constraints. Also, custom tests will allow more flexibility and address nuanced data quality issues that may not be possible with constraints. 
@@ -47,8 +47,8 @@ Tests should be used in addition to and instead of constraints when you want to - `contract` (a.k.a. `constraints_enabled`) must be configured in the yaml [`config`] property _only_. Setting this configuration via in-file config or in `dbt_project.yml` is not supported. - `contract` (a.k.a. `constraints_enabled`) is supported only for a SQL model materialized as `table`. -- "Pre flight" checks include the column `name`, but not yet their `data_type`. It is our intent to support `data_type` verification in a forthcoming beta prerelease. -- The order of columns in your `yml` file must match exactly the order of columns as returned by your model's SQL query. +- Prerequisite checks include the column `name`, but not yet their `data_type`. We intend to support `data_type` verification in an upcoming beta prerelease. +- The order of columns in your `yml` file must match the order of columns returned by your model's SQL query. - While most data platforms support `not_null` checks, support for [additional `constraints`](resource-properties/constraints) varies by data platform.
```txt @@ -85,4 +85,4 @@ models: data_type: date ``` - \ No newline at end of file + From 71b787bbe7c10f8db72afc0adde5045b6f7024c6 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Thu, 23 Feb 2023 16:50:06 -0500 Subject: [PATCH 14/16] Update website/docs/docs/collaborate/publish/model-contracts.md --- website/docs/docs/collaborate/publish/model-contracts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md index 64f083e895f..ccc8cbb71de 100644 --- a/website/docs/docs/collaborate/publish/model-contracts.md +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -21,7 +21,7 @@ For more details and to leave your feedback, join the GitHub discussion: Defining a dbt model is as easy as writing a SQL `select` statement or a Python Data Frame transformation. Your query naturally produces a dataset with columns of names and types based on the columns you select and the transformations you apply. -While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront parameters about the shape of your model. We call this set of parameters a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build. +While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront attestations that guarantee the shape of your model. We call this set of attestations a "contract." 
While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build. ## How to define a contract From 81b6330edaadb4924d05e3a377071f076ed39623 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Thu, 23 Feb 2023 16:50:23 -0500 Subject: [PATCH 15/16] Apply suggestions from code review --- website/docs/docs/collaborate/publish/model-contracts.md | 2 +- website/docs/reference/resource-configs/contract.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md index ccc8cbb71de..68a9e5748d6 100644 --- a/website/docs/docs/collaborate/publish/model-contracts.md +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -67,7 +67,7 @@ models: When building a model with a defined contract, dbt will do two things differently: -1. dbt will run a prerequisite check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. +1. dbt will run a preliminary verification check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. 2. dbt will pass the column names, types, `not_null`, and other constraints into the DDL statements it submits to the data platform, which will be enforced while building the table. ## FAQs diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index 3de4dd8786a..ad44ce526e2 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -18,7 +18,7 @@ This functionality is new in v1.5. These docs exist to provide a high-level over In particular: - The current name of the `contract` config is `constraints_enabled`. 
-- The prerequisite check includes column `name` only and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. +- The verification check includes column `name` only and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. ::: # Definition From 23ee4ef27fab2534b8b361cd6b32b5b8ac481de9 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Sun, 26 Feb 2023 20:01:21 -0500 Subject: [PATCH 16/16] Apply suggestions from code review --- website/docs/docs/collaborate/publish/model-contracts.md | 4 ++-- website/docs/reference/resource-configs/contract.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/collaborate/publish/model-contracts.md b/website/docs/docs/collaborate/publish/model-contracts.md index 68a9e5748d6..f99ba315799 100644 --- a/website/docs/docs/collaborate/publish/model-contracts.md +++ b/website/docs/docs/collaborate/publish/model-contracts.md @@ -21,7 +21,7 @@ For more details and to leave your feedback, join the GitHub discussion: Defining a dbt model is as easy as writing a SQL `select` statement or a Python Data Frame transformation. Your query naturally produces a dataset with columns of names and types based on the columns you select and the transformations you apply. -While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront attestations that guarantee the shape of your model. We call this set of attestations a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build. 
+While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront "guarantees" that define the shape of your model. We call this set of guarantees a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build. ## How to define a contract @@ -67,7 +67,7 @@ models: When building a model with a defined contract, dbt will do two things differently: -1. dbt will run a preliminary verification check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. +1. dbt will run a "preflight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. 2. dbt will pass the column names, types, `not_null`, and other constraints into the DDL statements it submits to the data platform, which will be enforced while building the table. ## FAQs diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index ad44ce526e2..ac8cfb727fd 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -18,7 +18,7 @@ This functionality is new in v1.5. These docs exist to provide a high-level over In particular: - The current name of the `contract` config is `constraints_enabled`. -- The verification check includes column `name` only and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. +- The "preflight" check only includes column `name` and is order-sensitive. The goal is to add `data_type` and make it insensitive to column order. ::: # Definition