Initial "wireframe" for model contracts #2890

Merged: 18 commits, Feb 27, 2023
36 changes: 28 additions & 8 deletions website/dbt-versions.js
@@ -27,6 +27,34 @@ exports.versions = [
]

exports.versionedPages = [
{
"page": "docs/collaborate/publish/model-contracts",
"firstVersion": "1.5",
},
{
"page": "docs/collaborate/publish/model-access",
"firstVersion": "1.5",
},
{
"page": "docs/collaborate/publish/model-versions",
"firstVersion": "1.5",
},
{
"page": "reference/resource-configs/contract",
"firstVersion": "1.5",
},
{
"page": "reference/resource-properties/constraints",
"firstVersion": "1.5",
},
{
"page": "reference/dbt-jinja-functions/local-md5",
"firstVersion": "1.4",
},
{
"page": "reference/warehouse-setups/fal-setup",
"firstVersion": "1.3",
},
Comment on lines +51 to +57
Collaborator Author

No functional change here; I just reordered the list so it's in a consistent (descending) order by `firstVersion`. I figure, in the future, we could remove items from this list as we deprecate older versions.

Contributor

Thank you!

{
"page": "reference/dbt-jinja-functions/set",
"firstVersion": "1.2",
@@ -55,12 +83,4 @@ exports.versionedPages = [
"page": "reference/dbt-jinja-functions/print",
"firstVersion": "1.1",
},
{
"page": "reference/dbt-jinja-functions/local-md5",
"firstVersion": "1.4",
},
{
"page": "reference/warehouse-setups/fal-setup",
"firstVersion": "1.3",
},
]
64 changes: 64 additions & 0 deletions website/docs/docs/collaborate/publish/model-access.md
@@ -0,0 +1,64 @@
---
title: "Model access"
id: model-access
sidebar_label: "Model access"
description: "Define model access with group capabilities"
---

:::info Beta functionality
This functionality is new in v1.5. These docs exist to provide a high-level overview of what's to come. The specific syntax is liable to change.

For more details and to leave your feedback, join the GitHub discussion:
["Model groups & access" (dbt-core#6730)](https://github.com/dbt-labs/dbt-core/discussions/6730)
:::

## Related documentation
* Coming soon: `groups`
* Coming soon: `access` modifiers

### Groups

Models can be grouped under a common designation with a shared owner.

Why define model `groups`?
- It turns implicit relationships into an explicit grouping
- It enables you to mark specific models as "private" for use _only_ within that group

### Access modifiers

Some models (not all) are designed to be shared across groups.

These keywords follow the convention of [access modifiers](https://en.wikipedia.org/wiki/Access_modifiers) in programming languages:

| Keyword   | Who can reference the model        |
|-----------|------------------------------------|
| private   | models in the same group           |
| protected | models in the same project/package |
| public    | everybody                          |

By default, all models are "protected." This means that other models in the same project can reference them.
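
The table above can be read as a simple rule. The Python sketch below only illustrates that rule and is not how dbt implements it. The `Model` class, its fields, the `can_reference` function, and the `jaffle_shop` / `fct_revenue` / `finance` names are all hypothetical. It also assumes that a model with no group can never reach a private model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a dbt model; not a dbt-core class."""
    name: str
    project: str
    group: Optional[str] = None
    access: str = "protected"  # the default described above


def can_reference(consumer: Model, target: Model) -> bool:
    """Return True if `consumer` may reference `target` under the table's rules."""
    if target.access == "public":
        return True  # everybody
    if target.access == "protected":
        return consumer.project == target.project  # same project/package
    if target.access == "private":
        # Assumption: a model without a group shares a group with no one.
        return target.group is not None and consumer.group == target.group
    raise ValueError(f"unknown access modifier: {target.access!r}")


rollup = Model("int__customer_history_rollup", "jaffle_shop", group="cx", access="private")
dim = Model("dim_customers", "jaffle_shop", group="cx", access="public")
revenue = Model("fct_revenue", "jaffle_shop", group="finance")

print(can_reference(dim, rollup))      # True: same group
print(can_reference(revenue, rollup))  # False: different group
print(can_reference(revenue, dim))     # True: public
```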

:::info Under construction 🚧
The following syntax is currently under review and does not work.
:::

<File name="models/marts/customers.yml">

```yaml
groups:
  - name: cx
    owner:
      name: Customer Success Team
      email: cx@jaffle.shop

models:
  - name: dim_customers
    group: cx
    access: public
  # this is an intermediate transformation -- relevant to the CX team only
  - name: int__customer_history_rollup
    group: cx
    access: private
```

</File>
89 changes: 89 additions & 0 deletions website/docs/docs/collaborate/publish/model-contracts.md
@@ -0,0 +1,89 @@
---
title: "Model contracts"
id: model-contracts
sidebar_label: "Model contracts"
description: "Model contracts define a set of parameters validated during transformation"
---

:::info Beta functionality
This functionality is new in v1.5. These docs provide a high-level overview of what's to come. The specific syntax is liable to change.

For more details and to leave your feedback, join the GitHub discussion:
* ["Model contracts" (dbt-core#6726)](https://github.com/dbt-labs/dbt-core/discussions/6726)
:::

## Related documentation
* [`contract`](resource-configs/contract)
* [`columns`](resource-properties/columns)
* [`constraints`](resource-properties/constraints)

## Why define a contract?

Defining a dbt model is as easy as writing a SQL `select` statement or a Python DataFrame transformation. Your query naturally produces a dataset whose column names and types follow from the columns you select and the transformations you apply.

This is ideal for quick, iterative development. For some models, however, a constantly changing dataset shape poses a risk to the other people and processes querying that model. It's better to define, up front, a set of parameters describing the shape of your model. We call this set of parameters a "contract." While building your model, dbt verifies that the model's transformation produces a dataset matching its contract; if it does not, the model fails to build.

## How to define a contract

Let's say you have a model with a query like:

<File name="models/marts/dim_customers.sql">

```sql
-- lots of SQL

final as (

select
-- lots of columns
from ...

)

select * from final
```
</File>

Your contract _must_ include every column's `name` and `data_type` (where `data_type` matches the type your data platform understands). If your model is materialized as `table` or `incremental`, you may optionally specify that certain columns must be `not_null` (containing zero null values). Depending on your data platform, you may also be able to define additional `constraints` enforced while the model is being built.

Finally, you configure your model with `contract: true`.

<File name="models/marts/customers.yml">

```yaml
models:
  - name: dim_customers
    config:
      contract: true
    columns:
      - name: customer_id
        data_type: int
        not_null: true
      - name: customer_name
        data_type: string
      ...
```

</File>

When building a model with a defined contract, dbt will do two things differently:
1. dbt will run a prerequisite check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined.
2. dbt will pass the column names, types, `not_null`, and other constraints into the DDL statements it submits to the data platform, which enforces them while the table is built.
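
To make the first step concrete, here is a minimal Python sketch of a name-and-type comparison. It is an illustration under stated assumptions, not dbt's implementation: `contract_mismatches` is a hypothetical helper, both sides are assumed to already use the same type names (compared case-insensitively), column order is ignored, and `signup_date` is an invented extra column.

```python
from typing import Dict, List


def contract_mismatches(contract: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare declared columns to the columns a query returns.

    Both arguments map column name -> data type name. Returns a list of
    human-readable problems; an empty list means the contract is satisfied.
    """
    problems = []
    for name in sorted(contract.keys() - actual.keys()):
        problems.append(f"missing column: {name}")
    for name in sorted(actual.keys() - contract.keys()):
        problems.append(f"unexpected column: {name}")
    for name in sorted(contract.keys() & actual.keys()):
        # Assumption: type names are comparable as case-insensitive strings.
        if contract[name].lower() != actual[name].lower():
            problems.append(
                f"type mismatch for {name}: "
                f"contract says {contract[name]}, query returns {actual[name]}"
            )
    return problems


contract = {"customer_id": "int", "customer_name": "string"}
actual = {"customer_id": "int", "customer_name": "string", "signup_date": "date"}
print(contract_mismatches(contract, actual))  # ['unexpected column: signup_date']
```

A non-empty result corresponds to the case where the model fails to build.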

## FAQs

### Which models should have contracts?

Any model can define a contract. Contracts are especially important for “public” models that are shared with other groups, teams, and (soon) dbt projects.

### How are contracts different from tests?

A model's contract defines the **shape** of the returned dataset.

[Tests](tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. Tests are also more configurable, via `severity` and custom thresholds, and their failures are easier to debug: the model has already been built, and the relevant records can be materialized in the data warehouse by [storing failures](resource-configs/store_failures).

In blue/green deployments (docs link TK), ... <!-- TODO write more here -->

A parallel with software APIs:
- The structure of the API response is the contract.
- Quality and reliability ("uptime") are also **crucial**, but not part of the contract per se.
31 changes: 31 additions & 0 deletions website/docs/docs/collaborate/publish/model-versions.md
@@ -0,0 +1,31 @@
---
title: "Model versions"
id: model-versions
sidebar_label: "Model versions"
description: "Version models to help with lifecycle management"
---

:::info Beta functionality
This functionality is new in v1.5. These docs exist to provide a high-level overview of what's to come. The specific syntax is liable to change.

For more details and to leave your feedback, check out the GitHub discussion:
* ["Model versions" (dbt-core#6736)](https://github.com/dbt-labs/dbt-core/discussions/6736)
:::

API versioning is a _complex_ problem in software engineering. It's also essential. Our goal is to _overcome the obstacles and make a solution to this complex problem a reality_.

## Related documentation
* Coming soon: `version` & `latest` (_not_ [this one](project-configs/version))
* Coming soon: `deprecation_date`

## Why version a model?

If a model defines a ["contract"](model-contracts) (a set of guarantees for its structure), it's also possible to change that model's contract in a way that "breaks" the previous set of guarantees.

One approach is to force every model consumer to immediately handle the breaking change when it's deployed to production. While this may work at smaller organizations or while iterating on an immature set of data models, it doesn’t scale well beyond that.

Instead, the model owner can create a **new version** and provide a **deprecation window**, during which consumers can migrate from the old version to the new.

In the meantime, anywhere that model is used downstream, it can be referenced at a specific version.

When a model version approaches its deprecation date, its consumers will be notified. When the date is reached, that version goes away.
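
The lifecycle described above can be sketched as a small date comparison. This is an illustration only: `deprecation_status` and its return values are hypothetical, and the 30-day `notice_days` window is an assumption, since these docs do not say how far in advance consumers are notified.

```python
from datetime import date, timedelta


def deprecation_status(today: date, deprecation_date: date, notice_days: int = 30) -> str:
    """Classify a model version relative to its deprecation date.

    `notice_days` is a hypothetical notice period, not a dbt setting.
    """
    if today >= deprecation_date:
        return "removed"  # the date has been reached
    if today >= deprecation_date - timedelta(days=notice_days):
        return "notify consumers"  # approaching the date
    return "active"


print(deprecation_status(date(2023, 6, 1), date(2023, 6, 15)))  # notify consumers
```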
17 changes: 12 additions & 5 deletions website/docs/guides/migration/versions/02-upgrading-to-v1.5.md
@@ -2,6 +2,11 @@
title: "Trying v1.5 (prerelease)"
description: New features and changes in dbt Core v1.5
---

:::info
v1.5 is currently available as a **beta prerelease.** Availability in dbt Cloud coming soon!
:::

### Resources

- [Changelog](https://github.com/dbt-labs/dbt-core/blob/main/CHANGELOG.md)
@@ -43,12 +48,14 @@ Coming soon: GH discussion detailing interface changes and offering a forum for

## New and changed documentation

Coming soon
:::caution Under construction 🚧
More to come!
:::

### "Models as APIs"
- Model contracts ([#2839](https://github.com/dbt-labs/docs.getdbt.com/issues/2839))
- Model groups & access ([#2840](https://github.com/dbt-labs/docs.getdbt.com/issues/2840))
- Model versions ([#2841](https://github.com/dbt-labs/docs.getdbt.com/issues/2841))
### Publishing models as APIs
- [Model contracts](model-contracts) ([#2839](https://github.com/dbt-labs/docs.getdbt.com/issues/2839))
- [Model access](model-access) ([#2840](https://github.com/dbt-labs/docs.getdbt.com/issues/2840))
- [Model versions](model-versions) ([#2841](https://github.com/dbt-labs/docs.getdbt.com/issues/2841))

### dbt-core Python API
- Auto-generated documentation ([#2674](https://github.com/dbt-labs/docs.getdbt.com/issues/2674)) for dbt-core CLI & Python API for programmatic invocations
7 changes: 5 additions & 2 deletions website/docs/reference/model-configs.md
@@ -105,6 +105,7 @@ models:
[+](plus-prefix)[full_refresh](full_refresh): <boolean>
[+](plus-prefix)[meta](meta): {<dictionary>}
[+](plus-prefix)[grants](grants): {<dictionary>}
[+](plus-prefix)[contract](contract): true | false

```

@@ -134,6 +135,7 @@ models:
[full_refresh](full_refresh): <boolean>
[meta](meta): {<dictionary>}
[grants](grants): {<dictionary>}
[contract](contract): true | false
```

</File>
@@ -157,8 +159,9 @@ models:
[schema](resource-configs/schema)="<string>",
[alias](resource-configs/alias)="<string>",
[persist_docs](persist_docs)={<dict>},
[meta](meta)={<dict>}
[grants](grants)={<dict>}
[meta](meta)={<dict>},
[grants](grants)={<dict>},
[contract](contract)=true | false
) }}

```
88 changes: 88 additions & 0 deletions website/docs/reference/resource-configs/contract.md
@@ -0,0 +1,88 @@
---
resource_types: [models]
datatype: "{<dictionary>}"
default_value: {contract: false}
id: "contract"
---


## Related documentation
- [What is a model contract?](publish/model-contracts)
- [Defining `columns`](resource-properties/columns)
- [Defining `constraints`](resource-properties/constraints)

<!-- TODO: move some of this content elsewhere, and update to reflect new proposed syntax -->

:::info Beta functionality
This functionality is new in v1.5. These docs exist to provide a high-level overview of what's to come. The specific syntax is liable to change.

In particular:
- The current name of the `contract` config is `constraints_enabled`.
- The prerequisite check currently compares column `name` only and is order-sensitive. The goal is to also check `data_type` and to make the check insensitive to column order.
:::

# Definition

When the `contract` configuration is enabled, dbt will ensure that your model's returned dataset exactly matches the attributes you have defined in YAML:
- `name` and `data_type` for every column
- additional [`constraints`](resource-properties/constraints), as supported by the model's materialization and your data platform

:::caution Under construction 🚧
More to come!
:::

You can manage data type constraints on your models using the `constraints_enabled` configuration. It is disabled by default and subject to the current limitations listed below.

When enabled, dbt automatically adds constraints to your models based on the data types of the columns defined in your model's schema, which helps ensure your data is always in the expected format. For example, if a column is defined with the `date` data type, dbt adds a data type constraint to that column so that its values are always valid dates.

To add a `not null` condition to a column as a preventative measure rather than as a test, add it to the column definition in your model's schema: `constraints: ['not null']`.

## When to use constraints vs. tests

Constraints serve as a **preventative** measure against bad data quality **before** the dbt model is (re)built. They are limited only by the respective database's functionality and the supported data types. Examples of constraints: `not null`, `unique`, `primary key`, `foreign key`, `check`.

Tests serve as a **detective** measure against bad data quality **after** the dbt model is (re)built.

Constraints work well when you define, for example, `constraints: ['not null']` for a column in your model's schema. This prevents `null` values from being inserted into that column when dbt creates the model. Because the database itself enforces the constraint, it also prevents unintended values from being inserted into that column outside of dbt. This can **replace** the `not_null` test. However, performance issues may arise depending on your database.

Use tests in addition to, or instead of, constraints when you want to check things like `accepted_values` and `relationships`. These are usually not enforced with built-in database functionality and are not possible with constraints. Custom tests also allow more flexibility and can address nuanced data quality issues that constraints cannot.

## Current Limitations

- `contract` (a.k.a. `constraints_enabled`) must be configured in the YAML `config` property _only_. Setting this configuration via in-file config or in `dbt_project.yml` is not supported.
- `contract` (a.k.a. `constraints_enabled`) is supported only for SQL models materialized as `table`.
- Prerequisite checks include each column's `name`, but not yet its `data_type`. We intend to support `data_type` verification in an upcoming beta prerelease.
- The order of columns in your `yml` file must match the order of columns returned by your model's SQL query.
- While most data platforms support `not_null` checks, support for [additional `constraints`](resource-properties/constraints) varies by data platform.

```txt
# example error message
Compilation Error in model constraints_example (models/constraints_examples/constraints_example.sql)
Please ensure the name, order, and number of columns in your `yml` file match the columns in your SQL file.
Schema File Columns: ['id', 'date_day', 'color']
SQL File Columns: ['id', 'color', 'date_day']
```
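
The error above is what an order-sensitive, names-only comparison produces. The following Python sketch is a rough illustration of that behavior, not dbt's code: `check_column_names` is a hypothetical function, and it raises a plain `ValueError` where dbt reports a compilation error.

```python
from typing import List


def check_column_names(schema_columns: List[str], sql_columns: List[str]) -> None:
    """Raise if the names, order, or number of columns differ.

    Mirrors the beta limitations listed above: names only,
    order-sensitive, no data type comparison.
    """
    if schema_columns != sql_columns:
        raise ValueError(
            "Please ensure the name, order, and number of columns in your `yml` file "
            "match the columns in your SQL file.\n"
            f"Schema File Columns: {schema_columns}\n"
            f"SQL File Columns: {sql_columns}"
        )


try:
    # Same names, different order: fails the order-sensitive check.
    check_column_names(["id", "date_day", "color"], ["id", "color", "date_day"])
except ValueError as err:
    print(err)
```

Reordering the `columns` entries in the `yml` file to match the SQL query's output is enough to satisfy a check like this one.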

## Example

<File name='models/schema.yml'>

```yml
models:
  - name: constraints_example
    config:
      constraints_enabled: true
    columns:
      - name: id
        data_type: integer
        description: hello
        constraints: ['not null', 'primary key']
        constraints_check: (id > 0)
        tests:
          - unique
      - name: color
        constraints:
          - not null
          - primary key
        data_type: string
      - name: date_day
        data_type: date
```

</File>