Source freshness #6744

Open
wants to merge 9 commits into
base: current
13 changes: 13 additions & 0 deletions website/docs/docs/build/sources.md
@@ -191,6 +191,19 @@ The results of this query are used to determine whether the source is fresh or n

<Lightbox src="/img/docs/building-a-dbt-project/snapshot-freshness.png" title="Uh oh! Not everything is as fresh as we'd like!"/>

### Build models based on source freshness

As a best practice, we recommend using [data source freshness](/docs/build/sources#declaring-source-freshness). This lets you transfer settings into a `.yml` file, where source freshness is defined at the [model level](/reference/resource-properties/freshness).

To build models based on source freshness in dbt:

1. Run `dbt source freshness` to check the freshness of your sources.
2. Use the `dbt build --select source_status:fresher+` command to build and test models downstream of fresher sources.

Running the commands in that order ensures that models are updated based on the latest data.
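
The two steps above can be sketched as a small shell wrapper. This is an illustration under assumptions, not an official script: the function name `run_fresh_build`, the `DBT_CMD` override, and the `prev_state` directory are hypothetical. It also assumes that the `source_status:fresher+` selector needs `--state` pointing at artifacts from an earlier run, so that dbt has previous freshness results to compare against.

```shell
# Hypothetical wrapper: check freshness, then build models downstream of fresher sources.
# DBT_CMD is an assumed override so the sequence can be dry-run (for example, DBT_CMD=echo).
run_fresh_build() {
  state_dir="$1"  # assumed: directory holding artifacts from a previous run

  # Step 1: check the freshness of all sources.
  # Step 2 runs only if step 1 exits with code 0.
  "${DBT_CMD:-dbt}" source freshness &&
    "${DBT_CMD:-dbt}" build --select "source_status:fresher+" --state "$state_dir"
}
```

Calling `run_fresh_build prev_state` runs both commands in order, while `DBT_CMD=echo run_fresh_build prev_state` only prints them. Chaining with `&&` is a design choice: replace it with `;` if the build should proceed even when the freshness check exits with a nonzero code.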

Additionally, you can set [source freshness snapshots](/docs/deploy/source-freshness#enabling-source-freshness-snapshots) to check source freshness every 30 minutes, and then run a job that rebuilds every hour. This retrieves all the models and rebuilds them in one go if their source freshness has expired. For more information, refer to [Source freshness snapshot frequency](/docs/deploy/source-freshness#source-freshness-snapshot-frequency).

### Filter

Some databases have tables where a filter over certain columns is required in order to prevent a full scan of the table, which could be costly. To do a freshness check on such tables, add a `filter` argument to the configuration, e.g. `filter: _etl_loaded_at >= date_sub(current_date(), interval 1 day)`. For the example above, the resulting query would look like
Empty file.
60 changes: 60 additions & 0 deletions website/docs/reference/commands/source.md
@@ -12,6 +12,54 @@ The `dbt source` command provides subcommands that are useful when working with

If your dbt project is [configured with sources](/docs/build/sources), then the `dbt source freshness` command will query all of your defined source tables, determining the "freshness" of these tables. If the tables are stale (based on the `freshness` config specified for your sources) then dbt will report a warning or error accordingly. If a source <Term id="table" /> is in a stale state, then dbt will exit with a nonzero exit code.

You can also use [source freshness](/docs/deploy/source-freshness) commands to help make sure the data you receive is current rather than stale.

### Configure source freshness

To configure source freshness in dbt:

- Add a `freshness` block to your source in the `.yml` file.
- Specify `warn_after` and `error_after` thresholds.
- Include `loaded_at_field` for each table.
- Use the `dbt source freshness` command to check freshness.

<File name='models/<filename>.yml'>

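```yaml
version: 2

sources:
  - name: jaffle_shop
    database: raw

    # Default freshness thresholds for every table in this source
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}

    loaded_at_field: _etl_loaded_at

    tables:
      # Uses the source-level freshness defined above
      - name: customers

      # Overrides the defaults with stricter thresholds
      - name: orders
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
          # Limits the rows scanned by the freshness query
          filter: datediff('day', _etl_loaded_at, current_timestamp) < 2

      # Freshness is not checked for this table
      - name: product_skus
        freshness: null
```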
</File>

Configuring freshness thresholds in this way helps you monitor the health of your data pipeline.

You can also configure source freshness in the execution settings within your job in dbt Cloud. For more information, refer to [enabling source freshness snapshots](/docs/deploy/source-freshness#enabling-source-freshness-snapshots).

<Lightbox src="/img/docs/dbt-cloud/select-source-freshness.png" title="Selecting source freshness"/>

### Specifying sources to snapshot

By default, `dbt source freshness` will calculate freshness information for all of the sources in your project. To snapshot freshness for a subset of these sources, use the `--select` flag.
@@ -76,3 +124,15 @@ Snapshots of source freshness can be used to understand:
This command can be run manually to determine the state of your source data freshness at any time. It is also recommended that you run this command on a schedule, storing the results of the freshness snapshot at regular intervals. These longitudinal snapshots will make it possible to be alerted when source data freshness SLAs are violated, as well as understand the trend of freshness over time.
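
Storing snapshots at regular intervals could look like the following sketch. The function name `snapshot_freshness`, the `DBT_CMD` override, the `freshness_history` directory, and the file naming scheme are all hypothetical; only the `--output` flag comes from the dbt command itself.

```shell
# Hypothetical helper: write each freshness snapshot to its own timestamped file.
# DBT_CMD is an assumed override so the call can be dry-run (for example, DBT_CMD=echo).
snapshot_freshness() {
  out_dir="${1:-freshness_history}"          # assumed location for stored snapshots
  stamp="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"   # UTC timestamp unless one is passed in

  # Create the target directory if it does not exist yet.
  mkdir -p "$out_dir" || return 1

  # Write the freshness results to a file named after the timestamp.
  "${DBT_CMD:-dbt}" source freshness --output "$out_dir/sources_$stamp.json"
}
```

A scheduler such as cron could call `snapshot_freshness` at a fixed interval. The accumulated files then form the longitudinal record described above.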

dbt Cloud makes it easy to snapshot source freshness on a schedule, and provides a dashboard out of the box indicating the state of freshness for all of the sources defined in your project. For more information on snapshotting freshness in dbt Cloud, check out the [docs](/docs/build/sources#source-data-freshness).

### Source freshness commands

Source freshness commands help you confirm that the data in your sources is up to date.

Some of the typical commands you can use are:

| **Command** | **Description** |
| ----------- | --------------- |
| [`dbt source freshness`](/reference/commands/source#dbt-source-freshness) | Checks the "freshness" of all sources. |
| [`dbt source freshness --output target/source_freshness.json`](/reference/commands/source#configuring-source-freshness-output) | Writes freshness information to a different path. |
| `dbt source freshness --select "source:snowplow"` | Checks the "freshness" of specific sources. |
1 change: 1 addition & 0 deletions website/snippets/_selecting_source_freshness.md
@@ -0,0 +1 @@
<Lightbox src="/img/docs/dbt-cloud/select-source-freshness.png" title="Selecting source freshness"/>