add dbt_valid_to_current #6308

Open. Wants to merge 25 commits into base: current

Contributor

@mirnawong1 mirnawong1 commented Oct 17, 2024

Adding docs for the new dbt_valid_to_current config, which lets users set a future date for dbt_valid_to on current records. The config can be set in a YAML file, a config block, or the project file.

Resolves #6275

Outstanding questions:
✅ - Can date be hardcoded using this syntax? "to_date('2024-05-10')" or can they use "to_date('2024, 05, 10')" ?
✅ - is the syntax just a string (e.g. dbt_valid_to_current: '9999, 12, 31' ) or must they use to_date?
✅ - can user use var to return a SQL statement '{{ var('my_future_date') }}'
✅ - can they use a macro that returns a SQL statement '{{ dbt.date(9999, 12, 31) }}'?
✅ - how does this new config work with deferral/state:modified? will we warn users that the config has been updated and they need to manually update their snapshot?


🚀 Deployment available! Here are the direct links to the updated files:

@mirnawong1 mirnawong1 requested a review from a team as a code owner October 17, 2024 15:07

vercel bot commented Oct 17, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| docs-getdbt-com | ✅ Ready (Inspect) | Visit Preview | Nov 15, 2024 10:43am |

@github-actions bot added the labels content (Improvements or additions to content), Docs team (Authored by the Docs team @dbt Labs), and size: medium (This change will take up to a week to address) Oct 17, 2024
@dbeatty10
Contributor

@mirnawong1 grouping your outstanding questions into a handful of categories:

Hardcoded SQL expressions

Question:

  • Can date be hardcoded using this syntax? "to_date('2024-05-10')" or can they use "to_date('2024, 05, 10')" ?

👍 Hardcoded expressions that work within the data platform are the only option currently available. The precise syntax will vary per warehouse, so to_date('2024-05-10') may work for some, but others may require something like date(2024, 5, 10).

Jinja within dbt_valid_to_current

Questions:

  • is the syntax just a string (e.g. dbt_valid_to_current: '9999, 12, 31' ) or must they use to_date?
  • can user use var to return a SQL statement '{{ var('my_future_date') }}'
  • can they use a macro that returns a SQL statement '{{ dbt.date(9999, 12, 31) }}'?

👎 Jinja is not currently available for dbt_valid_to_current, so none of the above are possible. The only option is to supply a full SQL expression that is specific to the user's data warehouse.

Deferral / state:modified

Question:

  • how does this new config work with deferral/state:modified? will we warn users that the config has been updated and they need to manually update their snapshot?

👍 Based on hands-on experimentation, it works as expected with deferral / state:modified. If this config changes, then the snapshot will show up when using --select state:modified.

@dbeatty10 added the blocked_by_dev label (Awaiting merge of PR with associated functionality) Oct 28, 2024
@dbeatty10
Contributor

@mirnawong1 I added the blocked_by_dev label until this is resolved:

@mirnawong1
Contributor Author

> @mirnawong1 I added the blocked_by_dev label until this is resolved:

thank you @dbeatty10 ! it looks like this is resolved now right?

@dbeatty10 removed the blocked_by_dev label (Awaiting merge of PR with associated functionality) Nov 12, 2024
Contributor

@dbeatty10 dbeatty10 left a comment

I didn't test out the code examples, but looks good to me in general.

Left a couple comments on some small non-blocking items.

website/docs/docs/build/snapshots.md (Outdated)
Collaborator

@graciegoheen graciegoheen left a comment

Few comments but this is looking really good, thank you!

website/docs/docs/build/snapshots.md (Outdated)

By default, `dbt_valid_to` is `NULL` for current records. However, if you set the [`dbt_valid_to_current` configuration](/reference/resource-configs/dbt_valid_to_current) (available in Versionless and 1.9 and higher), `dbt_valid_to` will be set to your specified value (such as `9999-12-31`) for current records.

This simplifies your SQL queries by avoiding `NULL` checks and allowing for straightforward date range filtering.
Collaborator

What does "simplifies your SQL queries by avoiding NULL checks" mean?

Contributor Author

@mirnawong1 mirnawong1 Nov 15, 2024

i meant users not needing to include checks or add'l logic to check for null records.

i can remove if it's confusing or clarify further!
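
To make the NULL-check point concrete, here is a minimal sketch; it is not taken from the PR. It uses Python's built-in sqlite3 with a hypothetical `orders_snapshot` table and made-up rows, and it stores dates as ISO text for simplicity. A real warehouse would use its own date or timestamp types and syntax, and `9999-12-31` is only an assumed value for `dbt_valid_to_current`.

```python
import sqlite3

AS_OF = "2024-03-01"  # hypothetical point in time to query


def build(current_marker):
    """Create an in-memory snapshot-like table; the current row gets `current_marker`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table orders_snapshot "
        "(id integer, status text, dbt_valid_from text, dbt_valid_to text)"
    )
    conn.executemany(
        "insert into orders_snapshot values (?, ?, ?, ?)",
        [
            (1, "pending", "2024-01-01", "2024-02-01"),    # superseded record
            (1, "shipped", "2024-02-01", current_marker),  # current record
        ],
    )
    return conn


# Plain range filter: no special handling for current records.
RANGE_ONLY = (
    "select id, status from orders_snapshot "
    "where dbt_valid_from <= ? and dbt_valid_to > ?"
)
# Range filter plus the extra NULL branch needed when current rows use NULL.
RANGE_WITH_NULL_CHECK = (
    "select id, status from orders_snapshot "
    "where dbt_valid_from <= ? and (dbt_valid_to > ? or dbt_valid_to is null)"
)

null_conn = build(None)               # default behaviour: NULL marks current rows
sentinel_conn = build("9999-12-31")   # assumed dbt_valid_to_current value

null_range_only = null_conn.execute(RANGE_ONLY, (AS_OF, AS_OF)).fetchall()
null_with_check = null_conn.execute(RANGE_WITH_NULL_CHECK, (AS_OF, AS_OF)).fetchall()
sentinel_range_only = sentinel_conn.execute(RANGE_ONLY, (AS_OF, AS_OF)).fetchall()

print(null_range_only, null_with_check, sentinel_range_only)
```

With NULL as the marker, the plain range filter drops the current record unless the query adds an `is null` branch. With a far-future value, the same range filter is enough on its own.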

Note, these column names can be customized to your team or organizational conventions using the [snapshot_meta_column_names](#snapshot-meta-fields) config.
#### Note
- These column names can be customized to your team or organizational conventions using the [snapshot_meta_column_names](#snapshot-meta-fields) config.
- If you have set the `dbt_valid_to_current` configuration option, then instead of `NULL`, the `dbt_valid_to` field in future records will be set to your specified value (such as `9999-12-31`).
Collaborator

I think we're missing some clarity that this is set to NULL (or whatever you have set dbt_valid_to_current to) for current records.

default_value: {NULL}
id: "dbt_valid_to_current"
---

Collaborator

I don't know where we want this, but we should probably have a migration callout/warning like what we have for the other new configs cc: @dbeatty10

Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless), these column names can be customized to your team or organizational conventions via the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config.
Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless):
- These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config.
- Use the [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table.
Collaborator

again, i would rephrase this as suggested above


## Description

Use the `dbt_valid_to_current` config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table.
Collaborator

again, rephrase as suggested above

thinking about this more, i do like that we're calling out that the main use case for this is "a future date"


Use the `dbt_valid_to_current` config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table.

This approach makes it easier to assign a custom date date, work in a join, or perform range-based filtering that require an end date.
Collaborator

"date date"

The value assigned to `dbt_valid_to_current` should be a string representing a valid date or timestamp, depending on your database's requirements. Use expressions that work within the data platform.

### Managing records
- **For existing records** — To avoid any unintentional data modification, dbt will _not_ automatically adjust the current value in the existing `dbt_valid_to` column. Existing current records will still have `dbt_valid_to` set to `NULL`.
Collaborator

Ah here's the migration callout - I think we just want to make sure this warning exists elsewhere as well!


The value assigned to `dbt_valid_to_current` should be a string representing a valid date or timestamp, depending on your database's requirements. Use expressions that work within the data platform.

### Managing records
Collaborator

Is there a more specific descriptor here for the heading?


- **For new records** — Any new records inserted after applying the `dbt_valid_to_current` configuration will have `dbt_valid_to` set to the specified value (for example, '9999-12-31'), instead of `NULL`.

This means your snapshot table will have current records with `dbt_valid_to` values of both `NULL` (from existing data) and the new specified value (from new data). If you'd rather have consistent `dbt_valid_to` values for current records, you can either manually update existing records in your snapshot table where `dbt_valid_to` is `NULL` to match your `dbt_valid_to_current` value or rebuild your snapshot table.
Collaborator

"rebuild your snapshot table" would be very risky (lose all of your historical data!)
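
As a rough illustration of the manual-update path (as opposed to a rebuild), the sketch below uses Python's built-in sqlite3 with a hypothetical `orders_snapshot` table and made-up rows. It assumes `9999-12-31` is the value that `dbt_valid_to_current` evaluates to and stores dates as ISO text; the exact UPDATE syntax and date type will depend on the warehouse.

```python
import sqlite3

SENTINEL = "9999-12-31"  # assumed to match the configured dbt_valid_to_current value

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders_snapshot "
    "(id integer, status text, dbt_valid_from text, dbt_valid_to text)"
)
conn.executemany(
    "insert into orders_snapshot values (?, ?, ?, ?)",
    [
        (1, "pending", "2024-01-01", "2024-02-01"),  # historical record, already closed out
        (1, "shipped", "2024-02-01", None),          # current record written before the config was set
        (2, "pending", "2024-11-01", SENTINEL),      # current record written after the config was set
    ],
)

# Backfill in place: only current rows that still hold NULL are changed,
# so closed-out history is left exactly as it was.
cursor = conn.execute(
    "update orders_snapshot set dbt_valid_to = ? where dbt_valid_to is null",
    (SENTINEL,),
)
updated = cursor.rowcount

remaining_nulls = conn.execute(
    "select count(*) from orders_snapshot where dbt_valid_to is null"
).fetchone()[0]
total_rows = conn.execute("select count(*) from orders_snapshot").fetchone()[0]

print(updated, remaining_nulls, total_rows)
```

The update touches only the rows where `dbt_valid_to` is `NULL`, so the row count and the historical records are preserved, which is the property a rebuild would not give you.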

@@ -87,13 +81,14 @@ The following table outlines the configurations available for snapshots:
| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] |
| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at |
| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True |
| [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records. By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` in the snapshot table.| No | string |
Contributor Author

@graciegoheen , it looks like gerda says the config applies to new snapshot columns, should i add that here?

Suggested change
| [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records. By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` in the snapshot table.| No | string |
| [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records. By default, this value is `NULL`. When configured, dbt will use the specified value for `dbt_valid_to` in new snapshot columns, but it will not automatically adjust the value in existing columns. Existing columns that use `NULL` can be updated manually if needed. | No | string |

Labels: content (Improvements or additions to content), Docs team (Authored by the Docs team @dbt Labs), size: medium (This change will take up to a week to address)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core] Allow custom date for dbt_valid_to in snapshots
3 participants