[Feature] No more jinja block for snapshots - new snapshot design ideas #10246
Comments
Personally I like option 2! As far as the questions on that one, here are my thoughts:
IMHO, this is the most important question of this epic, and I think you can't truly answer the questions like "dbt run vs dbt snapshot" without answering the "best practice" question. I believe there are 2 very very different types of snapshots:
If I take a
For a snapshot of a raw source model, I would want those snapshots to run after every fivetran sync (every 15 minutes or so). For a snapshot of a dbt model, I would want those snapshots to run during every analytics
While this pushes me towards the view that the snapshot target should have the developer schema prefix, even this behavior varies drastically between (what I consider) the 2 different types of snapshots:
With all of the above ramblings, there is a very strong overlap with orchestration and data mesh..... which I do not take lightly. The thing I'm curious about: if the "sensible defaults" for "raw snapshots" vs "model snapshots" were clear enough:
NOTE: It is possible to solve all of the above with the current
Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#6122
This is now done and merged. I updated Option 1 in the issue description to match the implementation. Most properties remain in the config section under the snapshot. This matches the way those same properties are set when defining snapshots via the existing SQL method.
Can you offer the ability to change the timestamps that dbt auto-generates for SCD snapshots? For example, I have a table of people partitioned by day, but when I run the partitions for 24-10-2024 and 25-10-2024, the snapshot records the system time in the valid_to and valid_from timestamps, not the date of the partition. That is, if I run 2 partitions on the same day, the snapshot will show that the records changed on the same day but at different times, which is not true since they are scans of the source table from different days. Is it possible for these dates to be configurable, either as a date or a timestamp, and to be passed in rather than auto-generated? This way, I can pass the ELT process date.
Is this your first time submitting a feature request?
Current State
To configure a snapshot currently, you must nest your configuration and SQL within a snapshot jinja block like so:
Why? (you might ask)
The story begins…
Snapshots are a really ancient dbt feature -- implemented as dbt archive in #183 and first released in 0.5.1, just two days shy of dbt's 6 month anniversary.
There were no snapshot blocks and snapshots/*.sql files in these early days. Instead, they were originally declared within dbt_project.yml like this:
A glow up
#1175 and #1361 allowed snapshots to escape YAML Land and become:
At the time the thought was, “we should/will reimplement all the resources like this” (so that you could define multiple “model blocks” in a single file).
Turns out that defining multiple resources in one file makes
and so in the leadup to v1.0, it wasn’t a priority to do this rework — and finally decided it wasn’t really even desirable.
Future State
WE ARE GOING WITH OPTION 1
Option 1: Snapshots are just yml configs, they contain no logic (like exposures, sources, tests, etc.)
select * from {{ source('jaffle_shop', 'orders') }}
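Under Option 1, the query above is no longer written out; the snapshot points at a relation and everything else is configuration. A minimal sketch of such a YAML-only snapshot, assuming the `relation` key and the `config` layout from the snapshot YAML spec released in dbt 1.9. The file name, snapshot name, schema, and column names are illustrative assumptions, not taken from this issue:

```yaml
# snapshots/orders_snapshot.yml (hypothetical file name)
snapshots:
  - name: orders_snapshot                        # illustrative snapshot name
    relation: source('jaffle_shop', 'orders')    # stands in for "select * from {{ source(...) }}"
    config:
      schema: snapshots                          # assumed target schema
      unique_key: id                             # assumed primary key column
      strategy: timestamp
      updated_at: updated_at                     # assumed "last modified" column
```

As noted in the comments, most properties stay inside `config`, mirroring how they are set when a snapshot is defined with the existing SQL method.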
Option 2: Snapshots are just models, they are a materialization (like incremental, view, table, etc.)
dbt run? vs. dbt snapshot?
snapshot-paths? would you be able to put them in your models folder? or only snapshots folder?
resource_type - model or snapshot (incremental is model)? if we went with model, that would be an id change, adjustment to selector syntax for dbt build, DBT_EXCLUDE_RESOURCE_TYPE wouldn’t work for excluding snapshots, etc.
materialized='snapshot' or materialized='scd'
Option 3: Snapshots are just sql files in the snapshots folder, but they don’t use jinja blocks (one .sql file per snapshot)
Which is best?
“select *” best practice
Notes
Related issues
#4761
#9033