Incremental Load Predicates To Bound unique_id scans #3293
@dm03514 I'm a huge fan of this, and of the in-progress work over in #3294! It's always been possible to make the interventions you suggest by overriding macros and materializations in user space. We're always trying to find the right balance between:
Once we see the same extensions and adaptations from community members over and over again, it's a clearer indication that building a new config into the default materialization is worth the trade-off in added complexity, and we've seen this one come up often. I think we can do a lot just by giving users the option to pass an arbitrary list of predicates and sticking them in the right spot within the DML statement. My only requirement for this work would be consistency:
There's an even more generalized version of this that I previously thought about as pluggable incremental strategies (#2366). But I think a new config, given the existing strategies, makes a lot of sense. Let me know if you agree with the above! If so, I think #3294 could be mergeable with just a little bit of work to make it consistent across adapters and strategies.
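To make "stick them in the right spot within the DML statement" concrete, here is a minimal Python sketch that renders simplified, Snowflake-flavored SQL for two strategies and ANDs the same predicate list into each: the `ON` clause of a `merge`, and the `WHERE` clause of the `delete` half of `delete+insert`. The function names, the `dest`/`src` aliases and the exact SQL shape are hypothetical and chosen for illustration; they are not dbt's actual macros.

```python
from typing import Sequence


def _and_predicates(base: str, predicates: Sequence[str]) -> str:
    """Combine the strategy's own condition with user predicates using AND."""
    return " and ".join(f"({clause})" for clause in [base, *predicates])


def render_merge(target: str, source: str, unique_key: str,
                 columns: Sequence[str], predicates: Sequence[str] = ()) -> str:
    """Render a simplified MERGE; extra predicates go into the ON clause."""
    on = _and_predicates(f"dest.{unique_key} = src.{unique_key}", predicates)
    updates = ", ".join(f"{col} = src.{col}" for col in columns)
    col_list = ", ".join(columns)
    values = ", ".join(f"src.{col}" for col in columns)
    return "\n".join([
        f"merge into {target} as dest using {source} as src",
        f"on {on}",
        f"when matched then update set {updates}",
        f"when not matched then insert ({col_list}) values ({values})",
    ])


def render_delete_insert(target: str, source: str, unique_key: str,
                         columns: Sequence[str],
                         predicates: Sequence[str] = ()) -> str:
    """Render a simplified delete+insert; extra predicates bound the DELETE."""
    where = _and_predicates(
        f"dest.{unique_key} in (select {unique_key} from {source})", predicates
    )
    col_list = ", ".join(columns)
    return "\n".join([
        f"delete from {target} as dest where {where};",
        f"insert into {target} ({col_list}) select {col_list} from {source};",
    ])


if __name__ == "__main__":
    # Hypothetical table names and a hypothetical 30-day bound on the target.
    preds = ["dest.event_date >= dateadd(day, -30, current_date)"]
    cols = ["id", "event_date"]
    print(render_merge("analytics.events", "events__tmp", "id", cols, preds))
    print(render_delete_insert("analytics.events", "events__tmp", "id", cols, preds))
```

With no predicates, both functions fall back to the unbounded statement, so the new option is purely additive. One consequence of putting the predicates in a merge's `ON` clause: a source row whose existing target row falls outside the predicates is treated as "not matched" and inserted, so the predicates must be chosen so that matching rows always satisfy them.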
Thank you @jtcohen6 for responding. Ahh, I didn't realize this was already achievable by overriding macros 😅. The consistency requirements are clear. I need to check out #2366! I plan on getting a draft PR up that consistently covers all adapters and strategies by the end of next weekend (hopefully sooner). Thank you for your feedback, @jtcohen6!
I moved forward with a user-defined strategy while this submission is still in progress:
Usage looks like:
We've completely settled on the merge approach with predicates. We created a light macro to expose the predicates through the …
It would be great to have this feature. See also the discussion on Discourse:
Not urgent, but I'm definitely still interested in this. One pattern I'd like to use with a custom incremental_strategy is applying a lookback to the target table. Example Snowflake pseudo-SQL:
The use case is that the target table keeps growing while each batch only contains data from roughly the last 30 days, so it's safe to look at the target for the same date range instead of scanning the whole table. We already do this with Airflow, but it's a pain to have to choose between Airflow-only and dbt when the SQL is simple enough for dbt. It looks like this is close with dave-connors-3's PR #4546, so I'll keep an eye on that. Thanks!
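The lookback pattern above boils down to one extra predicate on the target plus a check that every batch row falls inside the window. A small Python sketch, assuming a hypothetical `dest` alias for the target, a date column whose value does not change across loads, and Snowflake's `dateadd` in the rendered SQL; all helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def lookback_predicate(column: str, days: int, alias: str = "dest") -> str:
    """Render a Snowflake-style predicate that keeps only recent target rows."""
    if days <= 0:
        raise ValueError("days must be a positive number of days")
    return f"{alias}.{column} >= dateadd(day, -{days}, current_date)"


def window_start(run_date: date, days: int) -> date:
    """Earliest date the predicate keeps when current_date == run_date."""
    return run_date - timedelta(days=days)


def batch_fits_window(batch_dates: Iterable[date], run_date: date, days: int) -> bool:
    """True if every batch row is recent enough for the lookback to be safe.

    Assumes a row's date never changes, so any existing target row sharing a
    key with the batch also lies inside the window.
    """
    start = window_start(run_date, days)
    return all(d >= start for d in batch_dates)


if __name__ == "__main__":
    print(lookback_predicate("event_date", 30))
    print(batch_fits_window([date(2022, 3, 5)], date(2022, 3, 31), 30))
```

Running the check before a load (and falling back to an unbounded run when it fails) is one way to keep the cheaper bounded scan without silently creating duplicates.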
PRs for this have finally been merged, and this feature will be included in v1.4 :)
Describe the feature
Hello! I work with a number of very large tables (8-60 TB and growing daily). Data is loaded incrementally.
I often use the delete+insert incremental strategy to ensure that the target table is duplicate-free. Scans on these large tables often take multiple hours.
Below is an incremental query using delete+insert executed against this large table in Snowflake:
Below is the detailed query profile:
It takes a lot of resources to perform full table scans on multi-terabyte tables.
Is it possible to add support for predicates on the incremental load sql?
I created a proof-of-concept (POC) pull request to illustrate this in action. The incremental_predicates are defined as part of the config:
The image below shows the predicates applied to the incremental load:
The effect of bounding the incremental unique window is profound:
Of course, not every workload supports a bounded unique window, but we found it applicable for our use case.
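The trade-off in the last two sentences can be simulated in memory. A minimal Python sketch, assuming a hypothetical `Row` type with a `unique_id` and a load date: without a window, every target row is checked against the batch keys; with a window, older rows are skipped entirely (a row-level stand-in for partition pruning), which is cheaper but leaves a duplicate if a batch key was last loaded before the window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Row:
    unique_id: int
    loaded_on: date


def delete_insert(target: Sequence[Row], batch: Sequence[Row],
                  window_start: Optional[date] = None) -> Tuple[List[Row], int]:
    """Simulate delete+insert and count how many target rows were examined."""
    batch_keys = {row.unique_id for row in batch}
    kept: List[Row] = []
    examined = 0
    for row in target:
        if window_start is not None and row.loaded_on < window_start:
            # Outside the bounded window: skipped (pruned), so never deleted.
            kept.append(row)
            continue
        examined += 1
        if row.unique_id not in batch_keys:
            kept.append(row)
    return kept + list(batch), examined


today = date(2022, 3, 31)
# One target row per day going back 100 days; unique_id i was loaded i days ago.
target = [Row(i, today - timedelta(days=i)) for i in range(1, 101)]
cutoff = today - timedelta(days=30)

batch = [Row(5, today), Row(200, today)]
full, scanned_full = delete_insert(target, batch)
bounded, scanned_bounded = delete_insert(target, batch, window_start=cutoff)
print(scanned_full, scanned_bounded)  # 100 30

# A late update for a key last loaded 75 days ago escapes the bounded delete.
late = [Row(75, today)]
full_late, _ = delete_insert(target, late)
bounded_late, _ = delete_insert(target, late, window_start=cutoff)
print(sum(r.unique_id == 75 for r in full_late),
      sum(r.unique_id == 75 for r in bounded_late))  # 1 2
```

Both runs produce the same result for keys inside the window; only keys that reappear after falling out of the window are handled differently, which is exactly the workload property that has to hold before bounding is safe.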
Describe alternatives you've considered
I can think of a couple of alternatives for this (none are dbt-based):
Additional context
I believe all databases could benefit from optional support of incremental predicates.
Who will this benefit?
This should benefit any dbt users who have:
Are you interested in contributing this feature?
Yes! I would be happy to!