Fix legacy insert by period materialization #1
base: master
Conversation
* PR - lowercase `except` values in `star()`, resolves dbt-labs#402
* Update test_star.sql
* Update CHANGELOG.md
* Update test_star.sql

Co-authored-by: jasnonaz <jason.ganz64@gmail.com>
One more question. I encountered another issue with Postgres, as described in the same comment dbt-labs/dbt-labs-experimental-features#32, but I couldn't track down in your new code what happened with it. Perhaps you can comment on whether this part follows the same logic as before, and whether that error will reappear. The error I got (after fixing the subquery alias above) was: "cannot create temporary relation in non-temporary schema". I "solved" it by changing the relation creation (this is line 130) to:

```sql
{%- set tmp_relation = api.Relation.create(identifier=tmp_identifier,
                                           schema=None, type='table') -%}
```

This potentially breaks other features or supported DB engines, but my setup was fine with that.
I updated the code as a direct result of your question, hopefully for the better! As I started from the incremental materialization, I took a look and realized that dbt core already handles this. Unfortunately, I can't test this on Postgres, but I tested on Snowflake, and the updated code works as expected. I think using a dbt core function is the way to go forward. Thanks for your comment!
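For reference, a minimal sketch of what that could look like, assuming dbt core's built-in `make_temp_relation()` macro is used instead of the manual `api.Relation.create(...)` call (the exact call in the final code may differ):

```sql
{# let dbt core pick a temp-safe identifier/schema for the scratch relation #}
{%- set tmp_relation = make_temp_relation(target_relation) -%}
```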
Force-pushed from 19dd99f to f82f2ee
All the checks passed for Postgres, so I think we are good there. The rest of the integration checks won't pass here, in this repository. Thank you again @etoulas for your comments. I opened a PR with this change in dbt-utils too, so if you'd like to follow the conversation, here it is: dbt-labs#410. @rgabo, thanks again for your continuous support on this. What do you think: should we 1) wait for some feedback on the other PR, or 2) merge this back locally and modify it later if necessary, based on the review we receive? Also, do you have any idea why their CI didn't run for our PR?
@HorvathDanielMarton with regards to this PR, we can simply use the functionality from the upstream dbt-utils PR. With that in mind, we can close this PR and keep all relevant discussion in dbt-labs#410. As far as CI goes, it is likely that PRs are not automatically run against CI; we'll likely need an author to sprinkle his magic on the PR to trigger CI 😄
This is a:
- bug fix (base branch: `master`)
- new functionality (base branch: latest `dev/` branch)
- a breaking change (base branch: latest `dev/` branch)

Description & motivation
The goal of this change is to make the `insert_by_period` materialization compatible with Snowflake while using the currently latest dbt (0.21.0b2) and keeping the same functionality.

The `insert_by_period` materialization was introduced in this Discourse post, and the intention behind it is to help with "problematic" initial builds of data models. Here, problematic means that some tables are so large and/or complex that a simple `--full-refresh` is not adequate: the build will be incredibly slow, inefficient, and can even get stuck.

Unfortunately, the currently available version does not support Snowflake, and it's advised to refactor it using the refreshed code of the incremental materialization as a good starting point. As a result, this change addresses dbt-labs/dbt-labs-experimental-features#32.
The materialization itself was built entirely from the current incremental materialization, and the modifications made to it were inspired by the original `insert_by_period`. The macros that were originally in `insert_by_period.sql` were moved to a `helpers.sql` file, where Snowflake versions of `get_period_boundaries()` and `get_period_sql()` are introduced. Also, a new `check_for_period_filter()` macro has been added.

The materialization seems fully functional and so far works as expected; nevertheless, I'm still testing it against production models.
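As an illustration, `check_for_period_filter()` presumably guards against models that never reference the placeholder; a sketch of what it might look like (the actual macro body may differ):

```sql
{% macro check_for_period_filter(sql) %}
  {# Fail compilation early if the model's SQL never references the
     __PERIOD_FILTER__ placeholder, since the materialization would
     then have nothing to swap out for each period. #}
  {%- if sql.find('__PERIOD_FILTER__') == -1 -%}
    {{ exceptions.raise_compiler_error(
        "Models using insert_by_period must include '__PERIOD_FILTER__' in their sql"
    ) }}
  {%- endif -%}
{% endmacro %}
```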
Ideas

- Currently, we can only use `day`, `week`, `month`, etc. as a unit. However, we might want to build our models fortnightly (i.e. 2-week chunks). We could introduce an optional configuration that lets us use a configurable number of `period`s as a chunk. The fortnightly example could look like the sketch after this list.
- It would be useful to be able to offset the `__PERIOD_FILTER__` by some time (for example a month of lookback time, but it should be configurable). Currently, this materialization doesn't support this, but that is surely a shortcoming for some of our models where `insert_by_period` would come in handy because of their massive size & complexity. Also, it seems our team is not the only one that bumped into this: Feature: Add optional `lookback_interval` param to `insert_by_period` materialization dbt-labs/dbt-utils#394
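A sketch of the fortnightly configuration (`periods_per_chunk` is a hypothetical name for the proposed option, not an existing config key, and the source name is illustrative):

```sql
{{
  config(
    materialized = 'insert_by_period',
    period = 'week',
    periods_per_chunk = 2,      -- hypothetical option: build in 2-week chunks
    timestamp_field = 'created_at',
    start_date = '2021-01-01'
  )
}}

select * from {{ ref('events') }}  -- illustrative source
where __PERIOD_FILTER__
```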
A weird behavior

The materialization can produce weird behavior if it's executed on an already existing model, because the "last incremental run" may have a very short overlap with the specified date range. Let's take the following config as an example:
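(A minimal sketch, assuming the materialization's standard options and the dates used below.)

```sql
{{
  config(
    materialized = 'insert_by_period',
    period = 'day',
    timestamp_field = 'created_at',
    start_date = '2021-08-28',
    stop_date = '2021-08-31'
  )
}}
```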
This is going to build a model with events `2021-08-28 < created_at < 2021-08-31`, so it covers 3 days and, given a fresh start and an uninterrupted run, will be built in 3 steps. If the run is terminated after building the 1st day, the next run will again have 3 steps. It's not unlikely that the last step will insert only a handful of events, many magnitudes smaller than the previous steps, as could be seen on the CLI output.

This may raise some eyebrows, but it makes perfect sense if we look at the query's where clause, which is basically this:
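(A sketch, assuming `created_at` as the `timestamp_field`; the exact SQL the macro renders may differ.)

```sql
-- roughly what __PERIOD_FILTER__ expands to on the final iteration
where created_at >  '2021-08-30 23:59:59.852'  -- newest timestamp already in the target table
  and created_at <= '2021-08-30 23:59:59.999'  -- end boundary of the configured range
```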
So the last iteration is only looking for events that happened between `2021-08-30 23:59:59.852` and `2021-08-30 23:59:59.999`, which is far from even being a second. It would be nice to deal with this, so it won't cause suspicion that something is buggy. This is most conspicuous for large models with a lot of events.
Checklist
- [ ] I have "dispatched" any new macro(s) so non-core adapters can also use them (e.g. the `star()` source)