
[CT-10785] Microbatch models should respect full_refresh model config #10788

Merged

Conversation

QMalcolm
Contributor

@QMalcolm QMalcolm commented Sep 26, 2024

Resolves #10785

Problem

In dbt-core, it is generally expected that values passed via CLI flags take precedence over model-level configs. However, `full_refresh` on a model is an exception to this rule, wherein the model config takes precedence. This config exists specifically to prevent accidental full refreshes of large incremental models, as doing so can be costly. It is in fact best practice to set `full_refresh=False` on incremental models.

Prior to this PR, microbatch models did not follow this rule: the CLI flag `--full-refresh` took precedence over the model config `full_refresh`. That meant that if `--full-refresh` was supplied, a microbatch model would full refresh even if `full_refresh=False` was set on the model. This PR solves that problem.

Solution

Update `_is_incremental` to prefer the `model.config.full_refresh` value over the `RuntimeConfig.args.FULL_REFRESH` value provided by CLI flags.
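The precedence rule described above can be sketched as a small standalone function. This is an illustration only, not the actual dbt-core implementation: the names `should_full_refresh`, `Args`, and `ModelConfig` are hypothetical stand-ins for dbt-core's runtime objects.

```python
# Hedged sketch of the precedence rule this PR enforces. The class and
# function names below are illustrative, not dbt-core's real API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Args:
    # Stand-in for the parsed CLI flags (e.g. --full-refresh).
    FULL_REFRESH: bool = False


@dataclass
class ModelConfig:
    # None means the model does not pin the setting, so the CLI flag decides.
    full_refresh: Optional[bool] = None


def should_full_refresh(model_config: ModelConfig, args: Args) -> bool:
    # The model-level config, when explicitly set, wins over the CLI flag.
    if model_config.full_refresh is not None:
        return model_config.full_refresh
    return args.FULL_REFRESH


# With full_refresh=False pinned on the model, --full-refresh is ignored.
assert should_full_refresh(ModelConfig(full_refresh=False), Args(FULL_REFRESH=True)) is False
# With no model-level setting, the CLI flag decides.
assert should_full_refresh(ModelConfig(), Args(FULL_REFRESH=True)) is True
```

Before this change, microbatch batch creation effectively checked the CLI flag first, which is why `--full-refresh` could override a model's `full_refresh=False`.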

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

@QMalcolm QMalcolm requested a review from a team as a code owner September 26, 2024 20:31
@cla-bot cla-bot bot added the cla:yes label Sep 26, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Sep 26, 2024

codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.09%. Comparing base (41e4836) to head (4d5381e).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10788      +/-   ##
==========================================
+ Coverage   86.34%   89.09%   +2.75%     
==========================================
  Files         182      183       +1     
  Lines       23288    23321      +33     
==========================================
+ Hits        20107    20779     +672     
+ Misses       3181     2542     -639     
| Flag | Coverage | Δ |
|---|---|---|
| integration | 86.32% <100.00%> | -0.02% ⬇️ |
| unit | 62.21% <100.00%> | ? |

Flags with carried forward coverage won't be shown.

| Components | Coverage | Δ |
|---|---|---|
| Unit Tests | 62.21% <100.00%> | ∅ |
| Integration Tests | 86.32% <100.00%> | -0.02% ⬇️ |

Contributor

@ChenyuLInx ChenyuLInx left a comment


The code looks good but I have some questions about the new behavior.

1. Is this a behavior change for the existing incremental model also?
2. My understanding is that with the new override order, the user cannot do a full refresh without changing the code. Is this correct? If yes, is this what we want? @graciegoheen
Feel free to dismiss my review if my understanding is wrong.

My understanding was wrong

@QMalcolm
Contributor Author

QMalcolm commented Sep 26, 2024

> The code looks good but I have some questions about the new behavior.
>
> Is this a behavior change for the existing incremental model also? My understanding is with the new overwrite order, the user can not do a full refresh without changing the code. Is this correct? If yes is this what we want? @graciegoheen
>
> Feel free to dismiss my review if my understanding is wrong.

Good question! I was also confused when I first learned about it 😅

For clarification, this is not new behavior for incremental models. This is already how incremental models handle the `full_refresh` model config; this change only makes microbatch models follow the same pattern. Here is the documentation on the `full_refresh` model config.

@QMalcolm QMalcolm dismissed ChenyuLInx’s stale review September 26, 2024 21:39

Dismissing review as reviewer suggested

Contributor

@ChenyuLInx ChenyuLInx left a comment


🚢

@QMalcolm QMalcolm merged commit d8b1bf5 into main Sep 26, 2024
68 of 71 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--10785-microbatch-models-respect-full-refresh-config branch September 26, 2024 21:43

Successfully merging this pull request may close these issues.

Ensure microbatch models respect model level full_refresh config during batch creation