Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

During a DBT run, dropping a view will also drop dependent views that haven't been recreated. #2185

Closed
1 of 5 tasks
nickwu241 opened this issue Mar 6, 2020 · 2 comments
Closed
1 of 5 tasks
Labels
bug Something isn't working wontfix Not a bug or out of scope for dbt-core

Comments

@nickwu241
Copy link
Contributor

nickwu241 commented Mar 6, 2020

Describe the bug

When there are views in the DAGs e.g. (view_root -> view_middle -> view_leaf)

A dbt run causes problems because:

  1. when view_root completes, it'll drop all its dependent views, i.e. view_middle and view_leaf in this case.
  2. view_middle and view_leaf will not be re-created until they're run.
  3. This is a problem because view_root might run very early in a run while view_middle and view_leaf run hours later and causes these views to not be available.
    • Our BI tools relying on view_middle and view_leaf are broken while dbt is running.

Steps To Reproduce

Created a sample dbt project to reproduce: https://github.com/nickwu241/dbt_reproduce_drop_view_cascade_bug

It will

  • create a postgres docker locally
  • create a dbt DAG of view_root -> view_middle -> view_leaf
  • has post-hook wih SELECT * FROM pg_sleep(10)
    • so that you can see view_middle and view_leaf are unavailable after view_root runs.
    • feel free to adjust 10 to be 30 if you need more time to see the database

Expected behavior

I expect dbt to not drop the dependent views until the whole run has finished.
I expect DROP VIEW view_root_backup CASCADE; won't happen until all dependent views are re-created.

Screenshots and log output

dbt run --profiles-dir .
Running with dbt=0.15.2
Found 3 models, 0 tests, 0 snapshots, 0 analyses, 125 macros, 1 operation, 0 seed files, 1 source

09:57:30 | 
09:57:30 | Running 1 on-run-start hook
09:57:30 | 1 of 1 START hook: dbt_reproduce_drop_view_cascade_bug.on-run-star... [RUN]
09:57:30 | 1 of 1 OK hook: dbt_reproduce_drop_view_cascade_bug.on-run-start.0... [CREATE TABLE in 0.00s]
09:57:30 | 
09:57:30 | Concurrency: 1 threads (target='dev')
09:57:30 | 
09:57:30 | 1 of 3 START view model public.view_root............................. [RUN]
09:57:41 | 1 of 3 OK created view model public.view_root........................ [CREATE VIEW in 10.14s] <<<<<<<<<<< view_middle and view_leaf will be dropped here
09:57:41 | 2 of 3 START view model public.view_middle........................... [RUN]
09:57:51 | 2 of 3 OK created view model public.view_middle...................... [CREATE VIEW in 10.08s]
09:57:51 | 3 of 3 START view model public.view_leaf............................. [RUN]
09:58:02 | 3 of 3 OK created view model public.view_leaf........................ [CREATE VIEW in 10.06s]
09:58:02 | 
09:58:02 | Finished running 3 view models, 1 hook in 31.74s.

Completed successfully

Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.15.2
   latest version: 0.15.2

Up to date!

The operating system you're using:
macOS Mojave 10.14.6

The output of python --version:
Python 3.7.3

Additional context

Let me know if you require more information or have any questions! You can reach me on the DBT slack as Nick Wu as well.

Thank you!

@nickwu241 nickwu241 added bug Something isn't working triage labels Mar 6, 2020
@nickwu241 nickwu241 changed the title During a DBT run, dropping a view will also drops dependent views that haven't been recreated. During a DBT run, dropping a view will also drop dependent views that haven't been recreated. Mar 6, 2020
@drewbanin drewbanin added wontfix Not a bug or out of scope for dbt-core and removed triage labels Mar 6, 2020
@drewbanin
Copy link
Contributor

hey @nickwu241 - this is a good question! There isn't any mechanism in Postgres to define a view as "late bound" or "unbound" or anything like that. dbt unfortunately must drop cascade the root tables, resulting in downstream views being dropped during the run. Here are some examples of how this is addressed on modern data warehouses:

  • Redshift has "late bound views" which do not need to be cascade-dropped when their parent tables are dropped
  • Snowflake views are late-bound
  • BigQuery views are late-bound

My recommendation would be to either:

  • create your Bi-focused models as tables instead of views so they aren't dropped during a run
  • consider building your dbt project in a separate schema (say, analytics_new), then swap the analytics schema for the analytics_new schema at the end of the run (this is called a blue/green deployment)
  • consider (and I know, this is a heavy answer to your question) using an analytical database instead of Postgres - Snowflake, BigQuery, and Redshift are all great choices, and they come equipped with tools for analytics-minded applications like late-bound views (amongst others!)

Closing this as I don't anticipate making any changes to dbt to support this use-case on Postgres, but I do want to validate that the issue you're seeing is real, it's just not one that Postgres makes particularly easy for dbt to support today.

@nickwu241
Copy link
Contributor Author

@drewbanin Thank you for the detailed response and also with the recommendations, I appreciate it very much!

We are actually currently using Redshift for analytics but I produced the issue in Postgres because it's easier to reproduce with docker.

So for the short term I'll use bind: False for those views. And we'll definitely also consider the 2 other recommendations of

  • blue-green deploy with schemas
  • changing BI tools to depend on tables

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

No branches or pull requests

2 participants