Setting on_schema_change on incremental models using the insert_overwrite strategy with dynamic partitioning causes those models to fail #39

Closed
alexras opened this issue Oct 13, 2021 · 2 comments · Fixed by #41
Labels
bug Something isn't working

Comments

alexras commented Oct 13, 2021

Describe the bug

If a model is materialized as incremental, its incremental_strategy is set to insert_overwrite, its on_schema_change is set to any value other than ignore, and the compiled model references _dbt_max_partition, the model will fail to run with the following error:

Unrecognized name: _dbt_max_partition at [X:Y]

Steps To Reproduce

Here's a minimal example model:

{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
      'field': 'date',
      'data_type': 'date',
      'granularity': 'day'
    },
    on_schema_change='append_new_columns',
  )
}}

SELECT * FROM source_table
{% if is_incremental() %}
WHERE date > _dbt_max_partition
{% endif %}

If is_incremental() is True, the model will compile to the following:

SELECT * FROM source_table
WHERE date > _dbt_max_partition

When the BigQuery adapter builds the larger SQL script it needs to run, it hits a run_query() call that (if I'm understanding correctly) is mainly used to compute the update's schema so that it can be compared against the destination table's existing schema. When it tries to run that query, _dbt_max_partition hasn't been declared yet (that declaration happens later, in bq_insert_overwrite()), which triggers the error.
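
For illustration, here's roughly the shape of what goes wrong. The project/dataset/table names and exact statements below are my own sketch, not the adapter's literal output:

-- 1. The on_schema_change check builds a temp relation from the compiled model
--    SQL first. Nothing has declared _dbt_max_partition at this point, so
--    BigQuery rejects the statement:
create or replace table `my_project`.`my_dataset`.`my_model__dbt_tmp` as (
  SELECT * FROM source_table
  WHERE date > _dbt_max_partition  -- Unrecognized name: _dbt_max_partition
);

-- 2. Only later, in the script assembled by bq_insert_overwrite(), does the
--    adapter emit a declaration along these lines:
declare _dbt_max_partition date default (
  select max(date) from `my_project`.`my_dataset`.`my_model`
  where date is not null
);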

Expected behavior

I'm expecting the temporary relation to be created after _dbt_max_partition has been defined, which ought to allow the rest of the incremental update to proceed as normal.

System information

The output of dbt --version:

installed version: 0.21.0
   latest version: 0.21.0

Up to date!

Plugins:
  - bigquery: 0.21.0
  - snowflake: 0.21.0
  - redshift: 0.21.0
  - postgres: 0.21.0

The operating system you're using:

macOS Mojave 10.14.6

The output of python --version:

Python 3.7.10

alexras added the bug and triage labels Oct 13, 2021
jtcohen6 (Contributor) commented

@alexras Really good point. Thanks for opening the issue.

The inclusion of _dbt_max_partition is a real convenience factor, but I often think it's more trouble than it's worth. Especially with the advent of set_sql_header, it's kind of unnecessary!
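
To make that concrete, here's a rough sketch of what set_sql_header lets you do: declare your own scripting variable up front instead of relying on _dbt_max_partition. The variable name and table reference are placeholders, and whether this sidesteps the schema-change check on your version is something you'd want to verify:

{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'date', 'data_type': 'date', 'granularity': 'day'}
  )
}}

{% call set_sql_header(config) %}
  -- declared at the top of the BigQuery script dbt submits
  declare max_loaded_date date default (
    select max(date) from `my_project`.`my_dataset`.`my_model`  -- your target table
  );
{% endcall %}

SELECT * FROM source_table
{% if is_incremental() %}
WHERE date > max_loaded_date
{% endif %}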

I think we should rework it to:

  • run only if the compiled SQL includes a reference to _dbt_max_partition
  • call a macro, so that users can override it with custom logic, whether that's fully disabling it or something more cost-efficient that queries a partition metadata table (see the sketch after this list)
  • avoid this bug!
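
For illustration only, that override point might look something like this; the macro name, arguments, and partition_by attributes are guesses on my part, not the actual implementation:

{% macro declare_dbt_max_partition(relation, partition_by, compiled_sql) %}
  {#- only emit the declaration if the compiled model actually references it -#}
  {%- if '_dbt_max_partition' in compiled_sql -%}
    declare _dbt_max_partition {{ partition_by.data_type }} default (
      select max({{ partition_by.field }})
      from {{ relation }}
      where {{ partition_by.field }} is not null
    );
  {%- endif -%}
{% endmacro %}

Overriding a macro like this in a project would let someone skip the declaration entirely, or compute the value from a partition metadata view such as INFORMATION_SCHEMA.PARTITIONS rather than scanning the target table.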

I've got some preliminary code that would do the above. I'll push it up and, if you're open to it, ask you to take a look.

jtcohen6 removed the triage label Oct 14, 2021
jtcohen6 mentioned this issue Oct 14, 2021
alexras (Author) commented Oct 14, 2021

@jtcohen6 I'd be happy to take a look! Thanks for the prompt response, and for telling me about sql_header; I had no idea that was a thing dbt could do, but I'm glad that it can.
