Allow require_partition_filter=true for BigQuery table creation #1843

hui-zheng · 2019-10-19T18:24:05Z

Describe the feature

I would like to have enable require_partition_filter=true in table and incremental mode.
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#table_option_list

for example, we could config model as

{{
  config (
    materialized='table',
    partition_by='DATE(stamp)',
    partition_by_required='true',
  )
}}

Currently, in 0.14.3, bigquery/adapters.sql, only persist_docs and temporary options is enable.
Could we make bigquery_table_options macro below flexible to take in any other options, so that we could pass a generic key/value list in model config to create table options
{% macro bigquery_table_options(persist_docs, temporary) %}

for example, we could config model as

{{
  config (
    materialized='table',
    partition_by='DATE(stamp)',
    options = '{ partition_by_required=true, partition_expiration_days=7 }'
  )
}}

Additional context

This feature is BigQuery-specific.

Who will this benefit?

This feature will benefit anyone who use BigQuery partition tables, and users of BigQuery in general.

The text was updated successfully, but these errors were encountered:

drewbanin · 2019-10-19T20:29:08Z

cool idea @hui-zheng! See also #1829 -- both of these issues require dbt to set options appropriately in the create table as statements that are run for table/incremental models.

I think we could support this pretty readily:

Add require_partition_filter as a "known" config in the plugin code
Add some logic in the bigquery specific plugin macros which checks for a require_partition_filter config, adding it to the list of supplied options

And that's it! What do you think about that?

hui-zheng · 2019-10-19T21:29:46Z

yes, I agree with this approach.

soltanianalytics · 2019-10-28T16:23:46Z

I've just done a minimal-effort implementation by overwriting the essential macros in my own macros folder and just want to add what I found out on the way: You will also need to touch the merge statements as they will not execute with require_partition_filter=true unless you specify the filter in the on statement of the merge. In other words, my common_get_merge_sql now looks something like this:

{% macro common_get_merge_sql(target, source, unique_key, dest_columns) -%}
...
    merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source }} as DBT_INTERNAL_SOURCE
    {% if unique_key %}
        on DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
        {# The if statement below is new #}
        {% if config.get('part_filter_required', True) %}
          and DBT_INTERNAL_DEST.{{ config.get('part_date_column', default='time') }} > {{some_logic}}
        {% endif %}
    {% else %}
        on FALSE
    {% endif %}
...
{%- endmacro %}

This is (purposefully) hacky but works for me, I'm not recommending going this route, just wanted to make you aware of the limitation. Note that you will also need to supply the argument as a literal, something like SELECT MAX(column) FROM {{this}} as {{some_logic}} is not going to work. I solved it by calling a macro which calls statements which get me a datetime literal that works for my purposes.

The compiled and ready to run sql then looks something like this:

    merge into `[project]`.`[dataset].`[relation]` as DBT_INTERNAL_DEST
    using (
SELECT
	GENERATE_UUID() AS id
	, time
...
FROM `[project]`.`[source_schema]`.`[source_relation]`
WHERE time > '2019-10-26 20:55:40' 
GROUP BY
    2, ...
    ) as DBT_INTERNAL_SOURCE
        on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
          AND DBT_INTERNAL_DEST.time > '2019-10-26 20:55:40'
    when matched then update set
        id = DBT_INTERNAL_SOURCE.id,time = DBT_INTERNAL_SOURCE.time,...
    when not matched then insert
        (id, time, ...)
    values
        (id, time, ...)

Edit 2019-11-04: I wanted to add that if the merge joins on FALSE because there is no unique_key specified, there is also no need to include the partition column in the on statement. In other words, this would also work:

merge into `[project]`.`[dataset].`[relation]` as DBT_INTERNAL_DEST
    using (
SELECT
	GENERATE_UUID() AS id
	, time
...
FROM `[project]`.`[source_schema]`.`[source_relation]`
WHERE time > '2019-10-26 20:55:40' 
GROUP BY
    2, ...
    ) as DBT_INTERNAL_SOURCE
        on FALSE
    when matched then update set
        id = DBT_INTERNAL_SOURCE.id,time = DBT_INTERNAL_SOURCE.time,...
    when not matched then insert
        (id, time, ...)
    values
        (id, time, ...)

So the one thing that doesn't work is having both a unique_key and required partitioning filter.

drewbanin · 2019-10-29T01:44:16Z

@soltanianalytics thanks so much for the heads up! The logic you implemented reminds me a lot of #1034. I think this is a problem we'll generally need to solve on BQ sometime soon - it's great to know that these two pieces of functionality are related.

drewbanin · 2020-06-03T13:39:27Z

We changed some of the relevant code around. The eventual implementation will need to:

Add an adapter-specific config here
Return the require_partiton_filter option for table creation here

profoundpanda · 2020-07-09T14:38:44Z

Hey @drewbanin just wanted to check if this something that will be supported sometime soon? Would be a super useful config variable to have available.

jtcohen6 · 2020-07-09T15:34:34Z

Agree that this would be a great first issue for an external contributor!

One small thing to note about tables that require partition filters, very much outside the scope of the actual code change required here:

Built-in schema tests will not work on these models. You'll need to instead use tests like unique_where, not_null_where, relationships_where. It could be cool to play around with behavior like:

models:
  - name: my_model
    columns:
      - name: id
        tests:
          - dbt_utils.unique_where:
              where: "{{ config.get('partition_by') ~ '= current_date' if config.get('partition_by_required') else '1=1' }}"

yu-iskw · 2020-12-03T04:50:43Z

If no one hasn't worked on it yet, I will send a PR.

hui-zheng added enhancement New feature or request triage labels Oct 19, 2019

drewbanin added bigquery and removed triage labels Oct 19, 2019

drewbanin added the good_first_issue Straightforward + self-contained changes, good for new contributors! label Jul 9, 2020

jtcohen6 mentioned this issue Jul 17, 2020

Make require_partition_filter a config option for BQ models #2632

Closed

yu-iskw mentioned this issue Dec 3, 2020

Support require_partition_filter and partition_expiration_days in BQ #2928

Merged

4 tasks

jtcohen6 added this to the Margaret Mead milestone Jan 19, 2021

jtcohen6 closed this as completed in #2928 Jan 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow require_partition_filter=true for BigQuery table creation #1843

Allow require_partition_filter=true for BigQuery table creation #1843

hui-zheng commented Oct 19, 2019 •

edited

Loading

drewbanin commented Oct 19, 2019 •

edited

Loading

hui-zheng commented Oct 19, 2019

soltanianalytics commented Oct 28, 2019 •

edited

Loading

drewbanin commented Oct 29, 2019

drewbanin commented Jun 3, 2020

profoundpanda commented Jul 9, 2020

jtcohen6 commented Jul 9, 2020

yu-iskw commented Dec 3, 2020 •

edited

Loading

Allow require_partition_filter=true for BigQuery table creation #1843

Allow require_partition_filter=true for BigQuery table creation #1843

Comments

hui-zheng commented Oct 19, 2019 • edited Loading

Describe the feature

Additional context

Who will this benefit?

drewbanin commented Oct 19, 2019 • edited Loading

hui-zheng commented Oct 19, 2019

soltanianalytics commented Oct 28, 2019 • edited Loading

drewbanin commented Oct 29, 2019

drewbanin commented Jun 3, 2020

profoundpanda commented Jul 9, 2020

jtcohen6 commented Jul 9, 2020

yu-iskw commented Dec 3, 2020 • edited Loading

hui-zheng commented Oct 19, 2019 •

edited

Loading

drewbanin commented Oct 19, 2019 •

edited

Loading

soltanianalytics commented Oct 28, 2019 •

edited

Loading

yu-iskw commented Dec 3, 2020 •

edited

Loading