
[ADAP-674] [Bug] dbt_tmp is created twice with copy_partitions:true #810

Closed
2 tasks done
dongchris opened this issue Jul 7, 2023 · 4 comments · Fixed by #845
Labels
bug Something isn't working incremental

Comments

@dongchris

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When `copy_partitions: true` is set in the `partition_by` config, the `dbt_tmp` table is created twice before the copy step runs.

Expected Behavior

The `dbt_tmp` table should only be created once per run.

Steps To Reproduce

Model: test.sql

{{
  config(
    materialized='incremental',
    partition_by={
      "field": 'day_pt',
      "data_type": "datetime",
      "granularity": 'day',
      "copy_partitions": true
    }
  )
}}


SELECT
  DATETIME('2023-01-01') AS day_pt

first run: dbt run --select test

create or replace table `discord-pada-reporting`.`dbt_testing`.`test__chris_dong`
    partition by datetime_trunc(day_pt, day)


    OPTIONS(
      description=""""""
    )
    as (


SELECT
  DATETIME('2023-01-01') AS day_pt
    );

second run: dbt run --select test

  create or replace table `discord-pada-reporting`.`dbt_testing`.`test__chris_dong__dbt_tmp`
    partition by datetime_trunc(day_pt, day)


    OPTIONS(
      description="""""",

      expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
    )
    as (


SELECT
  DATETIME('2023-01-01') AS day_pt
    );

This `create or replace table` statement for the `__dbt_tmp` relation is executed twice.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt-core: 1.5.0
- dbt-bigquery: 1.5.0

Additional Context

No response

@dongchris dongchris added bug Something isn't working triage labels Jul 7, 2023
@github-actions github-actions bot changed the title [Bug] dbt_tmp is created twice with copy_partitions:true [ADAP-674] [Bug] dbt_tmp is created twice with copy_partitions:true Jul 7, 2023
@github-christophe-oudar
Contributor

Related Slack thread: https://getdbt.slack.com/archives/C99SNSRTK/p1688687595877169
I couldn't reproduce it (though I tested with an up-to-date version).

@damiaomartins

I found that the macro `bq_create_table_as` is called twice when the parameters `copy_partitions=true` and `on_schema_change != ignore` are combined, which causes the bug:

Steps to reproduce

Model: incremental_test.sql

{{ config(
    materialized="incremental",
    incremental_strategy="insert_overwrite",
    partition_by={
      "field": "_partition",
      "granularity": "day",
      "data_type": "timestamp",
      "time_ingestion_partitioning": True,
      "copy_partitions": True,
    },
    on_schema_change='append_new_columns'
) }}

SELECT
    timestamp_trunc(CURRENT_TIMESTAMP(), day) as _partition,
    'some value' as col1

Relevant output

From the second execution onwards, the query will execute twice in every dbt run:

22:06:15.435313 [debug] [Thread-1  ]: On model.datalake_gcp.incremental_test: /* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "datalake_gcp", "target_name": "qa", "node_id": "model.datalake_gcp.incremental_test"} */
insert into `<project>`.`dbt_damiao`.`incremental_test__dbt_tmp` (_partitiontime, `col1`)
    

  select _partition as _partitiontime, * EXCEPT(_partition) from (
    

SELECT
    timestamp_trunc(CURRENT_TIMESTAMP(), day) as _partition,
    'some value' as col1
  );
22:06:18.885593 [debug] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=<project>&j=bq:US:**JOB_ID_1**&page=queryresults
[...]
22:06:21.718643 [debug] [Thread-1  ]: On model.datalake_gcp.incremental_test: /* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "datalake_gcp", "target_name": "qa", "node_id": "model.datalake_gcp.incremental_test"} */
insert into `<project>`.`dbt_damiao`.`incremental_test__dbt_tmp` (_partitiontime, `col1`)
    

  select _partition as _partitiontime, * EXCEPT(_partition) from (
    

SELECT
    timestamp_trunc(CURRENT_TIMESTAMP(), day) as _partition,
    'some value' as col1
  );
22:06:24.818916 [debug] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=<project>&j=bq:US:**JOB_ID_2**&page=queryresults
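The duplicated insert in the log above matches the root cause described earlier: the tmp-table creation runs once for the schema-change handling and then again in the copy-partitions path, which is never told the table already exists. A minimal Python sketch of that control flow (the names and structure here are illustrative only, not the actual dbt-bigquery macro code):

```python
# Hypothetical simulation of the buggy control flow; names are illustrative
# and do not correspond to the real dbt-bigquery macros.

def run_incremental_buggy(on_schema_change, tmp_relation_exists, create_tmp):
    # With on_schema_change != 'ignore', dbt builds the tmp table up front
    # so it can compare its schema against the target table.
    if on_schema_change != 'ignore':
        create_tmp()
        # BUG: tmp_relation_exists is not updated to True here.
    # The copy_partitions path builds the tmp table unless told it exists.
    if not tmp_relation_exists:
        create_tmp()  # duplicate creation

calls = []
run_incremental_buggy(
    on_schema_change='append_new_columns',
    tmp_relation_exists=False,
    create_tmp=lambda: calls.append('create tmp'),
)
print(len(calls))  # prints 2: the tmp table is created twice
```

With `on_schema_change='ignore'` the first branch is skipped and the table is created only once, which is why the bug only surfaces with this combination of parameters.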

@github-christophe-oudar
Contributor

@damiaomartins, I managed to reproduce the issue with your latest example and fixed it locally (your pointers were helpful 👍). I opened a PR.
@dongchris thanks for pushing the issue forward; I missed the GitHub notification on the topic.

@github-christophe-oudar
Contributor

FYI, until this is released, copy the following macro into your dbt project to override the buggy one:

{#
    Temporary fix for a bug that duplicates the tmp table;
    see https://github.com/dbt-labs/dbt-bigquery/pull/845
#}
{% macro bq_dynamic_copy_partitions_insert_overwrite_sql(
  tmp_relation, target_relation, sql, unique_key, partition_by, dest_columns, tmp_relation_exists, copy_partitions
  ) %}
  {%- if tmp_relation_exists is false -%}
  {# We run temp table creation in a separated script to move to partitions copy if it doesn't already exist #}
    {%- call statement('create_tmp_relation_for_copy', language='sql') -%}
      {{ bq_create_table_as(partition_by, True, tmp_relation, sql, 'sql') }}
    {%- endcall %}
  {%- endif -%}
  {%- set partitions_sql -%}
    select distinct {{ partition_by.render_wrapped() }}
    from {{ tmp_relation }}
  {%- endset -%}
  {%- set partitions = run_query(partitions_sql).columns[0].values() -%}
  {# We copy the partitions #}
  {%- do bq_copy_partitions(tmp_relation, target_relation, partitions, partition_by) -%}
  -- Clean up the temp table
  drop table if exists {{ tmp_relation }}
{% endmacro %}
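The key change in the override is the `{%- if tmp_relation_exists is false -%}` guard: the copy-partitions path now skips creating the tmp table when an earlier step (such as the schema-change handling) already built it. A minimal Python sketch of the patched control flow (the names and structure are illustrative only, not the actual dbt-bigquery macro code):

```python
# Hypothetical simulation of the patched control flow; names are illustrative
# and do not correspond to the real dbt-bigquery macros.

def run_incremental_fixed(on_schema_change, create_tmp):
    tmp_relation_exists = False
    if on_schema_change != 'ignore':
        create_tmp()                 # built once for schema-change handling
        tmp_relation_exists = True   # now recorded correctly
    # The guarded copy_partitions path skips creation when the tmp
    # relation already exists.
    if not tmp_relation_exists:
        create_tmp()

calls = []
run_incremental_fixed('append_new_columns', lambda: calls.append('create tmp'))
print(len(calls))  # prints 1: exactly one creation
```

Either branch creates the table, but never both, so the tmp relation is materialized exactly once regardless of the `on_schema_change` setting.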
