
feat: Implement non-partitioned tmp table #405

Merged


@svdimchenko (Contributor) commented Sep 8, 2023

Description

Resolves: #396

Currently, the temporary table is created with partitioning for both the table and incremental materializations.
However, this can lead to excessive cost and resource consumption, because the model's SQL can be complex, and splitting the run into batches that each execute this complex SQL is not optimal.

This PR changes the logic so that the partitioning config is skipped when the new tmp table is created. The batches are then calculated from this non-partitioned table, and an INSERT statement reading from it is run for each batch.
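The flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not dbt-athena's actual macro code: the names `batch_statements` and `sql_literal` are hypothetical, the partition values are assumed to have already been read from the tmp table (for example with a `SELECT DISTINCT` on the partition columns), and the partitioned target table is assumed to already exist. The batch size of 100 reflects Athena's limit of 100 partitions written per CTAS or INSERT INTO query.

```python
import datetime
from typing import List, Sequence, Tuple

# Athena CTAS / INSERT INTO queries can write to at most 100 partitions each.
ATHENA_PARTITION_LIMIT = 100


def sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal (hypothetical helper, illustration only)."""
    if isinstance(value, datetime.date):
        return f"DATE '{value.isoformat()}'"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def batch_statements(
    model_sql: str,
    target: str,
    tmp: str,
    partition_cols: Sequence[str],
    partitions: Sequence[Tuple[object, ...]],
    batch_size: int = ATHENA_PARTITION_LIMIT,
) -> List[str]:
    # 1. Run the (possibly expensive) model SQL exactly once, into a
    #    NON-partitioned tmp table, so the partition limit does not apply here.
    statements = [f"CREATE TABLE {tmp} AS {model_sql}"]
    # 2. Split the distinct partition values into batches and copy each batch
    #    from the cheap tmp table into the partitioned target table.
    for start in range(0, len(partitions), batch_size):
        batch = partitions[start:start + batch_size]
        predicates = []
        for values in batch:
            terms = [f"{col} = {sql_literal(v)}" for col, v in zip(partition_cols, values)]
            predicates.append("(" + " AND ".join(terms) + ")")
        statements.append(
            f"INSERT INTO {target} SELECT * FROM {tmp} WHERE " + " OR ".join(predicates)
        )
    return statements
```

The point of this shape is that the model SQL appears only in the single `CREATE TABLE` statement; every `INSERT` reads from the already materialized tmp table, so adding batches only adds cheap scans.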

Models used for testing

table iceberg

{{
  config(
    schema='test_schema',
    materialized='table',
    partitioned_by=['DAY(date_column)', 'doy'],
    table_type='iceberg'
  )
}}

SELECT
    CAST(date_column AS DATE) as date_column,
    doy(date_column) as doy,
    random() as rnd
FROM (
    VALUES (
        SEQUENCE(FROM_ISO8601_DATE('2020-01-01'), FROM_ISO8601_DATE('2023-07-24'), INTERVAL '1' DAY)
    )
) AS t1(date_array)
CROSS JOIN UNNEST(date_array) AS t2(date_column)

incremental iceberg

{{
  config(
    schema='test_schema',
    materialized='incremental',
    incremental_strategy='merge',
    partitioned_by=['DAY(date_column)', 'doy'],
    table_type='iceberg',
    unique_key=['doy']
  )
}}

select * from {{ ref('test_dbt_table') }}

table hive

{{
  config(
    schema='test_schema',
    materialized='table',
    partitioned_by=['date_column', 'doy'],
    table_type='hive'
  )
}}

SELECT
    random() as rnd,
    CAST(date_column AS DATE) as date_column,
    doy(date_column) as doy
FROM (
    VALUES (
        SEQUENCE(FROM_ISO8601_DATE('2020-01-01'), FROM_ISO8601_DATE('2023-07-24'), INTERVAL '1' DAY)
    )
) AS t1(date_array)
CROSS JOIN UNNEST(date_array) AS t2(date_column)
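To see why these test models hit the batching path, the partition count can be worked out from the `SEQUENCE` date range. A small Python sketch, assuming one partition per distinct `(date_column, doy)` pair (for the Iceberg models, the `DAY(date_column)` transform likewise yields one partition per day) and Athena's limit of 100 partitions per query; `partitions_and_batches` is a hypothetical helper:

```python
import datetime
import math

# Athena CTAS / INSERT INTO queries can write to at most 100 partitions each.
ATHENA_PARTITION_LIMIT = 100


def partitions_and_batches(start: datetime.date, end: datetime.date, limit: int = ATHENA_PARTITION_LIMIT):
    # SEQUENCE(start, end, INTERVAL '1' DAY) includes both ends, and each day
    # yields exactly one (date_column, doy) partition combination.
    n_partitions = (end - start).days + 1
    n_batches = math.ceil(n_partitions / limit)
    return n_partitions, n_batches


print(partitions_and_batches(datetime.date(2020, 1, 1), datetime.date(2023, 7, 24)))
# prints (1301, 14)
```

So each test model writes 1301 partitions, which takes 14 insert batches of at most 100 partitions each.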

Checklist

  • You followed the contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

@nicor88 (Contributor) commented Sep 8, 2023

Great work 💯. I believe our functional tests already cover this implementation, and they seem to pass ✅

@svdimchenko svdimchenko merged commit 6ef6c97 into dbt-labs:main Sep 8, 2023
@svdimchenko svdimchenko deleted the feat/non-partitioned-tmp-table branch October 1, 2024 10:36
Successfully merging this pull request may close these issues.

Materialize sql to tmp table in case we hit partitions limitis