feat: Implement non-partitioned tmp table #405

svdimchenko · 2023-09-08T12:33:15Z

Description

Resolves: #396

Currently, the table and incremental materializations create a partitioned tmp table.
However, this can lead to excessive cost and resource consumption, because the model's SQL can be complex: splitting the run into batches means re-running that complex SQL once per batch, which may not be optimal.

This PR skips the partitioning config when creating the new tmp table. The batches are then calculated from this non-partitioned table, and an insert statement is run for each batch, reading from that table.
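The batching step can be illustrated with a minimal Python sketch of the idea. This is not the adapter's actual macro, which is Jinja in `get_partition_batches.sql`. The sketch assumes the distinct partition values have already been read from the non-partitioned tmp table, and that Athena accepts at most 100 partitions per `INSERT INTO` query. The helper names `sql_literal`, `partition_batches` and `insert_statements`, and the `__tmp` table name, are hypothetical.

```python
import datetime
from typing import List, Sequence, Tuple

# Assumption: Athena writes at most 100 partitions per CTAS / INSERT INTO query.
ATHENA_PARTITIONS_LIMIT = 100


def sql_literal(value) -> str:
    """Render a Python value as an Athena SQL literal (hypothetical helper)."""
    if isinstance(value, datetime.date):
        return f"DATE'{value.isoformat()}'"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and escape embedded quotes.
    escaped = str(value).replace("'", "''")
    return f"'{escaped}'"


def partition_batches(
    columns: Sequence[str], rows: Sequence[Tuple], limit: int = ATHENA_PARTITIONS_LIMIT
) -> List[str]:
    """Turn distinct partition-key rows into OR-ed predicates of at most `limit` partitions."""
    predicates = [
        "(" + " and ".join(f"{col} = {sql_literal(val)}" for col, val in zip(columns, row)) + ")"
        for row in rows
    ]
    return [" or ".join(predicates[i:i + limit]) for i in range(0, len(predicates), limit)]


def insert_statements(
    target: str, tmp: str, columns: Sequence[str], rows: Sequence[Tuple],
    limit: int = ATHENA_PARTITIONS_LIMIT,
) -> List[str]:
    """One INSERT per batch; each one only filters the already-built tmp table."""
    return [
        f"insert into {target} select * from {tmp} where {predicate}"
        for predicate in partition_batches(columns, rows, limit)
    ]


# Usage: 250 distinct (date_column, doy) partitions -> batches of 100, 100 and 50.
start = datetime.date(2020, 1, 1)
rows = [(start + datetime.timedelta(days=i), i + 1) for i in range(250)]
stmts = insert_statements(
    "test_schema.test_dbt_table", "test_schema.test_dbt_table__tmp", ["date_column", "doy"], rows
)
print(len(stmts))  # 3
```

Every batch selects from the tmp table, which has already been materialized without partitioning. The model's SQL is therefore evaluated once, and each insert only filters rows by partition values.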

Models used to test

table iceberg

{{
  config(
    schema='test_schema',
    materialized='table',
    partitioned_by=['DAY(date_column)', 'doy'],
    table_type='iceberg'
  )
}}

SELECT
    CAST(date_column AS DATE) as date_column,
    doy(date_column) as doy,
    random() as rnd
FROM (
    VALUES (
        SEQUENCE(FROM_ISO8601_DATE('2020-01-01'), FROM_ISO8601_DATE('2023-07-24'), INTERVAL '1' DAY)
    )
) AS t1(date_array)
CROSS JOIN UNNEST(date_array) AS t2(date_column)

incremental iceberg

{{
  config(
    schema='test_schema',
    materialized='incremental',
    incremental_strategy='merge',
    partitioned_by=['DAY(date_column)', 'doy'],
    table_type='iceberg',
    unique_key=['doy']
  )
}}

select * from {{ ref('test_dbt_table') }}

table hive

{{
  config(
    schema='test_schema',
    materialized='table',
    partitioned_by=['date_column', 'doy'],
    table_type='hive'
  )
}}

SELECT
    random() as rnd,
    CAST(date_column AS DATE) as date_column,
    doy(date_column) as doy
FROM (
    VALUES (
        SEQUENCE(FROM_ISO8601_DATE('2020-01-01'), FROM_ISO8601_DATE('2023-07-24'), INTERVAL '1' DAY)
    )
) AS t1(date_array)
CROSS JOIN UNNEST(date_array) AS t2(date_column)
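A quick calculation shows why these models exercise batching. `SEQUENCE` includes both endpoints, and `doy` is derived from `date_column`, so each day in the range is one distinct partition. The sketch assumes the 100-partitions-per-query Athena limit.

```python
import datetime
import math

# Same date range as the test models' SEQUENCE(...) expression (both ends inclusive).
start = datetime.date(2020, 1, 1)
end = datetime.date(2023, 7, 24)
days = (end - start).days + 1

# One distinct (date_column, doy) combination per day, i.e. one partition per day.
partitions = days
# Assumption: at most 100 partitions per Athena CTAS / INSERT INTO query.
batches = math.ceil(partitions / 100)
print(days, batches)  # 1301 14
```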

Checklist

You followed the contributing section
You kept your Pull Request small and focused on a single feature or bug fix.
You added unit testing when necessary
You added functional testing when necessary

dbt/include/athena/macros/materializations/models/helpers/get_partition_batches.sql

dbt/include/athena/macros/materializations/models/table/create_table_as.sql

nicor88 · 2023-09-08T13:42:18Z

Great work 💯 - I believe our functional tests should already cover this implementation, and they seem ✅

Serhii Dimchenko added 3 commits September 8, 2023 12:11

Implement non-partitioned stg table

f4fa925

Minor fixes

2613e2d

Minor fixes

97d2047

svdimchenko requested review from jessedobbelaere, Jrmyy, mattiamatrix and nicor88 as code owners September 8, 2023 12:33

svdimchenko added the enable-functional-tests label Sep 8, 2023

nicor88 reviewed Sep 8, 2023

dbt/include/athena/macros/materializations/models/helpers/get_partition_batches.sql (resolved)

nicor88 reviewed Sep 8, 2023

dbt/include/athena/macros/materializations/models/table/create_table_as.sql (outdated, resolved)

Fixed tmp relation naming

9df9990

nicor88 approved these changes Sep 8, 2023

svdimchenko merged commit 6ef6c97 into dbt-labs:main Sep 8, 2023

lukealexmiller mentioned this pull request Oct 23, 2023

Incremental strategy inefficient/fails for very large tables #471

Closed

svdimchenko deleted the feat/non-partitioned-tmp-table branch October 1, 2024 10:36
