feat: fix athena partitions limit #360

svdimchenko · 2023-07-28T09:23:13Z

Description

Resolves: #87

Models used to test

For table materialization (iceberg):

{{
  config(
    schema='sandbox',
    materialized='table',
    partitioned_by=['DAY(date_column)', 'doy'],
    table_type='iceberg'
  )
}}

SELECT
    CAST(date_column AS DATE) as date_column,
    doy(date_column) as doy,
    random() as rnd
FROM (
    VALUES (
        SEQUENCE(FROM_ISO8601_DATE('2023-01-01'), FROM_ISO8601_DATE('2023-07-24'), INTERVAL '1' DAY)
    )
) AS t1(date_array)
CROSS JOIN UNNEST(date_array) AS t2(date_column)

For table materialization (hive):

{{
  config(
    schema='sandbox',
    materialized='table',
    partitioned_by=['date_column', 'doy'],
    table_type='hive'
  )
}}

SELECT
    random() as rnd,
    CAST(date_column AS DATE) as date_column,
    doy(date_column) as doy
FROM (
    VALUES (
        SEQUENCE(FROM_ISO8601_DATE('2020-01-01'), FROM_ISO8601_DATE('2023-07-24'), INTERVAL '1' DAY)
    )
) AS t1(date_array)
CROSS JOIN UNNEST(date_array) AS t2(date_column)

For incremental materialization:

{{
  config(
    schema='sandbox',
    materialized='incremental',
    incremental_strategy='merge',
    partitioned_by=['DAY(date_column)', 'doy'],
    table_type='iceberg',
    unique_key=['doy']
  )
}}

select * from {{ ref('mat_table') }}
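
For reference, a rough count of how many partitions the two table models above produce. This is only a sketch: the helper name `count_daily_partitions` is made up for illustration, and the limit of 100 is the figure named in this PR's commit message.

```python
from datetime import date, timedelta

ATHENA_PARTITIONS_LIMIT = 100  # figure taken from the commit message of this PR


def count_daily_partitions(start: date, end: date) -> int:
    """Count distinct (date_column, doy) pairs for an inclusive daily range."""
    days = (end - start).days + 1
    pairs = {
        (d, d.timetuple().tm_yday)  # tm_yday is the day of year, like doy()
        for d in (start + timedelta(days=i) for i in range(days))
    }
    return len(pairs)


# Date ranges copied from the iceberg and hive test models.
iceberg_partitions = count_daily_partitions(date(2023, 1, 1), date(2023, 7, 24))
hive_partitions = count_daily_partitions(date(2020, 1, 1), date(2023, 7, 24))

print(iceberg_partitions, hive_partitions)
```

Both models write one partition per day, so both exceed the limit and exercise the new code path.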

Checklist

You followed contributing section
You kept your Pull Request small and focused on a single feature or bug fix.
You added unit testing when necessary
You added functional testing when necessary

Jrmyy

Amazing 🔥

dbt/adapters/athena/impl.py

Jrmyy · 2023-08-01T14:25:58Z

How is this handled when we want to redo a certain batch? Is there a way to delete overlapping partitions, as was done in the insert_by_period made by Jesse?
In particular, I don't see examples with an is_incremental condition.

svdimchenko · 2023-08-03T09:41:04Z

How is this handled when we want to redo a certain batch? Is there a way to delete overlapping partitions, as was done in the insert_by_period made by Jesse? In particular, I don't see examples with an is_incremental condition.

@Jrmyy all I've changed is how the tmp table is created and how the data from it is inserted into the target relation. All other logic, including the is_incremental condition and the deletion of overlapping partitions, stays as it was before. I'm now testing these changes on my own project (only iceberg tables, to be honest) and everything works OK. But of course, I would appreciate help with testing on different models to catch possible edge cases.
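
To illustrate the general idea of inserting from the tmp table in batches, here is a minimal sketch. It is not the code in impl.py: the helper names `batch_partitions`, `sql_literal` and `batch_filter` are hypothetical, and it assumes the distinct partition values have already been read from the tmp table.

```python
from datetime import date
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple, Union

Value = Union[int, str, date]
ATHENA_PARTITIONS_LIMIT = 100  # figure taken from the commit message of this PR


def batch_partitions(
    partitions: Iterable[Tuple[Value, ...]],
    batch_size: int = ATHENA_PARTITIONS_LIMIT,
) -> Iterator[List[Tuple[Value, ...]]]:
    """Yield consecutive groups of at most batch_size partition tuples."""
    iterator = iter(partitions)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def sql_literal(value: Value) -> str:
    """Render a Python value as a SQL literal (dates, strings, integers only)."""
    if isinstance(value, date):
        return f"DATE '{value.isoformat()}'"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def batch_filter(columns: Sequence[str], batch: Iterable[Tuple[Value, ...]]) -> str:
    """Build a WHERE condition that selects exactly the partitions in one batch."""
    groups = []
    for values in batch:
        terms = [f"{col} = {sql_literal(val)}" for col, val in zip(columns, values)]
        groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)
```

Each batch would then drive one insert statement filtered by `batch_filter(...)`, so no single statement writes more partitions than the limit allows.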

dbt/include/athena/macros/materializations/models/table/create_table_as.sql

nicor88

@svdimchenko this is an amazing piece of work 💯
I left an improvement comment. Also, do you think we can add some integration tests that check that the data actually written is correct:

integrity, e.g. the dataset returned by the SQL model matches what we have in the final table
partition coverage (already covered by the integrity check, but maybe worth a separate check): all partition values are present in the final table
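
The two checks could look roughly like this. It is a sketch only, assuming both result sets have been fetched as lists of tuples; the function names are hypothetical and not part of the existing test suite.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Set, Tuple

Row = Tuple[Hashable, ...]


def rows_match(expected: Iterable[Row], actual: Iterable[Row]) -> bool:
    """Integrity: same rows with the same multiplicities, ignoring order."""
    return Counter(expected) == Counter(actual)


def missing_partition_values(
    expected: Iterable[Row],
    actual: Iterable[Row],
    partition_indexes: Sequence[int],
) -> Set[Row]:
    """Return partition value tuples present in expected but absent from actual."""

    def keys(rows: Iterable[Row]) -> Set[Row]:
        return {tuple(row[i] for i in partition_indexes) for row in rows}

    return keys(expected) - keys(actual)
```

A test would assert that `rows_match(...)` is true and that `missing_partition_values(...)` returns an empty set; the second check gives a more readable failure when a whole batch of partitions was skipped.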

nicor88 · 2023-08-14T17:32:27Z

@svdimchenko I did some more tests. Let's fix the conflicts and merge, ok? I'm looking forward to having this merged before we release 1.6 :)

nicor88

💯

antonysouthworth-halter · 2023-09-25T02:02:27Z

hey does this work for (non-iceberg) tables that are bucketed in addition to partitioned?

nicor88 · 2023-09-25T07:09:08Z

AFAIK for bucketing (hive and non-hive) you won't hit any partition limitation. Did you face partition limitations with bucketing?
Bucketing writes rows with the same hash value into the same object; it does not create any prefix.
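
A toy illustration of that point, assuming a stand-in hash: `zlib.crc32` is used here only because it is deterministic, it is not the hash function Athena uses, and the function names are made up.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_for(value: str, bucket_count: int) -> int:
    """Map a value to a bucket index using a stand-in hash (not Athena's)."""
    return zlib.crc32(value.encode("utf-8")) % bucket_count


def assign_buckets(values: Iterable[str], bucket_count: int) -> Dict[int, List[str]]:
    """Group values by bucket index; each bucket corresponds to one object."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for value in values:
        buckets[bucket_for(value, bucket_count)].append(value)
    return dict(buckets)
```

However many distinct values the bucketed column holds, the number of groups never exceeds `bucket_count`, which is why bucketing alone does not add partitions.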

svdimchenko changed the title ~~[feat] Fix athena partitions limit~~ feat: fix athena partitions limit Jul 28, 2023

svdimchenko marked this pull request as ready for review July 28, 2023 16:12

svdimchenko requested review from jessedobbelaere, Jrmyy, mattiamatrix, nicor88 and thenaturalist as code owners July 28, 2023 16:12

Implement ddl strategies to work with 100 partitions limit

3d39753

Jrmyy previously approved these changes Jul 31, 2023

Jrmyy added the enable-functional-tests Label to trigger functional testing label Jul 31, 2023

Revert minor config changes

31313a5

svdimchenko dismissed Jrmyy’s stale review via 31313a5 July 31, 2023 14:38

Jrmyy reviewed Aug 1, 2023

dbt/adapters/athena/impl.py (outdated, resolved)

svdimchenko and others added 3 commits August 3, 2023 22:37

Merge branch 'main' into feat/athena-partitions-limit

af8c9c7

Fix run query method name

829a9ff

Merge branch 'main' into feat/athena-partitions-limit

96c0aa3

nicor88 requested a review from Jrmyy August 8, 2023 14:52

Merge branch 'main' into feat/athena-partitions-limit

14dfeab

nicor88 reviewed Aug 11, 2023

dbt/include/athena/macros/materializations/models/table/create_table_as.sql (outdated, resolved)

nicor88 reviewed Aug 11, 2023

Merge branch 'main' into feat/athena-partitions-limit

314ba81

nicor88 self-requested a review August 14, 2023 17:49

svdimchenko added 2 commits August 14, 2023 21:17

Add columns for insert statement

7ceb41e

Minor format fix

a9cccac

nicor88 approved these changes Aug 14, 2023

nicor88 merged commit ea3cd1d into dbt-labs:main Aug 15, 2023

nicor88 mentioned this pull request Aug 17, 2023

Fix query statistics returned by the CLI and in run_results.json #374

Closed

igoichuk mentioned this pull request Aug 24, 2023

How to materialise injected partition projections? #386

Open

lukealexmiller mentioned this pull request Oct 23, 2023

Incremental strategy inefficient/fails for very large tables #471

Closed

mrshu mentioned this pull request Dec 4, 2023

Athena partitions limit fix (#360) fails with partitions defined as non-Athena functions #529

Closed

nicor88 mentioned this pull request Feb 23, 2024

[Bug] adapter response return incorrect data_scanned_in_bytes when incremental model is running #585

Open

2 tasks

gontzalm mentioned this pull request Jul 31, 2024

Add default converter for timestamp with timezone laughingman7743/PyAthena#554

Merged
