
replace partitionOverwriteMode inside merge strategy #117

Merged

Conversation

charlottevdscheun
Contributor

Problem:
Using incremental materialization with an insert_overwrite strategy without a unique_key doesn't work.

{{
  config(
    materialized='incremental',
    file_format='delta',
    incremental_strategy='insert_overwrite'
  )
}}

Error message
Table charlotte.test does not support dynamic overwrite in batch mode.

Fix:
Relocate the setting "spark.sql.sources.partitionOverwriteMode = DYNAMIC" inside the merge strategy, because it is only necessary when you give a unique_key; for insert_overwrite it is not a must, but for merge it is.
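
A rough sketch of the idea (not the literal incremental.sql source; the strategy variable and the run_query call here are only illustrative):

{# sketch: only set dynamic partition overwrite for the strategy that needs it #}
{% if strategy == 'merge' %}
  {% do run_query('set spark.sql.sources.partitionOverwriteMode = DYNAMIC') %}
{% endif %}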

@jtcohen6
Contributor

Hey @charlottevdscheun, long time! :)

I think the issue here is that your model does not have a partition_by config, rather than lacking a unique_key. The insert_overwrite strategy assumes that you have a partitioned table, that you want to fully replace a handful of partitions, and that Spark will dynamically determine the partitions for replacement based on the values returned by your incremental model SQL. E.g. you have a table partitioned by a date column, you use an is_incremental() filter to limit model results to the last three days, and fully replace the last three days of data.
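
For example, an insert_overwrite model along these lines (the source and column names are made up for illustration):

{{
  config(
    materialized='incremental',
    file_format='delta',
    incremental_strategy='insert_overwrite',
    partition_by=['date_day']
  )
}}

select * from {{ source('events', 'raw_events') }}
{% if is_incremental() %}
where date_day >= date_add(current_date(), -3)
{% endif %}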

To my knowledge, running insert overwrite on a non-partitioned table will atomically replace all the contents of the table (docs). That isn't really what we're after with incremental models. Is that what you need for your use case?

@charlottevdscheun
Contributor Author

Hey @jtcohen6, yeah I thought: let's follow in Fokko's footsteps and contribute too ;)

I'll explain my use case: we have a problem in our project where, when our pipeline runs, it drops and recreates the end table, leaving the table unavailable to our model for 15 minutes. That's why we started using the Delta file format so that we could overwrite the table and it would still be available to our model.

I saw that that functionality was already available in the incremental materialization, but for it to work without a partition or unique key it must not set spark.sql.sources.partitionOverwriteMode = DYNAMIC. I completely forgot about the partition columns, but I changed the code so that overwrite mode only gets turned on when there is a unique_key or a partition column, as sketched below.
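
Roughly, the changed guard looks like this (a sketch of the condition only, not the literal diff; unique_key and partition_by are assumed to be the resolved config values):

{# sketch: enable dynamic partition overwrite only when a unique_key or partition column is configured #}
{% if unique_key is not none or partition_by is not none %}
  {% do run_query('set spark.sql.sources.partitionOverwriteMode = DYNAMIC') %}
{% endif %}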

I think this overwrite strategy could maybe become a strategy function outside the incremental materialization, what do you think?

@Fokko
Contributor

Fokko left a comment

Thanks @charlottevdscheun for fixing this!

Review comment on dbt/include/spark/macros/materializations/incremental.sql (resolved):
Fokko's suggestion to remove unique key from the if statement

Co-authored-by: Fokko Driesprong <fokko@driesprong.frl>
@Fokko
Contributor

Fokko left a comment

shipit

@jtcohen6
Contributor

jtcohen6 commented Nov 2, 2020

We have a problem in our project where, when our pipeline runs, it drops and recreates the end table, leaving the table unavailable to our model for 15 minutes. That's why we started using the Delta file format so that we could overwrite the table and it would still be available to our model.

Ok, this makes sense! I agree with keeping this approach as part of the incremental materialization, because it still has a crucial limitation of incremental builds: if the column names or data types change, or if there are more or fewer columns, the insert overwrite won't work.

I think we'll want to explain this as: "If a partition_by config is not specified, dbt will overwrite the entire table as an atomic operation, replacing it with new data of the same schema. This is analogous to truncate + insert."
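
In Spark SQL terms, the non-partitioned case boils down to something like this (the table and relation names are only illustrative):

-- full-table overwrite: atomically replaces all existing rows with the new results
insert overwrite table analytics.my_model
select * from my_model_new_rows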

@charlottevdscheun Could I trouble you to add that piece to the "Incremental Models" section of the README as part of this PR? I can take care of updating the primary docs before the next dbt-spark release.

@charlottevdscheun
Contributor Author

@jtcohen6 No trouble at all! Thank you for looking at my first PR ;)
