Snowflake: no transactions, except for DML #3510
Conversation
I wasn't expecting all the integration tests to pass on the first go. Neat! @drewbanin Could I tag you in for a review here? I think you're one of the few folks who has the full history of dbt x Snowflake x transactional logic. To be clear, the changes in this PR are based entirely on the explicit recommendations from Snowflake's partner solutions architect, whom we met with today: #3480 (comment) (cc @amychen1776)
@jtcohen6 I took a quick scan through here - really shocked at how small the diff is! That's great.
Some of this is germane to this PR, some of it is not, but do want to get the thoughts in my head out into the world:
- IIRC the primary use-case for specifying pre/post hooks outside of a transaction is on Redshift, where operations like `vacuum` could not be specified inside of a transaction. I think it's a-ok if we drop that distinction on Snowflake. The fact that we can `create or replace table|view` on Snowflake simplifies this problem significantly, I think. If Snowflake didn't have this construct, then I would be pretty worried about non-atomicity when dbt replaces a table or view.
- In a similar vein, I am wondering about what happens when we replace a table with a view, or vice versa. Is this handled in an atomic way for an external querier? E.g. if a BI tool issues a query between when a view is dropped and a table is recreated... will that query fail? If I'm understanding the Snowflake docs correctly, I think the answer is YES.
- I wonder if issuing fewer explicit transactions is going to have an impact on dbt project runtime. I guess the latency associated with issuing those queries is amortized on big projects, but I would not be shocked if there is a positive impact on performance overall
- I am glad to hear that the integration tests are all passing! My instinct is that we're not testing for atomicity on Snowflake in the same way that we do/did test on Redshift where this was a real problem for us historically. Can we imagine stress-testing this branch with a dbt running + concurrent queries against the tables being replaced? I'm happy to help with that if you think it's interesting
Kind of scattershot, but curious what you think! I am happy to slap a LGTM here since this is ultimately the kind of thing we'll want to extensively test internally/in a beta, but would also like to discuss some of these things first :)
```
{% set dml_transaction %}
    begin;
    {{ dml }} {{ ";" if not (dml | trim).endswith(";") }}
```
This is really clever... worried that it might be too clever! If we wanted to be extra safe, we could issue three distinct queries for `begin`, the query, then `commit`. Would that defeat the purpose of these changes?

If we know that the supplied DML is always going to come from dbt internally, then this can be ok, but my preference would actually be either to:
- Specify the semicolon in the caller, OR
- Always inject the semicolon here

You buy it?
I thought about the latter option, as a new argument for the `statement` macro: perhaps `explicit_transaction=False`. It's conceptually related to the existing argument `auto_begin=True` (which won't take effect on Snowflake, given the changes in this PR).

Since a lot of this DML is not called directly, however, but rather passed into the materialization's `main` statement—which could be DML, could be DDL—this would require a bit more plumbing work.

So I'm happy with mandating that SQL passed into this macro shouldn't end in a semicolon, and having the macro always stick one on the end.
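For illustration, the semicolon guard can be transliterated into plain Python. This is only a sketch of the logic under discussion; the function name is hypothetical, not the PR's actual Jinja code:

```python
def dml_explicit_transaction(dml: str) -> str:
    """Hypothetical transliteration of the Jinja macro's logic: wrap the
    supplied DML in an explicit transaction, appending a semicolon only if
    the trimmed statement doesn't already end with one."""
    body = dml if dml.strip().endswith(";") else dml + ";"
    return "begin;\n" + body + "\ncommit;"

# Both forms of the input produce identical wrapped SQL:
print(dml_explicit_transaction("insert into t values (1)"))
print(dml_explicit_transaction("insert into t values (1);"))
```

Always injecting the semicolon in the macro (rather than trusting the caller) keeps the contract simple: callers pass unterminated SQL, the macro terminates it.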
```
{% macro snowflake__get_delete_insert_merge_sql(target, source, unique_key, dest_columns) %}
    {% set dml = default__get_delete_insert_merge_sql(target, source, unique_key, dest_columns) %}
    {% do return(snowflake_dml_explicit_transaction(dml)) %}
```
Is there any problem with specifying code with an explicit `begin`/`commit` in addition to enabling autocommit? Surprised to see that this works! I would have expected that we would need to disable autocommit for these queries.
This is one of the things that's tricky about `autocommit`. You don't actually need to disable it; it just doesn't take effect if you supply an explicit `begin` + `commit`. From the docs:

> Statements inside an explicit transaction are not affected by AUTOCOMMIT. For example, statements inside an explicit BEGIN TRANSACTION ... ROLLBACK are rolled back even if AUTOCOMMIT is TRUE.
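The same principle can be demonstrated outside Snowflake. Here is a minimal sketch using Python's stdlib `sqlite3` with the connection in autocommit mode (`isolation_level=None`): statements inside an explicit `begin` ... `rollback` are still rolled back, even though autocommit is on.

```python
import sqlite3

# Autocommit mode: each statement commits immediately, unless it falls
# inside an explicitly opened transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table t (id integer)")

conn.execute("insert into t values (1)")   # autocommitted right away

conn.execute("begin")                      # explicit transaction opens
conn.execute("insert into t values (2)")
conn.execute("rollback")                   # undoes only the explicit txn

print([row[0] for row in conn.execute("select id from t")])  # [1]
```

The autocommitted row survives; the row inserted inside the explicit transaction is rolled back, mirroring the Snowflake behavior quoted above.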
@drewbanin Thanks for the quick review!
> Is this handled in an atomic way for an external querier? Eg. if a BI tool issues a query between when a view is dropped and a table is recreated.. will that query fail? If I'm understanding the Snowflake docs correctly, I think the answer is YES
Yes—and I think this is the status quo behavior. Since we switched to using `create or replace table|view` on Snowflake, whenever the old relation is of the wrong type, we simply drop it before creating it as the correct type, at the risk of downtime and the inability to roll back.
This did remind me to update the `create_or_replace_view()` macro as well, since we'd hacked that to support both transactions (on Snowflake) and not-transactions (on BigQuery). Now it can be even simpler.
> Can we imagine stress-testing this branch with a dbt running + concurrent queries against the tables being replaced? I'm happy to help with that if you think it's interesting
I did some very low-key stress testing by creating 10 models, and querying them all unioned together in the Snowflake console while I simultaneously executed `dbt run`. Suffice it to say, it works just fine when dbt is replacing views with views, or tables with tables. There are a few seconds of downtime if we're switching between views and tables. I made sure there's debug logging to reflect that.
We could take the BigQuery approach, which is to require that the user specify `--full-refresh` in order to drop a table and replace it with a view. (We don't require it for view → table, since that's presumably an "upgrade.")
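That BigQuery-style policy could be sketched as a small decision function. The names and return values here are hypothetical illustrations, not dbt's actual implementation:

```python
def replace_strategy(existing_type, target_type, full_refresh):
    """Hypothetical sketch of a relation-swap policy: atomic replace when
    the relation type is unchanged; allow view -> table (an "upgrade");
    require an explicit --full-refresh before the non-atomic
    table -> view swap."""
    if existing_type is None or existing_type == target_type:
        return "create or replace"              # atomic, no downtime
    if existing_type == "view" and target_type == "table":
        return "drop view, then create table"   # presumed upgrade, allowed
    if full_refresh:
        return "drop table, then create view"   # user opted in to downtime
    raise RuntimeError("run with --full-refresh to replace a table with a view")

print(replace_strategy("view", "view", False))
print(replace_strategy("table", "view", True))
```

The point of the guard is that only the type-changing swap is non-atomic, so only that path needs the user's explicit opt-in.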
> this is ultimately the kind of thing we'll want to extensively test internally/in a beta
100% agree. I'd like to get this into v0.21.0-b1, to go out end of this week or early next week, and get it into as many beta testers' hands as possible, as early as possible.
resolves #2748, resolves #3480

- No more `begin`, `commit`, or `rollback` on Snowflake
- `autocommit=True` in all connections
- DML is wrapped in an explicit `begin` + `commit`, affecting the following macros: `snowflake__get_merge_sql`, `snowflake__get_delete_insert_merge_sql`, `snowflake__snapshot_merge_sql`
- `run_hooks` in the `table` and `incremental` materializations—there are no inside/outside-transactional hooks, since there are no more transactions.

**Questions**

- New macro `snowflake_dml_explicit_transaction` that wraps supplied DML in `begin;` + `commit;`. Should this be an adapter method instead, written in python?
- …`begin` + `commit`. Users can still run within-transaction "pre-hook" logic via `sql_header`. Do we need to preserve a way to run within-transaction "post-hook" logic? This isn't something I see come up nearly as often.

**Checklist**

- [x] I have updated `CHANGELOG.md` and added information about my change to the "dbt next" section.