Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable most transactional logic on Snowflake #3480

Closed
jtcohen6 opened this issue Jun 21, 2021 · 2 comments
Closed

Disable most transactional logic on Snowflake #3480

jtcohen6 opened this issue Jun 21, 2021 · 2 comments
Labels
enhancement New feature or request snowflake
Milestone

Comments

@jtcohen6
Copy link
Contributor

Describe the feature

Today, dbt treats Snowflake as it would any transactional database (e.g. Postgres, Redshift): begin + commit sandwich all DDL/DML for materializations, tests, snapshot-freshness queries, etc. This has the effect of running more queries than strictly necessary (#2748). More recently, we've come to understand that transactional logic weights heavily in the Cloud Services layer.

The simplest solution here would be:

  1. Disable transactional logic in Snowflake by default, let DDL/DML statements auto-commit themselves wherever possible
  2. Add explicit begin + commit in only the places where it's needed, i.e. multiple DDL/DML statements that must be executed within a single transaction

Step 1 could be as simple as replicating the code in dbt-bigquery:
https://github.com/fishtown-analytics/dbt/blob/eb46bfc3d6fd70ee5352fa758470da03dc3700d1/plugins/bigquery/dbt/adapters/bigquery/connections.py#L147-L148

https://github.com/fishtown-analytics/dbt/blob/eb46bfc3d6fd70ee5352fa758470da03dc3700d1/plugins/bigquery/dbt/adapters/bigquery/connections.py#L196-L200

My sense is that the code changes for this are actually quite easy. The trickiness comes in considering and testing all possible permutations / edge cases, and validating that Snowflake's empirical behavior in each case matches its documented behavior. To that end, it's worth really digging into how autocommit works on Snowflake (docs), and understanding exactly how dbt interacts with each note below:

Snowflake supports an AUTOCOMMIT parameter. The default setting for AUTOCOMMIT is on.

While AUTOCOMMIT is enabled:

  • Each statement outside an explicit transaction is treated as though it is inside its own implicit single-statement transaction. In other words, that statement is automatically committed if it succeeds, and automatically rolled back if it fails. Statements inside an explicit transaction are not affected by AUTOCOMMIT. For example, statements inside an explicit BEGIN TRANSACTION ... ROLLBACK are rolled back even if AUTOCOMMIT is TRUE.

While AUTOCOMMIT is disabled:

  • An implicit BEGIN TRANSACTION is executed at:

    • The first DML statement or query statement after a transaction ends. This is true regardless of what ended the preceding transaction (e.g. a DDL statement, or an explicit commit or rollback).

    • The first DML statement or query statement after disabling AUTOCOMMIT.

  • An implicit COMMIT is executed at the following (if a transaction is already active):

    • The execution of a DDL statement.

    • The execution of an ALTER SESSION SET AUTOCOMMIT statement, regardless of whether the new value is TRUE or FALSE, and regardless of whether or not the new value is different from the previous value. For example, even if you set AUTOCOMMIT to FALSE when it is already FALSE, an implicit COMMIT is executed.

  • An implicit ROLLBACK is executed at the following (if a transaction is already active):

    • The end of a session.

    • The end of a stored procedure. Regardless of whether the stored procedure’s active transaction was started explicitly or implicitly, Snowflake rolls back the active transaction and issues an error message.

Here's what sticks out to me:

  • AUTOCOMMIT is on by default, but it could be off. Should dbt demand to work with AUTOCOMMIT on? Should we add this as an option in profiles.yml?
  • Since v0.17.2, dbt closes connections on Snowflake once it's done using them. This has the effect of ending the session, i.e. implicitly rolling back all transactions if AUTOCOMMIT is off.
  • The delete+insert incremental strategy requires running two DML statements within the same transaction. We'll need explicit begin + end logic there.

Another trade-off here: auto_begin (for statements) and inside_transaction (for pre- and post-hooks) would no longer work. But it's not clear to me that they're working well today, either. We could advise users to explicitly specify begin + commit within their statements or hook definitions, rather than relying on dbt's built-in methods to open up or close out transactions on their behalf.

Describe alternatives you've considered

  • Leaving all the unnecessary begin + commit in place, to the tune of much unnecessary credit expenditure
  • Reconsidering dbt's default behavior around transactions, as the majority of dbt projects increasingly work with non-transactional databases

Who will this benefit?

dbt + Snowflake users

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jun 29, 2021

We had a chance to chat with some folks from Snowflake, who provided the following guidance:

  • DDL statements require Cloud Services, since they remove/add/replace metadata entries. When autocommit is enabled, Snowflake introspects DDL statements before running them to minimize the Cloud Services credits consumed at runtime.
  • Snowflake treats explicit transactional logic as optimization fences for read-ahead introspection. A DDL statement sandwiched by begin; and commit; is not introspected prior to running, so at runtime its Cloud Services usage is not minimized.
  • DML statements do not consume Cloud Services credits, and they have peskier interactions with autocommit.

Given the above, Snowflake strongly recommends three clear best practices:

  • Always turn on autocommit
  • Never wrap DDL in explicit transactions
  • Always wrap DML in explicit transactions

@jtcohen6
Copy link
Contributor Author

Resolved by #3510

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request snowflake
Projects
None yet
Development

No branches or pull requests

1 participant