
dbt: move to materialize-boilerplate, only trigger after delay #2082

Open · wants to merge 1 commit into main

Conversation

mdibaiee
Member

@mdibaiee mdibaiee commented Oct 24, 2024

Description:

  • Move the dbt job trigger logic to materialize-boilerplate, where it can be better integrated with the sync frequency logic and where materialization connectors can more easily adopt the functionality in general

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)



@mdibaiee added the change:planned ("This is a planned change") label on Oct 24, 2024
Comment on lines +210 to +211
// We trigger dbt job only if the delay is taking effect, to avoid bursting
// dbt job triggers during backfills
mdibaiee (Member Author)
One question worth posing here: are there cases where a materialization has so much data that it always skips the default delay? In those cases it might look like the dbt job trigger is not working at all, when in fact it is always being postponed... 🤔

Member
There are definitely active tasks where a materialization has so much continuous data that, combined with the backpressure of the destination system, transactions are always greater than 1MM documents and the delay would always be skipped.

Fortunately these are also cases where the tasks are configured with an explicit delay of 0s, so it doesn't really matter. I doubt this will always be the case, though. Our current delay-triggering mechanism, based on the number of documents in a transaction, is fundamentally flawed, but it's the best we can do right now.

@williamhbaker williamhbaker (Member) left a comment
Here's a scenario to consider: a user may have a batch capture that runs every 24 hours, producing a large number of documents once a day and nothing at any other time. They are materializing from that capture, so the materialization only does any work every 24 hours. When a materialization first starts up, it runs a fixed number of commits (5) without applying any delay, to avoid accidentally tripping the delay due to the artificially small commits the runtime sometimes produces when a connector restarts. If the materialization connector restarts during that ~24-hour idle period, it will have 5 instant commits queued up for when it next sees data. If it can then work through the batch capture's 24 hours' worth of data in 5 or fewer transactions, it will never trigger the update delay, and thus would never trigger the dbt job. It might eventually trigger the dbt job during the next 24-hour cycle, or it might not even then, if the materialization is restarted again during the idle period.


Development

Successfully merging this pull request may close these issues.

dbt cloud trigger: avoid bursts of triggers when backfilling materializations