Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagging is too distinct from model building #38

Open
BeadW opened this issue Jun 19, 2023 · 2 comments
Open

Tagging is too distinct from model building #38

BeadW opened this issue Jun 19, 2023 · 2 comments

Comments

@BeadW
Copy link

BeadW commented Jun 19, 2023

The current implementation of snowflake tagging runs as a post run hook. (https://github.com/Montreal-Analytics/dbt-snowflake-utils#snowflake_utilsapply_meta_as_tags-source)

This works however there are some ways we can improve it.

Limitations (not exhaustive) of the current approach:

  1. Unable to fail fast - A typical failure mode is that the yaml doesn't accurately match the model output causing tagging to throw a snowflake error. This error isn't known until all models have run causing a delay and leaving models in an undefined state.
  2. Failures aren't able to be rolled back - Because the tagging isn't part of the same transaction we can't roll back when a failure occurs.
  3. Process is slow - A post run hook is single threaded, given we are applying tags to models which are mutually exclusive by their nature a multi threaded approach would be more performant.

I propose that we move to a post hook which runs as part of the model transaction.
This means that tagging should occur as part of the model build transaction and dbt will automatically multi thread it according to the threading settings of the project.

I have a proof of concept to demonstrate this works and am happy to provide that and work toward a more feature complete version if this approach is to be adopted.

@jamesweakley
Copy link
Contributor

Hi @BeadW ,
There is definitely already some interest in shifting it to the model level. If you have an alternate macro we could add into the mix, it would definitely be welcome. We would probably keep it as two different macros so that people can choose which method to use.

@kanomaxb
Copy link

kanomaxb commented Nov 8, 2023

I would add another reason:

  1. For tag-based masking policy the time between creating a table and masking its data needs to be minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants