[CT-1576] [CT-1572] [Bug] Snowflake cluster_by config does not update Snowflake table for incremental models #335
Comments
@danielmast Thanks for opening! I am going to move this one over to Currently, the
dbt-snowflake/dbt/include/snowflake/macros/adapters.sql Lines 26 to 30 in 626d32e
dbt-snowflake/dbt/include/snowflake/macros/adapters.sql Lines 34 to 39 in 626d32e
Clustering a Snowflake table is more than just a metadata update: it requires actually changing the way the data is being stored on disk. That's why we include the My understanding is, adding a new clustering key during an incremental run would have the implicit effect of reshuffling all the preexisting data in the table, with significant costs in compute & time, a change that's better suited to an explicit I do see how it's confusing that the key is implicitly not added during the incremental run, though. We could look to detect the table's current clustering, and raise a warning if it differs from the one set in your config. What do you think?
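For illustration, the "detect the current clustering and warn if it differs" idea could look roughly like the sketch below. It is not dbt-snowflake code. It assumes the table's current clustering key has already been fetched as a string of the form `LINEAR(key1, key2)` (the shape Snowflake reports in the `cluster_by` column of `SHOW TABLES`, as far as I know), and the function names `parse_cluster_by` and `clustering_warning` are hypothetical.

```python
import re
from typing import List, Optional


def parse_cluster_by(raw: str) -> List[str]:
    """Turn a string like 'LINEAR(key1, key2)' into ['key1', 'key2'].

    Assumption: keys are plain column names. Splitting on commas would
    break clustering expressions that contain commas themselves.
    """
    text = raw.strip()
    match = re.fullmatch(r"(?i)linear\s*\((.*)\)", text)
    if match:
        text = match.group(1)
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def clustering_warning(
    table: str, configured: List[str], current_raw: str
) -> Optional[str]:
    """Return a warning message if the configured keys differ from the
    table's current clustering keys, otherwise None."""
    wanted = [key.strip().lower() for key in configured]
    current = parse_cluster_by(current_raw)
    if wanted == current:
        return None
    return (
        f"cluster_by for {table} is configured as {wanted}, "
        f"but the existing table is clustered by {current}. "
        "The incremental run will not change the table's clustering."
    )


# Hypothetical usage: the config gained 'key3' after the table was built.
message = clustering_warning(
    "analytics.my_model", ["key1", "key2", "key3"], "LINEAR(key1, key2)"
)
if message is not None:
    print(message)
```

The comparison is order-sensitive on purpose, since the order of clustering keys matters in Snowflake.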
@jtcohen6 Thank you for your reply. I fully agree with your reasoning that reclustering is quite intensive and that users should in some way be made aware of that. I think the most intuitive strategy would be: when someone changes the clustering config on an "incremental" materialization, raise a warning but also perform the change.
(I'm not a native English speaker, so this text can probably be much better.) I think it's most intuitive that a change to the cluster_by config actually performs the change. Someone using Snowflake's clustering functionality can be expected to be aware of the impact. Believing that the clustering change has been applied when it hasn't may lead to worse outcomes, or to frustrated users. I'm reading in the docs that:
I assume that the staged dataset will be ordered by the newly configured cluster keys, so that order won't match the actual clustering config of the table. More reason to perform the change, imo. What do you think of this proposal?
I realize that this change should not only affect clustering behavior in Snowflake, but also in BigQuery, as they both use the CLUSTER BY functionality. Do you agree?
@jtcohen6 I'm going to steal this one from you. @danielmast I need to do some thinking here. There is a larger topic emerging here, tied to materialized views (dynamic tables for Snowflake) and I will get back to you next week.
Linking to dbt-labs/dbt-core#6911
Thank you @Fleid. As you mentioned that this issue will be part of a larger topic, I'll await that discussion for now. Please let me know if I can help with implementation or something else.
For MVs specifically, I just added a new parameter to the design called On
Not sure how this would interact with But more importantly, I still have a big question mark about the feasibility of certain options. Let's say we add something similar to incremental models and we want to cover cluster_by keys via Curious about your thoughts on all that. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this a new bug in dbt-core?
Current Behavior
Updating the cluster_by config on an incremental dbt model does not trigger an ALTER TABLE query on the table. Effect: the cluster keys of the Snowflake table are not updated.
Expected Behavior
Updating the cluster_by config triggers an ALTER TABLE ... CLUSTER BY query in Snowflake, updating the cluster keys.
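To make the expected statement concrete, here is a minimal sketch of how the ALTER TABLE query could be derived from a cluster_by list. The helper name `alter_cluster_by_sql` and the table name are hypothetical, and this is not how dbt-snowflake builds its SQL. It only illustrates the Snowflake statements `ALTER TABLE ... CLUSTER BY (...)` and `ALTER TABLE ... DROP CLUSTERING KEY`.

```python
from typing import List


def alter_cluster_by_sql(table: str, cluster_keys: List[str]) -> str:
    """Build the statement that would bring a table's clustering key in
    line with the configured cluster_by list.

    Assumption: `table` and the keys are already valid, safely quoted
    identifiers. No escaping is done here.
    """
    if not cluster_keys:
        # An empty config maps to removing the clustering key entirely.
        return f"alter table {table} drop clustering key"
    keys = ", ".join(cluster_keys)
    return f"alter table {table} cluster by ({keys})"


# Hypothetical table name, using the keys from this report.
print(alter_cluster_by_sql("my_db.my_schema.my_model", ["key1", "key2", "key3"]))
```

Running such a statement on a large existing table triggers reclustering, which is the cost concern raised in the comments.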
Steps To Reproduce
['key1', 'key2', 'key3']
(For a regular (non-incremental) table model this problem doesn't occur, because the entire table is recreated every time.)
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context