[ADAP-508] [Feature] Honour cluster_by config for Python models #585
Comments
Thanks for noticing this and reaching out @carlescere! Adding the relevant … Could you share more details on the …
I am not an expert on Snowflake, and I don't think there is much public documentation about the auto-clustering internals, so if any Snowflake employee or someone more knowledgeable wants to confirm or deny this explanation, please do. My understanding is that micro-partitions are created as the data arrives (in the order it is read) and are immutable. In the background, the … This reasoning is based on conversations in the Snowflake community forums, like the one you shared, and on limited experimentation on my side, so, sadly, I don't have hard proof of it. Ultimately, my point was based on the fact that the SQL counterpart does, in fact, …
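To illustrate the reasoning above, here is a toy model (an assumption for illustration, not Snowflake's actual algorithm): rows are cut into fixed-size, immutable "micro-partitions" in arrival order, and for each partition we count how many other partitions its min/max key range overlaps. Fewer overlaps loosely corresponds to better clustering, similar in spirit to the overlap statistics Snowflake reports through SYSTEM$CLUSTERING_INFORMATION. The partition size and the data are arbitrary.

```python
import random


def partition_ranges(values, rows_per_partition):
    """Cut values into fixed-size chunks in arrival order (toy 'micro-partitions')
    and return the (min, max) key range of each chunk."""
    chunks = (
        values[i:i + rows_per_partition]
        for i in range(0, len(values), rows_per_partition)
    )
    return [(min(chunk), max(chunk)) for chunk in chunks]


def average_overlap(ranges):
    """Average number of *other* partitions whose key range overlaps each partition's."""
    total = 0
    for i, (lo_i, hi_i) in enumerate(ranges):
        total += sum(
            1
            for j, (lo_j, hi_j) in enumerate(ranges)
            if j != i and lo_j <= hi_i and lo_i <= hi_j
        )
    return total / len(ranges)


# 10,000 distinct keys arriving in random order (seeded for reproducibility).
keys = list(range(10_000))
shuffled = keys[:]
random.Random(0).shuffle(shuffled)

# Same rows, written either in arrival order or sorted on the clustering key first.
unsorted_overlap = average_overlap(partition_ranges(shuffled, 500))
sorted_overlap = average_overlap(partition_ranges(sorted(shuffled), 500))
print(f"arrival order: {unsorted_overlap:.1f}, sorted first: {sorted_overlap:.1f}")
```

Under these assumptions, sorting before the write yields non-overlapping ranges, while random arrival leaves almost every partition overlapping every other one. That is the kind of effect the ORDER BY in the SQL materialization is presumably meant to get up front, instead of leaving all the work to background reclustering.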
Thanks for that info @carlescere 👍 I'm going to re-label this as a feature, since we treated this type of configuration as out of scope during our initial implementation of dbt Python models for Snowflake.

Acceptance criteria
Optional
Depending on whether it makes sense to include or not:

Out of scope
Considerations during implementation
Whether to include …
The current approach of table creation for Python models is to use the …
The former is likely vastly easier to implement. The latter may be more efficient for large tables.
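One way a post-write approach could look, as a sketch assuming the Python model's DataFrame is persisted with Snowpark's DataFrameWriter.save_as_table: sort on the cluster_by keys, write the table, then issue the same ALTER TABLE ... CLUSTER BY ... statement the SQL materialization runs. write_clustered_table is a hypothetical helper, not part of dbt-snowflake; session and df are assumed to be a Snowpark Session and DataFrame, and table_name an already-quoted, fully qualified name.

```python
def write_clustered_table(session, df, table_name, cluster_by=None):
    """Hypothetical helper: persist a Snowpark DataFrame and apply cluster_by
    the way the SQL materialization does (sort on create, then ALTER TABLE)."""
    # dbt accepts cluster_by as either a single column name or a list of names.
    if isinstance(cluster_by, str):
        cluster_by = [cluster_by]

    if cluster_by:
        # Sorting first makes rows with nearby key values likely to land in the
        # same micro-partitions on arrival; Snowflake does not guarantee the layout.
        df = df.sort(*cluster_by)

    # Create or replace the target table from the (possibly sorted) DataFrame.
    df.write.save_as_table(table_name, mode="overwrite")

    if cluster_by:
        # Declare the clustering key after creation, mirroring the SQL path.
        keys = ", ".join(cluster_by)
        session.sql(f"alter table {table_name} cluster by ({keys})").collect()
```

Dropping the df.sort call gives the simpler variant that only declares the clustering key after the fact and leaves the reordering to Snowflake's background reclustering; keeping it trades an extra sort at write time for a table that probably starts out better clustered.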
Thanks @dbeatty10! 🎉
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Any chance this is happening anytime soon? Btw, would this be implemented somewhere in the …
Is this a new bug in dbt-snowflake?
Current Behavior
When creating an incremental model with
dbt.config(cluster_by=['key'])
the initial dbt run creates the table with a CREATE OR REPLACE TABLE ... AS ... query. Unlike in the SQL counterpart, this query is not followed by an ALTER TABLE ... CLUSTER BY ... query. Additionally, the SQL model wraps the table creation in an ORDER BY on the cluster_by key(s) to speed up clustering of the data; this does not appear in the Python table creation.

Expected Behavior
- An ALTER TABLE ... CLUSTER BY ... query after table creation when a cluster_by config is set.
- An ORDER BY on the cluster_by key(s) when the table is created.

Steps To Reproduce
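A minimal python_test_model.py for reproducing this might look like the following sketch. The upstream model name is a hypothetical placeholder for any model with a key column, and incremental filtering logic is omitted because only the initial run matters here.

```python
# e.g. models/python_test_model.py (hypothetical reproduction model)
def model(dbt, session):
    # Same config as in the report: incremental materialization clustered on "key".
    dbt.config(
        materialized="incremental",
        cluster_by=["key"],
    )
    # "upstream_model" is a placeholder; any model with a "key" column works.
    df = dbt.ref("upstream_model")
    return df
```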
- Run dbt run --select python_test_model.
- … ORDER BY clause.
- … ALTER TABLE ... CLUSTER BY ... query.

Relevant log output
Environment
Additional Context
No response