Add include_data_types argument to generate_model_yaml macro #122
Set to true as default; update tests
@linbug Thank you for picking this up! This will be a great addition for folks who start using model contracts in v1.5
macros/generate_model_yaml.sql
Outdated
{% if parent_column_name %}
    {% set column_name = parent_column_name ~ "." ~ column.name %}
{% else %}
    {% set column_name = column.name %}
{% endif %}

{% do model_yaml.append(' - name: ' ~ column_name | lower ) %}
{% if include_data_types %}
    {% do model_yaml.append(' data_type: ' ~ column.data_type | lower) %}
Depending on the adapter / data platform, we may want this to be column.dtype instead of column.data_type. I believe that will enable this to return text or varchar instead of character varying(1234).
We could use the dbt.format_column macro, which is also what's used when doing the column type assertion for contracted models.
{% set formatted = adapter.dispatch('format_column', 'dbt')(column) %}
{% do model_yaml.append(' data_type: ' ~ formatted['data_type'] | lower) %}
If we do make this change, the format_column macro is new in v1.5, so this would require a version bump to codegen and to its require-dbt-version.
@jtcohen6 good point about column.dtype vs. column.data_type!
Using dbt.format_column macro
I like your idea of using the logic within the dbt.format_column macro. I think we should aim to preserve the wide range of require-dbt-version: [">=1.0.0", "<2.0.0"] though.
There are multiple approaches we could take to bring this logic into codegen without forcing an upgrade to 1.5.0. The quick'n'dirty option is to "vendor" it by just copy-pasting it into codegen. Another option is to inspect the dbt version, and fall back to a vendored version of the macro if it is less than 1.5.0.
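For illustration, the "vendor" option could look roughly like the sketch below. It assumes the vendored body mirrors what the dbt 1.5 default implementation returns (a dict with a data_type key built from column.dtype); that shape is an assumption to check against the dbt-core source before copying, and the placement under the codegen namespace is likewise only a suggestion.

```jinja
{# Hypothetical vendored copy kept inside the codegen package, so that
   dbt versions older than 1.5 can still call it. #}
{% macro format_column(column) %}
    {{ return(adapter.dispatch('format_column', 'codegen')(column)) }}
{% endmacro %}

{% macro default__format_column(column) %}
    {# dtype is the type name without size, precision, or scale #}
    {% set data_type = column.dtype %}
    {% set formatted = column.column.lower() ~ " " ~ data_type %}
    {{ return({'name': column.name, 'data_type': data_type, 'formatted': formatted}) }}
{% endmacro %}
```

A caller inside codegen would then read codegen.format_column(column)['data_type'] instead of reaching into the dbt namespace, so no version check is needed.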
Difference between dtype and data_type
My understanding of the difference between the two is that dtype gives the database-specific data type with no size, scale, or precision (like varchar or decimal), whereas data_type is intended to give the database-specific data type with those included (like varchar(80) or decimal(18, 2)).
Although dbt model contracts can operate with either input, they effectively ignore any size/precision/scale that is supplied.
So it would be best to exclude size/precision/scale, to avoid giving the false impression that it will be verified as part of the dbt model contract.
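One way to see the difference on a given adapter is to log both attributes for each column of a relation. The macro below is a minimal sketch; the macro name print_column_types is hypothetical, and the values printed will vary by data platform.

```jinja
{# Hypothetical helper: logs dtype and data_type side by side for every
   column of the relation passed in. #}
{% macro print_column_types(relation) %}
    {% for column in adapter.get_columns_in_relation(relation) %}
        {{ log(column.name ~ ': dtype=' ~ column.dtype ~ ', data_type=' ~ column.data_type, info=True) }}
    {% endfor %}
{% endmacro %}
```

It could be called from another macro or an analysis with something like print_column_types(ref('some_model')), where some_model is a placeholder.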
💡 We should fix column.data_type so its value is always valid. It feels very doable to make it work across adapters, and relevant issues opened here, here, and here.
💡 Once column.data_type is fixed, we should consider expanding dbt.format_column() to include it as well.
Ok so I added this, which I feel should work, but for some reason my tests fail locally because formatted is empty. My dbt version is 1.5.0-b5, which is maybe pre-release and doesn't have dbt.format_column yet? If you have suggestions for getting around this please let me know, otherwise I can use the vendored version for everything.
@dbeatty10 @jtcohen6 any thoughts?
@dave-connors-3 @benmosher we may want to get this across the line now that generate model is live in the IDE. It would be a good improvement to the functionality.
macros/generate_source.sql
Outdated
@@ -62,7 +62,7 @@
 {% for column in columns %}
 {% do sources_yaml.append(' - name: ' ~ column.name | lower ) %}
 {% if include_data_types %}
-{% do sources_yaml.append(' data_type: ' ~ (column.data_type | upper ) ) %}
+{% do sources_yaml.append(' data_type: ' ~ (column.data_type | lower ) ) %}
As above, we may want to use .dtype instead of .data_type, depending on the data platform, and the format_column macro (new in v1.5) could be our friend here.
Are you looking for additional contributors on this to help keep it moving? Happy to help however I can but obviously don't want to inject myself where I shouldn't.
@linbug -- this looks great -- if you are able to resolve the conflicts, we can do a final review and release on this!
Jumping in as someone who has been watching the PR for over a month now - I'm so excited that this is finally going to be merged in! This is a feature I've been waiting for before our team implements contracts, thank you all so much for working on this!
@linbug do you need any help getting this PR over the finish line? If you don't have time I would love to help!
I'm going to take a shot at resolving the merge conflicts and then do final review.
@dbeatty10 you are a champion of life
Let me know if you need help.
Thanks again for contributing this @linbug!
My apologies to you and everyone else who has been waiting for this to be merged 🙏
I pushed two main changes:
- b2f463c - rely only on the "vendored" version of format_column (rather than using dbt.default__format_column)
  - comparing dbt versions robustly is more trouble than it was worth in this case, so this change greatly simplified things
- 6f6aea6 - allow end users to configure their own data type formatting by using multiple dispatch to override the data_type_format_source and/or data_type_format_model macros.
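As a rough sketch of how an end user might use that dispatch hook: the override below lives in the user's own project and returns upper-case data types for generate_model_yaml. The project name my_project is hypothetical, the single column argument is an assumption about the macro's signature, and the sketch also assumes dbt_project.yml contains a dispatch entry for the codegen macro namespace whose search_order lists my_project before codegen.

```jinja
{# macros/data_type_format_model.sql in a hypothetical project named my_project.
   Assumes a dispatch entry for macro_namespace codegen with
   search_order ['my_project', 'codegen'] in dbt_project.yml. #}
{% macro default__data_type_format_model(column) %}
    {# Upper-case the type, keeping size/precision where the adapter reports it #}
    {{ return(column.data_type | upper) }}
{% endmacro %}
```

The same pattern would apply to data_type_format_source for generate_source output.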
Apologies for missing the pings about resolving conflicts. Thank you for merging this!
No prob @linbug -- really appreciate the great work you did here! 🏆
resolves dbt-labs/dbt-core#120
Also:
include_data_types value for generate_source to True
generate_source and generate_model_yaml