-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support constraints independently from enforcing a full model contract #10195
Comments
Hey! Looks like this is similar to this other ticket when we closed as not planned.
The way contracts works today is we apply the constraints and schema (data types / column names) during the It sounds like what you're asking for is to apply the constraint as an Most databases don't actually enforce constraints beyond not null - see our docs here. Which warehouse are you using? For snowflake, for example, even if you were to run an I would recommend using our |
Thanks for the reply. From my understanding they are different in that the referenced ticket is still looking to fully contract a model (100% of fields defined) but allowing a way to do a higher level contract definition that can then be inherited and extended at the model level. I'm looking to still define just at the model level, but not have to define all fields if they are not important for the constraints required. I could see applying this in the create table up front, and you'd create the table with the provided columns and then would have to allow for schema evolution to add any additional columns that the model defines but are not part of the partial contract. It could be done via alter after the fact as well, but I know that some databases won't let you apply PK constraints outside of table creation, so I would lean towards the partial definition within the create statement. I am using Databricks, which is not really applying a majority of the constraints. However, I do want to be able to define the PK/FK relationships and ideally the ones that are enforced such as not null. I am using dbt pretty early in the data lifecycle where I want to allow my tables to evolve as new source columns are added, but that ideally wouldn't prevent me from defining the known constraints up front. Beyond the testing of the data, my end goal is to support data users that will benefit from being able to generate table relationships directly from the table metadata vs maintaining external ERD documents. As well as Databricks providing the ability to generate ERD documents directly from the table relationships where they are defined. This is what leads me to suggest that constraints and contracts while likely related are really two separate concepts. |
@kmarq Based on reading your feedback, I just updated the issue title to be "[Feature] Support constraints independently from enforcing a full model contract". Just let us know if that title doesn't seem accurate and we can try re-titling it. |
That's good by me. Thank you |
Thanks for the follow up.
I don't believe you can have a partial definition of your table within the create statement. Looking at the databricks docs:
If you want to define a table specification, you can't only define column constraints nor can you define a partial list of columns. So I'm not sure that what you're suggesting would be possible. Since constrains and "contracts" are both required during the What I believe you can do is apply non-enforced constraints via an Is this feature request just to have dbt run that
|
Here's another related issue on using constrains for ERD diagrams like the use case you mentioned - #3295 |
When I say partial, I mean that all columns requiring constraints would be provided up front so they can be done in the Create Table statement. Additional columns could then be added to the model via the normal schema evolution process. So I could define something like:
My model might then have
Where extra_col would be added as a new nullable column during execution.
Applying the constraints as an after creation alter would be acceptable to my use case. However, I really think that while contracts and constraints do seem highly related due to being needed at table creation time, that they are two separate concepts and ideally should be split. Contracts are providing much more additional functionality beyond constraints, but constraints on their own outside of contracts can provide value as well. |
Recreated some previous testing of constraints. If you define a primary_key constraint on a column
The constraint shows up in the model.columns
whether the contract enforcement is true or false. If you define a composite primary_key on the model
Then in the model.constraints you will see this ONLY if contract enforcement is true:
compared to false:
So currently the ability to apply constraints after the fact outside of contracts is highly limited |
Thanks for all of these examples, I really appreciate it. So for the example you gave where you have a model with a constraint defined:
and then that model has additional columns in the SQL:
I'm not sure how "Additional columns could then be added to the model via the normal schema evolution process" would be possible today. In this case, dbt would not be aware of Additionally, you'd need to know the data types of the columns for the |
Yeah, I see. I was definitely thinking Databricks centric and that we could do a merge with schema evolution enabled vs a straight insert. That would handle automatically adding extra_col. We do that in our non-dbt processes today. The table gets created with any audit required columns and partition columns the user defines then we do a merge to insert records so the schema updates. I could definitely do something on my side even if its overriding the table materialization to do something like that. I think the bigger issue is the ability to have that post table creation alter applied to put the non-enforced constraints in place. With the above issue, it is not really possible to implement in something like a post_hook. So maybe this is really a smaller request to have the constraints come through on the config even if the contract isn't enforced? Then adapters could implement applying them yet if it made sense on the platform, or it could be done via macro in a hook if necessary. |
Ah! That makes sense about the Databricks use case, thanks for explaining that to me. I think we'd need to think more about a solution that could work for all warehouses to add this "partial contract" support to dbt-core. For other warehouses we may need to do something similar to how we get datatypes/column names for unit tests - creating a temp object with the "empty" query (
I created a new issue for this piece -> #10219 |
The new one looks good, thank you. Happy to help with any other required discussion around potential options to implement this one.
I just tried a few examples using this and from a query perspective it is very fast. It is definitely more total steps and that adds up. From my perspective, the users that want to enable something like this partial contract are likely more concerned about getting that additional capability enabled vs the additional processing time. This wouldn't impact every model, only ones that are defined with that, so its at least a conscious tradeoff. Thanks for the discussion. |
+1 |
+1 for BigQuery, which uses PK/FK constraints to optimize queries even though it does not enforce them. I'd love to get those performance benefits on models where I am not concerned about their contracts/schemas. |
Is this your first time submitting a feature request?
Describe the feature
I would like to have the ability to use many of the features of a contract without being able to fully define a rigid contract. This should allow the ability to apply constraints like Primary and Foreign keys, not null, and other constraints on the columns that are provided but then allow for schema evolution to add new columns that are not governed by the contract.
Describe alternatives you've considered
Using post-hooks to alter tables and apply the constraints directly.
Moving constraints to be unrelated to contracts and making the contract flag related to enforcement and privacy rather than controlling application of table constraints
Who will this benefit?
Developers using dbt especially in earlier stages of data transformation that want to use constraints to define relationships and key columns but leave their model open to schema evolution for additional columns to be loaded
Are you interested in contributing this feature?
Possibly but may not be a great first one
Anything else?
Breaking this off from the discussion here: Infer Schema contract discussion
The text was updated successfully, but these errors were encountered: