strict field validation for schema.yml #1570
Comments
Hey @jwerderits - thanks for making this issue! How do you think dbt should handle this in practice? Do you think a |
@drewbanin I think it makes the most sense for a |
This would be really useful! Would it be possible to extend the validation to check for the opposite scenario: columns which exist in the model but don't have an entry in the schema.yml file? |
@JackArthurton yeah! I think that's a great idea. I'm imagining that this would be opt-in, so you could annotate a schema.yml specification with |
Drew pointed me to this issue when I asked about this on Slack. My question was:
Love the |
Any progress regarding field validation? I'd like to be able to define the schema like this:
|
hey @smomni - we haven't prioritized this one yet! If you're in a pinch, you can actually define your own custom schema test in your dbt project: https://docs.getdbt.com/docs/custom-schema-tests If you create a macro in your
Then you'll be able to assert that columns exist with:
I think the version of this that we add to dbt natively will be a little bit smarter than this. We can just run a single query to find all of the columns in a table, then check them against the columns in the schema file. The schema test I shared above will run one query per column, which probably isn't optimal. |
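To make the single-query idea concrete, here is a minimal sketch in Python. It is not dbt's implementation: it uses an in-memory SQLite database as a stand-in for the warehouse, and the function name `check_columns` and the sample table are hypothetical. It fetches all of a table's columns with one metadata query, then compares them against the documented column names in both directions.

```python
import sqlite3


def check_columns(conn, table, documented):
    """Compare documented column names against the table's actual columns.

    `table` is assumed to be a trusted identifier (it is interpolated
    directly into the PRAGMA statement).
    """
    # One metadata query returns every column in the table.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    actual = {row[1] for row in rows}  # row[1] holds the column name
    documented = set(documented)
    return {
        # Documented in the schema file but absent from the table.
        "missing_from_table": sorted(documented - actual),
        # Present in the table but absent from the schema file.
        "undocumented": sorted(actual - documented),
    }


# Hypothetical example: a table with three columns, and a schema file
# that documents two real columns plus one that does not exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, created_at TEXT)")
result = check_columns(conn, "customers", ["id", "email", "first_name"])
print(result)
```

In a real warehouse the metadata query would probably go against something like the information schema instead of a SQLite PRAGMA, but the set comparison would stay the same.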
Hey guys, this looks like it could be a really useful idea. For some context: we want to create a self-service ELT pipeline in which people can create their own datasets from upstream sources. To mitigate the risk of undocumented datasets entering the warehouse, it would be nice to require people to document newly created tables and views with descriptions. The ideal would be a project-level flag that requires schemas to have descriptions. This could then be used as part of CI/CD to stop undocumented schemas from getting to production. |
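Until something like that flag exists, a CI step could do a rough version of this check itself. The sketch below is only an illustration, not a dbt feature: it assumes the schema.yml file has already been parsed into a plain dictionary with the usual `models` / `columns` / `description` keys, and the function name `find_missing_descriptions` and the sample data are hypothetical.

```python
def find_missing_descriptions(schema):
    """Return the names of models and columns that lack a description."""
    problems = []
    for model in schema.get("models", []):
        # Treat a missing, empty, or whitespace-only description as undocumented.
        if not (model.get("description") or "").strip():
            problems.append(model["name"])
        for column in model.get("columns", []):
            if not (column.get("description") or "").strip():
                problems.append(f"{model['name']}.{column['name']}")
    return problems


# Hypothetical parsed schema.yml content.
schema = {
    "models": [
        {
            "name": "orders",
            "description": "One row per order",
            "columns": [
                {"name": "id", "description": "Primary key"},
                {"name": "amount"},
            ],
        },
        {
            "name": "customers",
            "columns": [{"name": "id", "description": ""}],
        },
    ]
}

problems = find_missing_descriptions(schema)
print(problems)
```

A CI job could then exit with a non-zero status whenever the returned list is not empty, which would block the merge until the descriptions are added.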
this approach doesn't fail if the column is not present in the table |
@ArafathC are you sure about that? The query should fail because the specified column does not exist in the table. I'm curious if you could elaborate about what you mean |
@drewbanin Sorry about that.. Was able to double check and confirm it works |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Hey @drewbanin any activity here? I would love to make use of such a feature. Internally, I wrote an app that does all this kind of checking on our existing reporting warehouse, which doesn't use dbt. I'm migrating it over to use dbt, and this is definitely a gap we're seeing. Thanks in advance for any updates / help |
Also, sidenote, I do think there's an error in the macro posted above. It needs to be |
Just found this: https://github.com/calogica/dbt-expectations/tree/0.5.1/#table-shape it's pretty awesome. Gets to most of what I need here. 👍 |
This is what I've been using to check if all the columns defined exist.
|
I think this feature is very useful as well. It would be very nice if I could enforce the following.
Are there plans to prioritize this one @drewbanin ? Seems like dbt exception does not cover this use case. |
Thank you for the help! One quick thing: I think you will have an error with trailing commas here (for the last column). I believe it should be updated to (edit on line 17):
|
Ahh yep I'm predominantly using BigQuery these days where trailing commas in the select statement are allowed. |
Agree with all of @yummydum's suggestions, with the aim that the schema be complete (all columns present and described) and accurate (at least in that it doesn't document columns that do not exist). It looks like @benhinchley's macro ensures the latter part of this (documented columns must exist) but there doesn't seem to be a solution for the other requirements in here. Is it worth reopening the issue (seems to have been closed by a bot, not closed as fixed or will not fix)? |
@benhinchley, @schylarbrock, thank you for the solution with a test code. Very handy! I've adjusted it to be a singular one (not a generic one). So with this version, we don't need to specify a test for every model in schema.yml.
|
So, to summarise there can be four possible cases:
Using the test mentioned by @Klimmy, Now, the interesting part - d. DBT throws a warning during the first compilation/ run for case 4. This way, I am able to cover 3 out of 4 cases. Please correct if there is any concern with my approach. A doubt: What I didn't understand. In the snippet by @Klimmy , why do we need
Why can't we just do |
Just coming to DBT and hitting this issue. It doesn’t sound too hard to resolve. Why has it not been added to DBT core yet? |
You can also use the contract enforced property to have the model fail at runtime! https://docs.getdbt.com/reference/resource-configs/contract schema.yml snippet:
|
I've tried enforced
test doesn't fail |
ok problem solved
|
Issue: field validation for schema.yml
Issue description
Invalid or nonexistent field names can have descriptions in schema.yml that are populated in the documentation. If a field is deleted but was previously documented (correctly), the docs will indicate that the field still exists.
Fix
Apply column validation for fields that don't have tests applied to them
Steps to reproduce
You can add any column name with a description to schema.yml, and it will be populated in the documentation.