-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-426] [Feature] Manage full schema content with dbt by dropping tables/views not maintained in models #4957
Comments
(Similar proposal from long ago, linking for historical context: #1135. I don't know if I still agree with the proposal outlined in my last comment there, though.) |
Thanks for the well thought out feature request @bneijt! I think this is a great idea and I know we'd love to have something like this in core. We have it as a backlog ticket now, but it's probably not in the immediate future plans for the core team. That said, if it's something you and your colleague would like to contribute that would be really fantastic-- I'm sure we can help provide some assistance along the way as well. |
(by the way, be sure to check out #1135 there's lots of good ideas and prior art in there) |
Hi, @bneijt's colleague here. Good to read that you are enthusiastic about the proposal. We think that implementing this feature consists of 2 tasks:
We looked into #1135. It mainly talks about diff / compare functionality. We think that this is not needed for the MVP, and can be added in a later release. We are curious to hear whether you agree with our ideas! |
After some consideration, we have come to the following:
By making it top-level config, we ensure we don't mix management and the cascading nature of the project config. This also allows us to drop the check on defining both schema and manage action at the same level. for the
Default for manage_schemas is false, configuration is done at a target level. |
@bneijt Thanks for sharing the concrete proposal! Two feelings I wanted to share: First: dbt should support a Second: Specifying schemas to manage in As a dbt user, I'd be more inclined to pass a runtime config, Operationally, this could be done at the end of the run, akin to an dbt-core/core/dbt/adapters/cache.py Lines 453 to 459 in 14c2bd9
|
First: yes a --dry-run is on our todo. First iteration we want to have a configurable set of possible Second: Our first intention was to have it as part of the node configuration, next to the model models:
target:
+schema: schema_a
a:
b:
+schema: schema_b
c:
+schema: schema_b
+manage_schema: true Now all models below b would be managed, even though this is only specified at the end of the configuration. Bottom line, enabling management on a schema is not a cascading configuration property when it comes to models. The only proper solution seems to be a top-level configuration in the project. If we would only want to allow this configuration to be done via the command line, I think the definitions of which schema's should be managed will get out of hand (we want to manage about 10 schema's in my current project). We are currently experimenting with the code in a feature branch on our fork https://github.com/BigDataRepublic/dbt-core/blob/c22aaafff0f25d10d6769850ff50413c02063937/core/dbt/task/run.py#L488 with a pull request towards the for to allow us to track discussions and progress BigDataRepublic#1. As for the caching behavior, that is something we still need to look in to, might indeed be nice to pair that as you mention. |
@jtcohen6 I don’t think that not specifying schemas to manage, and simply manage all schemas that have models, works. When removing the last model(s) of a schema from your project, these models are left in the database, while they should be removed. Do I miss something here? |
@bneijt @agoblet Thanks so much for opening the PR! This really helped me get a feel for what you're thinking. I'm going to leave some comments over there. I like the idea of
Hmm, good point. A few wild ideas that occur to me:
|
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
The issue is still very active over in its PR! #5392 |
Doen a lot of work in a pr and had some feedback from the team. Meanwhile I moved on from using dbt and the original pr would require a complete rewrite to fit on the current main branch. Closing this issue due to inactivity. |
Is there an existing feature request for this?
Describe the Feature
We sometime deprecate and remove old models and implement renames or new models. To stay in control of the data we maintain in the database, we would like to be able to have dbt automatically remove tables that are not linked to models defined in dbt.
This will make sure that in your test/prod environments you do not end up with stale, possibly sensitive, data still in the database that is no longer maintained.
General idea is that you are allowed to write
dbt_project.yml
configuration that states:which would drop tables/views not used in the
something
schema. (future alternatives might include "rename" as a maintenance approach or move to another schema).Because I want the feature to work on some but not all databases (we share a dev database in our project) I would like to have another configuration required in the
profile.yml
which allows this to start working on the specific database:Describe alternatives you've considered
Who will this benefit?
Are you interested in contributing this feature?
Yes, me and a collegue of mine would like to write code for this
Anything else?
Might also help with #3685
The text was updated successfully, but these errors were encountered: