hide model from docs #1671
Comments
@drewbanin I realize I never directly responded to the following:
I think if we were to treat this sort of like GitHub's CODEOWNERS, in which developers could provide declarative (pattern-based) show/hide rules on various aspects of docs, then this makes sense. That seems like the most extensible design. In the long run I like this. In the short run, if it is easier to implement this in the model yaml as a per-model option, we should start there. At some point in the future we could figure out how to do an access-control config file, and make it so that you can use either the access-control config or the per-model options, but not both, so that there are no precedence rules to worry about.
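A minimal sketch of what CODEOWNERS-style pattern rules could look like, assuming glob patterns matched against model paths where the last matching rule wins. The rule format, the `is_shown` helper, and the example paths are all hypothetical and not part of dbt.

```python
from fnmatch import fnmatch

# Hypothetical rules: (glob pattern, show?) pairs. As in CODEOWNERS,
# a later matching rule overrides an earlier one.
RULES = [
    ("models/*", True),
    ("models/intermediate/*", False),
    ("models/intermediate/shared_*", True),
]


def is_shown(path, rules=RULES, default=True):
    """Return the show flag of the last rule whose pattern matches path."""
    shown = default
    for pattern, show in rules:
        # fnmatch's "*" also matches "/" so "models/*" covers subfolders.
        if fnmatch(path, pattern):
            shown = show
    return shown
```

With a single ordered rule list like this there is only one source of truth, which is the property the comment is after when it asks for no precedence rules between two mechanisms.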
Thanks for the great writeup @mike-weinberg! In general, I love the idea of being able to configure attributes of the docs site from inside of a dbt project. This is only tangentially related, but we have an issue for color coding data sources, which could probably be configurable via the same mechanism as the I'm picturing a
version: 2
models:
  - name: my_model
    docs:
      show: false
  - name: other_model
    docs:
      color: "#def456"
sources:
  - name: public
    docs:
      color: "#abc123"
    tables:
      - name: numbers
        docs:
          show: false
dbt would just be responsible for picking up these configs and dropping them into a
I think that this is a passable implementation of the feature you're describing, but I think it also leaves some things on the table. I think there also might be merit to adding a similar config to
# dbt_project.yml
models:
  my_project:
    intermediate:
      schema: "intermediate"
      docs:
        show: false
  some_package:
    docs: false
I think this will benefit from #1503, in which you'll be able to reference the
This would render the complete docs in development mode, but hide some models in production deployments of the documentation. I think a good place to start here would be to:
Support the following fields in a
A compiled node in the dbt manifest currently looks like this (lots of fields removed for brevity):
I think if we can include the Some constraints that may prove helpful:
Let me know what you think about all of this!
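As a rough illustration of the per-resource `docs` config proposed in this comment, the sketch below walks an already-parsed schema file and works out which resources would remain visible. The dict mirrors the YAML from the comment. The `collect_docs_configs` and `visible` helpers, the resource key format, and the rule that a source table inherits its source's `docs` settings are assumptions made for the example, not dbt behavior.

```python
# Parsed form of a schema file like the one in the comment (example data).
SCHEMA = {
    "version": 2,
    "models": [
        {"name": "my_model", "docs": {"show": False}},
        {"name": "other_model", "docs": {"color": "#def456"}},
    ],
    "sources": [
        {
            "name": "public",
            "docs": {"color": "#abc123"},
            "tables": [{"name": "numbers", "docs": {"show": False}}],
        }
    ],
}


def collect_docs_configs(schema):
    """Map a resource key to its docs dict, defaulting show to True."""
    configs = {}
    for model in schema.get("models", []):
        key = f"model.{model['name']}"
        configs[key] = {"show": True, **model.get("docs", {})}
    for source in schema.get("sources", []):
        source_docs = source.get("docs", {})
        for table in source.get("tables", []):
            # Assumption: a table inherits its source's docs settings
            # and may override them.
            key = f"source.{source['name']}.{table['name']}"
            configs[key] = {"show": True, **source_docs, **table.get("docs", {})}
    return configs


def visible(configs):
    """Return the sorted keys of resources that should appear in the docs."""
    return sorted(key for key, docs in configs.items() if docs["show"])
```

Defaulting `show` to `True` keeps the current behavior for any resource that does not mention `docs` at all.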
@drewbanin I'm completely on board with this proposed interface and roadmap for this feature. regarding
This might seem like a hot take but in the event that a doc-show settings on a given model are specified in both the |
One weird use case we have sometimes is that we need to temporarily disable a model from building (b/c of a failing test or bad data or something) but we still want the docs to show for that model. Do you think this could be abstracted enough so that you could separate build from docs at the dbt_project.yml level? Something like:
but |
Hey @tayloramurphy! That makes sense. That being said, in the interest of managing scope for this issue, lmk if the following alternative might better solve that need: Would it be better if instead you had the ability to specify a build option in What do you think? |
Our preference would be to have the change in source control and not as a run-time configuration. Every change we make goes through a PR review process so there would be follow-up issues to rectify and we'd now be able to point other people in the company to where and how the changes were made. Totally get wanting to manage scope on the issue. My only aim was to see if there was a higher level of abstraction possible around this since someone will be touching that code anyways.
oh I see, so if you do a backfill, you'd want the fact that a backfill was run to show up in source control, is that right? Out of curiosity, how do you guarantee that the backfill only runs the one time, or do you manually manage that change? We've been talking about something like this internally too. ORMs support A DBT native migration concept for managing backfills in a source-controllable way definitely would be awesome, in my opinion! (but also maybe out of scope for a show/noshow docs config MVP?) |
@tayloramurphy when you temporarily disable a model like this, what do you do about the models that depend on that model? I believe dbt should show you a compilation error in this scenario. Do you disable all of the downstream models as well? Very happy to discuss this further, but think I'd prefer to do so in a separate issue! |
@tayloramurphy just to follow up, my broad thinking is that this should be implemented with some sort of no-op materialization? I think that strikes the right balance between:
This will of course fail if the destination table doesn't exist and downstream consumers depend on the model, but it sounds to me like that's a fair and reasonable tradeoff here. Definitely feel free to create a new issue if you'd like to discuss further! |
I'm realizing belatedly that a nice to have would be the option to distinguish between hiding models from the index/data-dictionary vs hiding them from the lineage viz. The data dictionary is kinda for everyone, while the lineage is for the people who may be trying to debug something and need a high level picture to get started. Since the default behavior is to show things, and we want to make the config as readable as possible, it might be better if we have a This might look like:
...
docs:
  hide:
    - data-dictionary # completely hides from the index page, and there would be no data dictionary pages generated for the hidden models.
This is what I would expect to see most of the time - sources and intermediates might be hidden from docs in most project pages. if, say, This also means someone could hide a model from the lineage graph but still document the model, which seems like a rare but plausible requirement. |
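A small sketch of how a per-view `hide` list could be applied, assuming each node carries an optional `docs.hide` list of view names. Only `data-dictionary` appears in the comment; the `lineage` view name, the `nodes_for_view` helper, and the node names are hypothetical.

```python
# Hypothetical view names: "data-dictionary" comes from the comment,
# "lineage" is assumed here for the graph view.
VIEWS = ("data-dictionary", "lineage")

# Example nodes; a missing "docs" key means "show everywhere".
NODES = {
    "raw_events": {"docs": {"hide": ["data-dictionary"]}},
    "int_sessions": {"docs": {"hide": ["data-dictionary", "lineage"]}},
    "fct_sessions": {},
}


def nodes_for_view(nodes, view):
    """Return the names of nodes that are not hidden from the given view."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    return [
        name
        for name, config in nodes.items()
        if view not in config.get("docs", {}).get("hide", [])
    ]
```

Here `raw_events` stays in the lineage graph but gets no data dictionary page, which is the "hidden from the index, still visible for debugging" case the comment describes.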
@mike-weinberg I gave this broader concept some thought recently, and here's where I ended up: a single list of show/hide configs is probably too simplistic for the use case that you're describing. What we really need is a notion of "layers" (better word to come, hopefully) which control the view into the specified dataset. This would be less about showing/hiding specific models, and more about describing "Here's the appropriate view of the docs for user persona X/Y/Z". That might include showing/hiding different nodes, rendering the file tree or database view (or both), and, in the future, showing things like ERDs for only the subset of "output" models meant to be consumed directly. I think that's broader than the scope of the implementation in #2107, but it's definitely something we'd be interested in supporting more completely in the future! |
add a "docs" field to models (#1671)
the second half of this will be implemented in dbt-labs/dbt-docs#68
@drewbanin I think that with #1671 (comment) you are correctly interpreting my intent. complex scenarios may arise, and non-trivial configuration options may be required. I'm definitely supportive of the direction you're leaning. I do wonder if the idea of personas starts to couple DBT core with the paid product, since the configuration you suggest might only be relevant if the docs server supports authentication and contains a global list of users and their roles. If that's the case, it could be worth considering implementing this twice, once in a simple way, and a second time for a future independent |
fixed in #2179
does this work with sources? can't seem to make it work |
To add another entry on this issue that has been closed for 3 years, I would also benefit from hiding sources from the docs. It would be helpful to our average users as they simply confuse sources and staging models at this point. |
Describe the feature
For a project with intermediate steps like B in A -> B -> C, it is sometimes the case that the intermediates are not intended for general consumption, but they will show up in DBT generated docs anyway.
For large products with many intermediates, it may be desirable to document the source tables and any final tables used for reporting in production, but it may be preferable for the intermediates (which could be tables, ephemeral CTEs, or temp tables) to be treated as a black box and kept hidden from the generated user-facing documents.
From an implementation perspective, it seems like a good place to put this would be in the <model>.yaml files, to both maximize extensibility and encourage good DBT hygiene (make a YAML for every model, especially if it isn't in the final docs!)
Describe alternatives you've considered
Martin Guindon read our minds when he suggested the problem and a viable workaround
(from DBT slack https://getdbt.slack.com/archives/C0VLZM3U2/p1565030505180400)
Martin Guindon
Mike Weinberg (WeWork)
Mike Weinberg (WeWork)
As suggested here, DBT projects form a powerful abstraction, and splitting a project in two as a means to achieve the above is a viable solution, however it increases the number of projects to maintain and creates the appearance that one pipeline is really two. Ideally if a sub-dag is not reusable, we would opt to keep it inside a single project.
Additional context
Additional considerations include:
If B has an option like hide-model-from-docs: true, then should the lineage graph show A -> B -> C or skip B entirely and show A -> C? Different users may have different opinions, but "whatever is easiest" probably makes sense for an mvp.
If B is marked as hidden as described above, what should the SQL for C show? Again, this is likely to be controversial but anything is better than nothing, and as long as we don't close the door on the various options, it should be fine to choose whatever is easiest. The options, as we see them, are:
B (or a sequence B1, B2, B3, etc) partially inlined into C via a WITH block
Who will this benefit?
This benefits organizations with large projects in which there are many intermediate steps. Intermediates often exist for the purpose of accelerating builds by reducing redundant work, but the intermediates may themselves not be valuable for reporting, or may hold PII, etcetera. As is, DBT shows these models in the generated documentation for a project, and when there are a large number of intermediates, it can be difficult for end users to navigate the documentation for the models that are actually accessible and relevant to them.
This change primarily benefits the decision makers and analysts who leverage documentation as a data dictionary to understand the meaning of relations and columns, and how they relate to each other. It does this by limiting docs to only those models which are relevant to end users.
Slack Conversation
drew.banin [8 hours ago]
got it - thanks for the context! I can imagine this working in a couple of different ways:
dbt docs generate
One question: if your graph looks like:
A ---> B ----> C
and you’ve marked B as “hidden”, would you expect to see
A ---> C
in the docs DAG view?
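One way to picture the A ---> C option is as contracting hidden nodes out of the DAG: every visible ancestor of a hidden node gets connected to that node's nearest visible descendants. This is only a sketch of that idea, not how dbt docs builds its graph; the `hide_nodes` helper and the adjacency-dict format are assumptions.

```python
def hide_nodes(edges, hidden):
    """Drop hidden nodes from a DAG, reconnecting parents to children.

    edges maps every node to the set of nodes it feeds into. The result
    uses the same shape but contains only visible nodes.
    """
    children = {node: set(targets) for node, targets in edges.items()}

    def visible_children(node, seen):
        # Walk through hidden nodes until a visible descendant is reached.
        result = set()
        for child in children.get(node, ()):
            if child in seen:
                continue
            seen.add(child)
            if child in hidden:
                result |= visible_children(child, seen)
            else:
                result.add(child)
        return result

    return {
        node: visible_children(node, set())
        for node in children
        if node not in hidden
    }
```

The same function handles a run of hidden intermediates (B1, B2, B3) because it keeps walking until it reaches a visible node.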
Mike Weinberg (WeWork) [3 hours ago]
I think we are indifferent about the display of intermediates in the data lineage. End users definitely don't need (or want) to see tables they will never use, and developers don't need to see the graph because they wrote the damn dag, to borrow a phrase from a senator from vermont.
Mike Weinberg (WeWork) [3 hours ago]
as a result, the choice of if and how to show the sql is dependent on if we show the intermediate or not.
If we hide B but want to show the sql for C, I would treat the code that generates it as a CTE and show the compiled sql for C using the code for B as an inlined CTE.
If we hide B and don't show the sql for C, it kinda doesn't matter. (edited)
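The "inline B as a CTE" option can be sketched as plain string assembly. This assumes C's compiled SQL has no WITH clause of its own and refers to the hidden model by a bare name; the `inline_hidden` helper and the example SQL are hypothetical.

```python
def inline_hidden(model_sql, hidden_sql):
    """Prepend hidden upstream models to model_sql as CTEs.

    hidden_sql maps each hidden model's name to its compiled SELECT,
    listed in dependency order. Assumes model_sql has no WITH clause
    and references hidden models by bare name.
    """
    if not hidden_sql:
        return model_sql
    ctes = ",\n".join(
        f"{name} AS (\n  {sql}\n)" for name, sql in hidden_sql.items()
    )
    return f"WITH {ctes}\n{model_sql}"


# Example: show C's SQL with hidden model b inlined.
print(inline_hidden("SELECT * FROM b", {"b": "SELECT id FROM a"}))
```

A sequence of hidden models (B1, B2, B3) becomes a comma-separated list of CTEs in the same WITH block, in the order given.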
Mike Weinberg (WeWork) [2 hours ago]
as for implementation preference, I polled 4 of our heaviest DBT users. All supported (as a preference) adding a hide-from-docs param to the <model(s)>.yaml file but felt that it would be perfectly fine to put it in the sql as well. Their reason in favor of putting it in the model yaml was that it encourages analysts to document-the-undocumented models.
From a developer perspective, I support putting it in the model yaml because it is more flexible - nested configuration options for the permutations you mentioned make more sense in something like yaml than as params in a macro. I think it's also more extensible.
As for what sql to show when B is hidden, the consensus was that the sql for C should be hidden too and we should point users to github, because if they are sophisticated enough to read complex sql, they are probably not going to be overly frustrated by being told to look at source code.