Updates after a few months of implementation work! Docs (beta!): https://docs.getdbt.com/docs/collaborate/publish/model-access
-
@jtcohen6 I would like to allow certain groups to access some models. In my view, the protected type is too broad for controlling data access in a large project. There are a couple of ways to achieve this. First, we could support composite groups: a group can contain subgroups, and models in one subgroup can access private models in another subgroup under the same parent group. Second, we could implicitly declare which other groups can access the private models in a group. The following image shows what I want to do. Assume we have four groups in a dbt project. Some models managed by
If multi-project deployments are supported in the near future, we could also split a large dbt project to take advantage of the protected type. This might also relate to introducing namespaces for dbt models.
-
It seems to me that groups can only be declared at the root of a models.yml file. If we can assign a group to a folder (for all models inside it) from the project file, wouldn't it make more sense to be able to declare groups in the project file as well? Or am I missing something?
-
We tried to implement groups and access in our single dbt project. I was originally thinking we'd use them to distinguish between models that are meant to be consumed by data mart owners and models that are just intermediates in the data pipeline. For example, I wanted to define one "main" group and multiple "mart" groups. The "main" group would contain all of our staging models and the final Kimball-lite dimensional models, which are maintained by the core analytics engineering team. The "mart" models contain various aggregations and joins of the "main" models but shouldn't usually reference staging models, and are maintained by specific teams. Therefore, I wanted to make all of my staging models private and all of the main models public. I was hoping to write a
Unfortunately, this doesn't work because I can't define access in dbt_project.yml. I would have to ensure that each staging model had
I get the motivation that defining any model as public should be deliberate (although maybe we should be able to make that mistake). However, having the default access as protected instead of private makes this feature not particularly useful for single-project setups.
-
Part of the larger initiative for Multi-project collaboration (#6725)
A law of nature: DAGs get messier as they get bigger. We'd like to provide constructs that make it easier for dbt project developers and maintainers to manage, and reason about, large DAGs.
There are two big ideas at play here:
By developing constructs for groups and access within one project, we aim to provide mechanisms for scaling monoliths more gracefully. At the same time, we believe these same capabilities will extend to deployments that span multiple projects—whether monorepo or polyrepo.
Proposals
dbt developers can mark a model as "public."
This is an enum attribute: `access: public | private`. (In the future, there could be additional options. We're taking inspiration from access modifiers in object-oriented programming languages. If this is your first time hearing these terms, that's totally okay: inspiration != prerequisite.)
This should be a model-level attribute, not a configuration set & inherited for many models at once in `dbt_project.yml`. Every public model must be consciously and individually marked as such. This adds a teensy bit of friction, with the aim of ensuring intentionality.
Our initial & primary intent for `access` control is models. I expect readers of this discussion, and dbt developers, to spend 90% of their time thinking about access control for models. As with every new feature, though, we need to ask: What about all the other resource types?
dbt developers can define groups.
A group may contain models ("public" & "private"), seeds, snapshots, tests, analyses, exposures, entities, metrics.
- `source` they belong to.
- Update: In the first cut, exposures cannot belong to groups, and can't reference private models at all. They can define an `owner`, which could be the same as the `owner` of a resource group.
A model's `group` should be configured explicitly.
- `group` in `dbt_project.yml`, e.g. for all models in a subdirectory?
Groups can be selected (`group:`), similar to the `tag:` or `fqn:` selection methods.
Groups must define an `owner` (`dict<name: str, email: str, …>`), which then applies to objects within the group.
- `loader` field in `sources`, and equivalent to the `owner` property already defined for `exposures`.
- `owner` metadata in auto-generated project documentation, rather than what we show currently for models (the database user that created the model’s physical table).
- `owner` field should appear in `node_info`, for relevant events & structured logs, which could eventually enable more granular notifications.
This could look like:
Groups are not intended as a mechanism for model namespacing. Resource names must still be globally unique within one project. (In a future with multi-project deployments, we should support multiple models with the same name, so long as they're defined in separate projects. That will mean finally tackling a longtime limitation: #1269.)
Only public models can be referenced outside their group.
In other words, private models can be `ref`'d within the same `group`, but they cannot be `ref`'d by resources outside of their `group`. This enables cleaner dependency chains, with fewer interwoven arrows.
What about models not in a group? My sense is, they should be neither public nor private. They can be `ref`'d elsewhere, and they also aren't held to the minimum standard for all public models in a project (see below). Motivation: Preserve status quo, and avoid creating lots of tech debt for existing projects.
As soon as a model is added to a group, it becomes private, until explicitly marked public.
More devilish details:
- A `ref` call to a private model in a different group should raise an error: `Model 'model.my_project.my_model' depends on a node named 'private_model', which is private in a different group.`
- If `int_payments_aggregated` belongs to the group "finance," a `unique` test on that private model also belongs to the "finance" group, and is allowed to `ref` the model it is testing. However, to define a `relationships` test between two private models, they do actually need to be in the same group!
- With cross-project `ref`, only models explicitly marked public can be "exported" and referenced from other projects. A `ref` call to a private model in a different project should raise an error: `Model 'model.my_project.my_model' depends on a node named 'other_project.private_model' which was not found`
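To make the within-project rules concrete, here is a sketch assuming the YAML syntax described in the model-access docs linked in this thread. Every model, column, and group name other than `int_payments_aggregated` and "finance" is hypothetical, and the comments restate the proposed behavior rather than a confirmed implementation:

```yaml
models:
  - name: int_payments_aggregated
    group: finance
    access: private          # only resources in the finance group may ref this
    columns:
      - name: payment_id     # hypothetical column
        tests:
          - unique           # this test belongs to the finance group too

  - name: fct_revenue        # hypothetical
    group: finance
    access: public           # may be ref'd from any group
    # ref('int_payments_aggregated') is allowed here: same group.

  - name: campaign_roi       # hypothetical
    group: marketing         # hypothetical second group, defined elsewhere
    access: private
    # ref('fct_revenue') is allowed: that model is public.
    # ref('int_payments_aggregated') should raise the "private in a
    # different group" error described in the first bullet.
```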
Public models ought to be "contracted," with a reliable (minimum) set of guarantees.
For more on model contracts, see #6726
We should implement a sensible & opinionated default. In my opinion:
- `description`.
- `unique` test (= validated primary key)
Users may optionally define their own set of expectations, overriding the default, that would be checked against every public model in the project.
- `CODEOWNERS` rules, e.g. to require reviews from repository maintainers any time these expectations are updated.
- `persist_docs` enabled (for integration with an external data catalog), is materialized as a `view` (on top of an underlying private table), and has at least a certain number of data quality tests. Imagine something like:
For totally custom & complex validation logic (e.g. "every column named `email` should have a BigQuery policy tag, a dbt `pii` tag, and a `description` containing the word 'pseudonymized'"), these rules could, as they can today, be written in:
- `dbt_project_evaluator` and `dbt_meta_testing`
- `manifest.json`
Groups can be visualized
One of the biggest eventual benefits of sorting models into groups, slowly but surely, is enabling users to make visual sense of large & complex DAGs.
Our team has not been able to meaningfully invest in new features for `dbt-docs`. While I don't anticipate that changing, there are a few low-effort & high-value additions we may want to shoot for:
- `group`. (Even if we can't "roll up" to groups, or demarcate multiple groups simultaneously, just viewing all resources in one group is a good start.)
- `access` rule (public or private).
- `name`.
[Bonus content]
[Future] More ergonomic configuration?
(This "paper cut" issue has a lot of upvotes, and is never far from my mind!)
We might want to add `groups` as a new rung in the configuration ladder. If users could set some group-level config that cascades down to member models, they would be able to move some of that config out of `dbt_project.yml`. This does risk additional confusion, though, in trying to figure out where a model's configuration is coming from. We could always add this later.
[Aside] What's the distinction between a "public model" and an "entity"?
If you've been following the discussion about adding more semantic information to dbt (#6644), and the proposal for entities as a new node type, you might be wondering: What makes a model worthy to be "public," and what makes it worthy to power an entity?
This is a subtle distinction! I expect many `entities` to be built on top of public models, and to leverage those public models' contracted metadata (column names + data types) as a way to provide richer dimensions for semantic queries. We had several conversations about whether these ought to be one & the same, but ultimately decided that they deserve to be two separate constructs. Why?
Entities represent the canonical representation of a business concept (customers, orders) for downstream querying ("semantic"). Public models represent a logical dataset with a set of guarantees and clear ownership. If you'll indulge me in a metaphor: Database tables are raw materials; models are the means of logical production; public models are finished goods; entities are the packaging (declared interface) for those goods; metrics their clearly defined directions for intended use.