cross-project ref + model access + dependencies.yml #3577

Merged (16 commits) on Jul 13, 2023


@jtcohen6 (Collaborator) commented Jun 20, 2023

resolves #3550
resolves #3632
resolves #3574

What are you changing in this pull request and why?

Create a new page for "cross-project `ref`" under Collaborate > Govern. I've decided to call the page "Project dependencies," and use it as an opportunity to highlight the differences between project and package dependencies.

I started tackling two closely related issues, since we should be thematically consistent across all of them:

  • enforce_access for packages <> model access
  • packages can be configured in a file named dependencies.yml

Previews

Checklist

  • Add versioning components, as described in Versioning Docs
  • Add a note to the prerelease version Migration Guide
  • Review the Content style guide and About versioning so my content adheres to these guidelines.
  • Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding new pages (delete if not applicable):

  • Add page to website/sidebars.js
  • Provide a unique filename for the new page

@netlify bot commented Jun 20, 2023

Deploy Preview for docs-getdbt-com ready!

  • 🔨 Latest commit: 8ecefc0
  • 🔍 Latest deploy log: https://app.netlify.com/sites/docs-getdbt-com/deploys/64b0751ac8cda600073bfa18
  • 😎 Deploy Preview: https://deploy-preview-3577--docs-getdbt-com.netlify.app

@github-actions github-actions bot added content Improvements or additions to content guides Knowledge best suited for Guides size: medium This change will take up to a week to address labels Jun 20, 2023
@jtcohen6 jtcohen6 marked this pull request as ready for review July 4, 2023 10:50
@jtcohen6 jtcohen6 requested a review from a team as a code owner July 4, 2023 10:50
@jtcohen6 jtcohen6 changed the title from "[Draft] dependencies.yml, cross-project ref, and model access across projects" to "cross-project ref + model access + dependencies.yml" on Jul 4, 2023

It is possible to `ref` a model from another project in two ways:
1. As a "project" dependency, via "cross-project `ref`" (a feature of dbt Cloud Enterprise)
2. As a "package" dependency, whereby all source code from that project is installed into your own
Contributor

Looks like we just need to finish this thought: own project?

Collaborator Author

I definitely meant to write your own project here — although we're using project to mean so many different things. It's almost more like:

  • your own working directory
  • your own environment
  • your own dbt runtime
  • ...

Contributor

I like environment!


dbt Core v1.6 introduces a notion of `dependencies` between dbt projects. You might already be familiar with installing other projects as [packages](/docs/build/packages), whereby you pull down another project's source code and treat it as your own.

There is a new kind of `project` dependency. Both dependencies can be defined in `dependencies.yml`:
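For reference, a `dependencies.yml` that declares one dependency of each kind might look like the sketch below. It assumes `dbt_utils` is installed from dbt Hub (the version range is illustrative) and that `jaffle_finance` is another dbt Cloud project in the same account. The `projects` entry only resolves through cross-project `ref`, which is a dbt Cloud Enterprise feature.

```yaml
# dependencies.yml (sketch)

# A "package" dependency: `dbt deps` pulls down the full source code of dbt_utils
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]  # illustrative version range

# A "project" dependency: no source code is installed; dbt Cloud resolves
# references to the public models of jaffle_finance (dbt Cloud Enterprise)
projects:
  - name: jaffle_finance
```

The two entries differ in what lands in your environment: `dbt_utils` arrives as source code you can call directly, while `jaffle_finance` contributes only resolvable references to its public models.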
Contributor

It feels like there is a gap between the opening sentences and this one. Is there some more information we want to share to help users connect the dots here?

Collaborator Author

writing more here!

@jtcohen6
Collaborator Author

jtcohen6 commented Jul 7, 2023

@MichelleArk @matthewshaver Thanks for the reviews! I'm going to give this another draft before merging, specifically for how we're positioning XP ref vis-à-vis packages.


Models referenced from a `project`-type dependency must use [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant), including the project name. Only public models can be accessed in this way.
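As a sketch of the upstream side, the `jaffle_finance` maintainers would mark `monthly_revenue` as public in a properties file. The file name and the second model below are hypothetical. A downstream model would then reference it with the two-argument form, `{{ ref('jaffle_finance', 'monthly_revenue') }}`.

```yaml
# models/_finance_models.yml in the jaffle_finance project (hypothetical file name)
version: 2

models:
  - name: monthly_revenue
    access: public        # can be resolved by cross-project ref from other projects
  - name: int_payments_pivoted  # hypothetical model name
    access: protected     # the default; not available to other projects
```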

It is equally possible to install the `jaffle_finance` project as a `package` dependency. This will pull down its full source code and require dbt to parse all its contents. dbt will expect you to configure and run those models as your own. This can be a useful pattern to achieve certain types of unified deployments in production or to make a coordinated change across multiple projects in development. However, it can add significant complexity and latency when working within the narrower scope of a single project.
Collaborator Author

However, it can add significant complexity and latency when working within the narrower scope of a single project.

and cost!

If I were to instead install the `jaffle_finance` project as a `package` dependency, this would pull down its full source code. Meaning:
- dbt needs to parse and resolve more inputs (which is slower)
- dbt expects you to configure these models as if they were your own (with `vars`, env vars, etc)
- dbt will run these models as my own, unless I explicitly `--exclude` them
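If `jaffle_finance` is installed as a package, one way to keep its models out of your own runs is a YAML selector. The sketch below assumes the installed package's project name is `jaffle_finance`, and the selector name is hypothetical. It is intended to behave like passing `--exclude package:jaffle_finance` on the command line.

```yaml
# selectors.yml (sketch): select every node except those from the jaffle_finance package
selectors:
  - name: exclude_jaffle_finance  # hypothetical selector name
    definition:
      union:
        - method: fqn
          value: "*"               # start from all nodes
        - exclude:
            - method: package      # then drop everything from the installed package
              value: jaffle_finance
```

With this file in place, `dbt run --selector exclude_jaffle_finance` would skip the models that came from the installed package.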
Contributor

Suggested change
- dbt will run these models as my own, unless I explicitly `--exclude` them
- dbt will run these models as your own unless you explicitly `--exclude` them

- I could be using the project's models in a way that their maintainer (the Finance team) hasn't intended
Contributor

Suggested change
- I could be using the project's models in a way that their maintainer (the Finance team) hasn't intended
- You could be using the project's models in a way that their maintainer (the Finance team) hasn't intended


Contributor

Suggested change
### Package use case

The first is familiar: I want to use some of the macros that are already defined in `dbt_utils`. I pull down its full contents (100+ macros) as source code, and add them to my environment. I can then call any macro from the package, just as I can call macros that I define in my own project.

The second is new. Unlike installing a package, the models in the `jaffle_finance` project will not be pulled down as source code and parsed into my project. Instead, dbt Cloud provides a metadata service that resolves references to [**public models**](/docs/collaborate/govern/model-access) defined in the `jaffle_finance` project.

Contributor

@mirnawong1 mirnawong1 Jul 13, 2023

Suggested change
### Projects use case

- I could be using the project's models in a way that their maintainer (the Finance team) hasn't intended

Installing another internal project as a package can be a useful pattern for:
- "Unified deployments" in production environments. If the central data platform team of Jaffle Shop wanted to schedule the deployment of models across both `jaffle_finance` and `jaffle_marketing`, using dbt's selection syntax, they could create a new "passthrough" project that installed both projects as packages.
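A minimal sketch of such a passthrough project's `dependencies.yml`, assuming both projects live in their own git repositories. The repository URLs and the `main` revisions are hypothetical.

```yaml
# dependencies.yml in a hypothetical "passthrough" deployment project
packages:
  - git: "https://github.com/jaffle-shop/jaffle_finance.git"    # hypothetical URL
    revision: main
  - git: "https://github.com/jaffle-shop/jaffle_marketing.git"  # hypothetical URL
    revision: main
```

The central team could then schedule a single job with a command such as `dbt build --select package:jaffle_finance package:jaffle_marketing`, where the space-separated selectors are combined as a union.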
Contributor

Suggested change
- "Unified deployments" in production environments. If the central data platform team of Jaffle Shop wanted to schedule the deployment of models across both `jaffle_finance` and `jaffle_marketing`, using dbt's selection syntax, they could create a new "passthrough" project that installed both projects as packages.
- Unified deployments &mdash; In a production environment, if the central data platform team of Jaffle Shop wanted to use dbt's [selection syntax](/reference/node-selection/syntax) to schedule the deployment of models across both `jaffle_finance` and `jaffle_marketing`, they could create a new "passthrough" project that installs both projects as packages.


- Making a coordinated change across multiple projects in development. If I wanted to test the effects of a change to a public model in an upstream project (`jaffle_finance.monthly_revenue`), and see how it impacts a downstream model (`jaffle_marketing.roi_by_channel`), before pushing the changes into a staging or production environment, I could install `jaffle_finance` as a package inside `jaffle_marketing`, pointing to a specific git branch. (If I find I regularly need to do end-to-end testing of changes across both projects, I should reexamine if this really represents a stable interface boundary.)
Contributor

Suggested change
- Making a coordinated change across multiple projects in development. If I wanted to test the effects of a change to a public model in an upstream project (`jaffle_finance.monthly_revenue`), and see how it impacts a downstream model (`jaffle_marketing.roi_by_channel`), before pushing the changes into a staging or production environment, I could install `jaffle_finance` as a package inside `jaffle_marketing`, pointing to a specific git branch. (If I find I regularly need to do end-to-end testing of changes across both projects, I should reexamine if this really represents a stable interface boundary.)
- Coordinated changes &mdash; In development, if you want to test the effects of a change to a public model in an upstream project (`jaffle_finance.monthly_revenue`) on a downstream model (`jaffle_marketing.roi_by_channel`) _before_ introducing changes to a staging or production environment, you can install `jaffle_finance` as a package within `jaffle_marketing`. The installation can point to a specific git branch. However, if you find yourself frequently needing to perform end-to-end testing across both projects, we recommend you re-examine whether this represents a stable interface boundary.
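For the coordinated-change case, a sketch of the package entry inside `jaffle_marketing` might pin `jaffle_finance` to a development branch. The repository URL and branch name are hypothetical, and an entry like this would typically live only on a development branch of `jaffle_marketing`.

```yaml
# dependencies.yml in jaffle_marketing (development-only sketch)
packages:
  - git: "https://github.com/jaffle-shop/jaffle_finance.git"  # hypothetical URL
    revision: update-monthly-revenue  # hypothetical feature branch in jaffle_finance
```

After `dbt deps`, a command such as `dbt build --select monthly_revenue+` would build the changed upstream model along with everything downstream of it, including `roi_by_channel`.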

@@ -36,6 +36,8 @@ dbt Labs is committed to providing backward compatibility for all versions 1.x,

[**Namespacing:**](/faqs/Models/unique-model-names) Model names can be duplicated across different namespaces (packages/projects), so long as they are unique within each package/project. We strongly encourage using [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant) when referencing a model from a different package/project.

[**Project dependencies**](/docs/collaborate/govern/project-dependencies): Introduce `dependencies.yml`. Allow enforcing model access (public vs. protected/private) across project/package boundaries. Enable cross-project `ref` of public models, without requiring the installation of upstream source code, as a feature of dbt Cloud Enterprise.
Contributor

Suggested change
[**Project dependencies**](/docs/collaborate/govern/project-dependencies): Introduce `dependencies.yml`. Allow enforcing model access (public vs. protected/private) across project/package boundaries. Enable cross-project `ref` of public models, without requiring the installation of upstream source code, as a feature of dbt Cloud Enterprise.
[**Project dependencies**](/docs/collaborate/govern/project-dependencies): Introduces `dependencies.yml`. Allows enforcing model access (public vs. protected/private) across project/package boundaries. Enables cross-project `ref` of public models, without requiring the installation of upstream source code, as a feature of dbt Cloud Enterprise.

Contributor

@mirnawong1 mirnawong1 left a comment

Some suggested changes here, @jtcohen6, to tighten it up for the reader. Let me know if you have any questions at all!

@jtcohen6
Collaborator Author

@mirnawong1 fantastic edits & suggestions!! thank you thank you thank you

@mirnawong1
Contributor

It was my absolute pleasure, @jtcohen6. Thank you for the opportunity! This is all very exciting for users, so I can't wait!

@jtcohen6 jtcohen6 merged commit b62768c into current Jul 13, 2023
@jtcohen6 jtcohen6 deleted the jerco/xp-ref branch July 13, 2023 22:09
4 participants