Add default values for --exclude to dbt_profile.yml #2746

Closed
tjwaterman99 opened this issue Sep 10, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@tjwaterman99

tjwaterman99 commented Sep 10, 2020

Describe the feature

I'd like to be able to skip building some models by default in my project's development environment.

For example, my team has a set of models under models/tutorial that are used to help onboard new users into the project. However, those models are not useful for the production schema or our documentation, and we'd like to exclude them.

One way to do this would be to add a defaults section to the .dbt/profiles.yml configuration.

```yaml
# ./ci/.dbt/profiles.yml

myclient:
  defaults:
    exclude: models/tutorial
    threads: 8
    target: staging
  outputs:
    staging: &base
      type: snowflake
      account: ...
      user: ci
      schema: staging
      password: "{{ env_var('DBT_CI_USER_PASSWORD') }}"
      role: transformer
      database: analytics
      warehouse: transforming
      client_session_keep_alive: False

    public:
      <<: *base
      schema: public
```

The profiles.yml file already has default parameter selections, namely the --target value and number of --threads. This would extend those defaults to another parameter: --exclude.

This pattern is also common in other tools, such as pytest's addopts configuration option.
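
To make the analogy concrete: pytest inserts the contents of addopts ahead of the arguments typed on the command line, so for options where the last occurrence wins, an explicit flag overrides the configured default. Below is a minimal Python sketch of the same idea for a dbt-style defaults section. It assumes an argparse parser with a few hypothetical flags (this is not dbt's real CLI parser), and the DEFAULTS string only stands in for the proposed profiles.yml section.

```python
import argparse
import shlex
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a few `dbt run` flags; not dbt's real parser.
    parser = argparse.ArgumentParser(prog="dbt-run-sketch")
    parser.add_argument("--exclude", nargs="+")
    parser.add_argument("--threads", type=int, default=1)
    parser.add_argument("--target")
    return parser


def parse_with_defaults(default_opts: str, cli_args: List[str]) -> argparse.Namespace:
    # Prepend the configured defaults, as pytest does with addopts. argparse
    # keeps the last value of a repeated flag, so user-supplied flags win.
    return build_parser().parse_args(shlex.split(default_opts) + cli_args)


# Hypothetical defaults, standing in for the proposed profiles.yml section.
DEFAULTS = "--exclude models/tutorial --threads 8 --target staging"

if __name__ == "__main__":
    print(parse_with_defaults(DEFAULTS, []))
    print(parse_with_defaults(DEFAULTS, ["--exclude", "models/staging"]))
```

One consequence of this design: a user can replace the default exclusion with a different one, but in this sketch there is no way to clear it entirely.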

Describe alternatives you've considered

  • Maintaining a separate dbt_project.yml file for development environments with a different source-paths value. But that would mean maintaining all of the other values, such as anything in the vars section, in both files, which isn't DRY.
  • Building a wrapper around the dbt command, such as a bin/dbt script that runs something like dbt --exclude tutorial $@. But that would be confusing for my team, because they'd have to remember to use our custom bin/dbt command instead of the dbt command referenced throughout dbt's documentation. I'm also not sure it would still let them pass the --exclude flag themselves.
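
A sketch of the bin/dbt wrapper alternative in Python, written to address that last concern: it only adds the default exclusion when the user has not passed --exclude themselves. Everything here is a hypothetical assumption rather than anything dbt provides: the DEFAULT_EXCLUDE value, the set of subcommands treated as accepting --exclude, and the simplification that the first non-flag argument is the subcommand. Because --exclude is an option of the subcommand (as in dbt run --exclude ...), the sketch appends it after the user's arguments.

```python
#!/usr/bin/env python3
"""Hypothetical bin/dbt wrapper that adds a default --exclude unless one is given."""
import os
import sys
from typing import List, Optional

# Hypothetical project defaults; adjust for your project and dbt version.
DEFAULT_EXCLUDE = ["tutorial"]
SELECTING_COMMANDS = {"run", "test", "compile"}


def build_argv(user_args: List[str]) -> List[str]:
    argv = ["dbt", *user_args]
    # Simplification: treat the first non-flag argument as the subcommand.
    subcommand: Optional[str] = next((a for a in user_args if not a.startswith("-")), None)
    if subcommand not in SELECTING_COMMANDS:
        return argv
    # Leave the command alone if the user already chose what to exclude.
    if "--exclude" in user_args:
        return argv
    # --exclude is a subcommand option, so it goes after the user's arguments.
    return argv + ["--exclude", *DEFAULT_EXCLUDE]


def main() -> None:
    argv = build_argv(sys.argv[1:])
    # Hand off to the real dbt on PATH. This assumes bin/ itself is not on
    # PATH; otherwise "dbt" would resolve back to this wrapper.
    os.execvp(argv[0], argv)
```

Saved as bin/dbt with the usual if __name__ == "__main__": main() guard at the end, bin/dbt run would execute dbt run --exclude tutorial, while bin/dbt run --exclude staging would pass through unchanged. In this sketch an explicit --exclude replaces the default rather than adding to it.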

Who will this benefit?

This benefits teams that want to maintain a production environment that doesn't include all of the models in their development environments.

Are you interested in contributing this feature?

I would be happy to contribute! But this would be a large change to the profiles.yml file, so I wanted to see if others would benefit before seeing how far I could get.

@tjwaterman99 tjwaterman99 added enhancement New feature or request triage labels Sep 10, 2020
@jtcohen6
Contributor

jtcohen6 commented Sep 10, 2020

@tjwaterman99 Thanks for the detailed writeup!

> For example, my team has a set of models under models/tutorial that are used to help onboard new users into the project. However, those models are not useful for the production schema or our documentation, and we'd like to exclude them.

I'm glad to hear you're giving real thought to onboarding new dbt developers, and I want to be sure that there are dbt constructs to support you in doing this.

It sounds like you almost never want to run the tutorial models. Is there a reason why you wouldn't want to:

  • Have those models live in a separate dbt project? If they build on top of resources in your main project, you could have the tutorial project install the main project as a package.
  • Disable those models by default? They could be re-enabled by means of a var, e.g.:
```yaml
models:
  my_project:  # your project's name; model configs are nested under it
    tutorial:
      enabled: "{{ var('tutorial', false) | as_bool }}"
```
```shell
dbt run -m +tutorial --vars 'tutorial: true'
```

As for the specific proposal to store a default set of node selectors in the profile, thereby altering the default behavior of dbt run, I hesitate for a few reasons.

I think that the configs in profiles.yml should fall into one of the following categories:

  • relating to the specific mechanisms by which dbt connects to the database
  • likely to differ for each individual developer / deployment environment
  • values that should not be checked into version control (e.g. credentials)

And I think there are other, better places to store these "shared settings":

  • YAML selectors (docs) are a mechanism, new in v0.18.0, for defining node selection groupings of arbitrary complexity, saving them in version control, and sharing them between dbt developers.
  • Today, if you want to codify a series of dbt commands with selectors, flags, etc. to share between developers and deployment environments, many folks use Makefiles (in lieu of a bin/dbt wrapper, though there's merit to that, too). In the future, we're thinking of adding a native dbt feature for workflows (Give dbt basic workflow capabilities #1842).
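
Conceptually, a saved selector is a named set expression over the project's nodes: the union of its include criteria, with any exclude criteria subtracted, much as --models ... --exclude ... does on the command line. The following is a simplified Python model of that idea using only path-prefix criteria. It is a sketch, not dbt's implementation, and the function names are hypothetical. In dbt itself the equivalent definition would live in selectors.yml and be invoked with dbt run --selector <name>.

```python
from typing import Iterable, Sequence, Set


def matches_path(node_path: str, prefix: str) -> bool:
    # A path criterion matches the directory itself or anything beneath it.
    prefix = prefix.rstrip("/")
    return node_path == prefix or node_path.startswith(prefix + "/")


def resolve(nodes: Iterable[str], include: Sequence[str], exclude: Sequence[str]) -> Set[str]:
    node_list = list(nodes)
    # Union of every include criterion...
    selected = {n for n in node_list if any(matches_path(n, p) for p in include)}
    # ...minus every node matched by an exclude criterion.
    excluded = {n for n in node_list if any(matches_path(n, p) for p in exclude)}
    return selected - excluded
```

Because the exclusion is stored alongside the rest of the definition, everyone who runs the same named selector gets the same node set, which is the "shared settings" property described above.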

I'm curious to hear if any of the above resonates with you, or if you have other ideas!

@jtcohen6 jtcohen6 removed the triage label Sep 10, 2020
@tjwaterman99
Author

Hey @jtcohen6, I think using the enabled flag is a great idea. I hadn't considered that, and it solves this really well.

I'll try that out and close this issue.

Thanks for your insight!

@ethanjahn

@jtcohen6 I am in a similar situation to @tjwaterman99 and would like to use the enabled flag to handle a set of very expensive models that should only run in certain situations. For some reason, I can't seem to get the enabled solution to work.

It looks like the node selector is not able to recognize that the models have been enabled by the command line argument, so dbt run exits without actually running any models.

I am using dbt 0.19.1 in dbt Cloud; the relevant part of my dbt_project.yml is:

```yaml
models:
    company_name:
        pre_calculation:
            +materialized: table
            +enabled: "{{ var('tutorial', false) | as_bool }}"
```

And the command I am running is:

```shell
dbt run --models pre_calculation --vars 'tutorial: true'
```

This log then shows:

```text
The selector 'pre_calculation' does not match any nodes and will be ignored
WARNING: Nothing to do. Try checking your model configs and model specification args
```

Is this solution still expected to work?

Also, in my case, creating a separate dbt project could make sense, but I would like to avoid setting up a new repository just for the 4 expensive models. I checked the docs, and it seems possible to have multiple projects in a single repo, but I'm not sure how to set that up in the dbt Cloud IDE. Any pointers?

@jtcohen6
Contributor

jtcohen6 commented Jun 9, 2021

@ethanjahn Sorry for the delayed response here. I think what you're seeing is the result of a bug (https://github.com/fishtown-analytics/dbt/issues/3126), which unfortunately means that, within the RPC server (which the dbt Cloud IDE uses), --vars passed as a CLI-style argument are not appropriately accounted for in {{ var() }} calls within dbt_project.yml. I definitely want to fix this bug, as you're not the only person to feel its pain.

In the meantime, this should work if, instead of setting +enabled in dbt_project.yml, you set it within a config() block for each model you want disabled. I know that's significantly less than ideal.
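
The "does not match any nodes" warning in the earlier log is consistent with the models simply being disabled when selection happens: if the var never reaches the rendering of dbt_project.yml, the default false wins, the models are disabled, and disabled models are not candidates for selection. The following is a toy Python model of that sequence, not dbt's code; the Model class, its gate_var field, and the model names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Model:
    fqn: Tuple[str, ...]            # e.g. ("company_name", "pre_calculation", "expensive_a")
    gate_var: Optional[str] = None  # var that controls `enabled`, if any


def is_enabled(model: Model, cli_vars: Dict[str, bool]) -> bool:
    # Mirrors var('tutorial', false) | as_bool: disabled unless the var is set.
    if model.gate_var is None:
        return True
    return bool(cli_vars.get(model.gate_var, False))


def select(models: List[Model], selector: str, cli_vars: Dict[str, bool]) -> List[str]:
    # Disabled models are dropped before selection, so they can never match.
    enabled = [m for m in models if is_enabled(m, cli_vars)]
    matched = [m.fqn[-1] for m in enabled if selector in m.fqn]
    if not matched:
        print(f"The selector '{selector}' does not match any nodes and will be ignored")
    return matched
```

Calling select with an empty vars dict reproduces the warning, while passing {"tutorial": True} selects the gated model. With the config() workaround, the var is presumably read while each model is parsed rather than while dbt_project.yml is rendered, which would explain why it avoids the bug.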

As for the original thrust of this issue, I've come around on the need for changing the default include/exclude criteria, and I think YAML selectors may be just the way to do it. I just opened #3448 to continue that discussion.
