[Feature] Support for Model-Specific dbt_vars in DbtTaskGroup #1497

Closed
ame589 opened this issue Jan 31, 2025 · 3 comments
Labels: area:config (Related to configuration, like YAML files, environment variables, or executer configuration), enhancement (New feature or request), triage-needed (Items need to be reviewed / assigned to milestone)


ame589 commented Jan 31, 2025

Description

Currently, when creating a DbtTaskGroup that points to a folder (e.g., using a select path like +path:models/silver/{group_name}), the only way to pass dbt_vars is to share them across all models in that folder, either via the operator_args parameter or via dbt_vars inside ProjectConfig (deprecated).
This limitation makes it challenging to handle scenarios where each model within the folder requires different variables.
For example, consider the following code snippet:

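from cosmos import DbtTaskGroup, ProjectConfig, RenderConfig
from cosmos.constants import TestBehavior


def build_silver_group(group_name, dbt_project_path, current_date_str,
                       profile_config, execution_config, exclude=None, seed=False):
    # Hypothetical wrapper: the original snippet is an excerpt from a larger function.
    # These dbt_vars are shared by every model the task group selects.
    project_config = ProjectConfig(
        dbt_project_path=dbt_project_path,
        dbt_vars={"enable_hooks": True, "execution_date": current_date_str},
    )

    path_silver = f"+path:models/silver/{group_name}"
    if not seed:
        return DbtTaskGroup(
            group_id=group_name,
            project_config=project_config,
            profile_config=profile_config,
            execution_config=execution_config,
            render_config=RenderConfig(
                select=[path_silver],  # Select the folder containing the models
                exclude=exclude or [],  # Exclude dependencies if provided
                test_behavior=TestBehavior.AFTER_EACH,
                enable_mock_profile=False,
                emit_datasets=False,
            ),
            default_args={"retries": 0, "trigger_rule": "all_success"},
            operator_args={"install_deps": True},
        )
    # The seed branch is not shown in the original snippet.
    return None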

In this case, all models within the models/silver/{group_name} folder will share the same dbt_vars. However, I would like to pass a unique variable (e.g., the output of an upstream task like DatabricksRunNowOperator) to each model. This is currently not possible without creating separate task groups for each model or pre-processing variables externally.
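The "separate task groups for each model" workaround can be sketched as plain Python that builds one set of settings per model. This is a minimal sketch under stated assumptions. The model names and upstream task IDs are hypothetical. The vars value is an Airflow Jinja template that pulls each upstream task's XCom at runtime, which assumes the operator's vars field is templated. Passing vars as a dict rather than a YAML string is also an assumption.

```python
# Hypothetical mapping: dbt model name -> upstream Databricks task whose output it needs.
UPSTREAM_TASK_BY_MODEL = {
    "customer_model": "run_customer_job",
    "product_model": "run_product_job",
}


def per_model_group_settings(upstream_by_model):
    """Build one select/operator_args pair per model (hypothetical helper).

    Each pair would configure its own DbtTaskGroup, so every model gets its
    own dbt vars. The vars value is a Jinja template that Airflow would render
    at runtime, assuming the operator's vars field is templated.
    """
    settings = {}
    for model, task_id in upstream_by_model.items():
        xcom_template = "{{ ti.xcom_pull(task_ids='" + task_id + "') }}"
        settings[model] = {
            "select": [model],  # select just this model by name
            "operator_args": {
                "install_deps": True,
                "vars": {"job_run_id": xcom_template},
            },
        }
    return settings
```

Each entry would then feed a separate DbtTaskGroup (select through RenderConfig, operator_args directly). That duplication is what this feature request wants to avoid.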

Use case/motivation

In our workflows, we often have multiple models grouped under the same folder, but each model represents a different flow or use case. For example:

  • Model A requires the output of an upstream Databricks task specific to customer data.
  • Model B requires the output of an upstream Databricks task specific to product data.

Currently, there is no straightforward way to pass these unique variables directly to each model within a DbtTaskGroup. The only workaround is to create separate task groups for each model or handle variables externally, which adds complexity and reduces maintainability.

Adding support for model-specific dbt_vars within a single DbtTaskGroup would:

  • Simplify orchestration by allowing dynamic variable assignment per model.
  • Improve flexibility and usability for complex dbt projects.
  • Align with dbt's ability to handle per-model configurations via CLI arguments or YAML files.

Related issues

I am not aware of any existing issues directly addressing this feature request. However, this enhancement would align with Cosmos's goal of simplifying dbt orchestration in Airflow.

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
ame589 added the enhancement and triage-needed labels on Jan 31, 2025.
dosubot (bot) added the area:config label on Jan 31, 2025.
tatiana (Collaborator) commented Jan 31, 2025

Hi @ame589,

Thanks for reporting this feature request.

Cosmos may already support what you'd like to accomplish.

In version 1.8.0, @wornjs introduced support for customizing Airflow operator arguments per dbt node in PR #1339 (more information here).

More recently, as part of PR #1492, we made a few improvements to this feature, which are available in 1.9.0a4. The main improvement is ensuring that operator arguments defined at the model level take precedence over those set at a higher level.
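PR #1492 makes model-level operator arguments take precedence over higher-level ones. As an illustration only (this is not Cosmos's code), that precedence rule can be sketched as a shallow dict merge. Whether Cosmos merges nested values such as vars key by key, rather than replacing them wholesale, is an assumption not confirmed here. The function name is hypothetical.

```python
def effective_operator_kwargs(group_operator_args, node_operator_kwargs):
    """Hypothetical illustration: node-level keys win over group-level keys.

    This is a shallow merge, so a node-level "vars" dict replaces the
    group-level "vars" dict entirely instead of being merged key by key.
    """
    merged = dict(group_operator_args)  # copy so the caller's dict is not mutated
    merged.update(node_operator_kwargs)
    return merged
```

With group-level {"install_deps": True, "vars": {"enable_hooks": True}} and node-level {"vars": {"job_run_id": 1234}}, this sketch keeps install_deps but drops enable_hooks for that node.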

This documentation summarises the existing feature:
https://astronomer.github.io/astronomer-cosmos/configuration/operator-args.html#operator-args-per-node

Given that Cosmos exposes the argument "vars" at the operator level to represent dbt variables:

:param vars: dbt optional argument - Supply variables to the project. This argument overrides variables
defined in your dbt_project.yml file. This argument should be a YAML
string, eg. '{my_variable: my_value}' (templated)
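Because the vars argument is documented as a YAML string, a Python dict can be serialized with json.dumps. JSON's flow syntax is also valid YAML, so the result is a string dbt can parse. The helper name below is hypothetical, a minimal sketch rather than part of Cosmos.

```python
import json


def to_dbt_vars_string(variables):
    """Serialize a dict into a string suitable for dbt's --vars argument.

    JSON objects are valid YAML flow mappings, so dbt can parse the output.
    sort_keys makes the string deterministic across runs.
    """
    return json.dumps(variables, sort_keys=True)
```

The same string works on the command line, e.g. dbt run --select model_name --vars '{"job_run_id": 1234}'.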

You can probably accomplish what you want by using something like the following in the YAML properties file that declares your model (for example, models/schema.yml):

version: 2

models:
  - name: some-model
    description: description
    meta:
      cosmos:
        operator_kwargs:
          vars:
            var_name: var_value

If this does not work with Cosmos 1.8.0, please try it with 1.9.0a4. Although I haven't tested it, I'm optimistic this will solve the problem, so I'm closing the issue for now. Please feel free to reopen it if it does not meet your needs.

tatiana closed this as completed on Jan 31, 2025.
ame589 (Author) commented Feb 3, 2025

Hi @tatiana,

Unfortunately, with your approach, using version 1.6.0, we receive:

[2025-02-03, 15:16:29 UTC] {logging_mixin.py:188} INFO - 15:16:29 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources. There are 1 unused configuration paths: - models.staging.folder_model_1.model_name.meta.cosmos.operator_kwargs.vars

We have configured dbt_project.yml as follows:

models:
  dbt_artifacts:
    +database: it_gl
    +schema: service
  project_name:
    staging:
      folder_model_1:
        model_name:
          meta:
            cosmos:
              operator_kwargs:
                vars:
                  job_run_id: 1234

tatiana (Collaborator) commented Feb 3, 2025

@ame589, this dbt warning means there is a configuration path in your dbt_project.yml file that does not apply to any model. It is not an issue with the Cosmos approach, but with how the dbt model itself is referenced.

Could you try the following configuration in the YAML properties file that declares your model (for example, models/schema.yml), replacing model_name with your model's name?

version: 2

models:
  - name: model_name
    meta:
      cosmos:
        operator_kwargs:
          vars:
            job_run_id: 1234

You can use

dbt debug

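to confirm that the project configuration is valid. Note that dbt debug checks dbt_project.yml, profiles.yml and the database connection, but it does not show whether the meta block was applied to the model. dbt parse writes target/manifest.json, where the model's meta can be inspected.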
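To check programmatically whether the cosmos meta block reached the model, a minimal stdlib-only sketch can read the manifest that dbt parse writes. It assumes the default target/manifest.json location, which is different if your project overrides the target path. The function names are hypothetical. The lookup checks node["meta"] first and falls back to node["config"]["meta"], since where meta appears can vary between dbt versions.

```python
import json
from pathlib import Path


def load_manifest(project_dir):
    """Load the manifest dbt writes to <project_dir>/target/manifest.json by default."""
    with open(Path(project_dir) / "target" / "manifest.json", encoding="utf-8") as f:
        return json.load(f)


def cosmos_operator_kwargs(manifest, model_name):
    """Return meta.cosmos.operator_kwargs for the dbt model called model_name.

    Returns None if no model has that name, and {} if the model exists but
    carries no cosmos operator_kwargs in its meta.
    """
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            # Prefer node-level meta; fall back to config.meta if it is empty.
            meta = node.get("meta") or node.get("config", {}).get("meta") or {}
            return meta.get("cosmos", {}).get("operator_kwargs", {})
    return None
```

For example, after running dbt parse in the project directory, cosmos_operator_kwargs(load_manifest("."), "model_name") should include the vars block if dbt picked up the configuration. A None result points to a model-name mismatch.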
