[Bug] dbt_vars raised "This can happen when calling a macro that does not exist" #1060

Open
1 task done
rightx2 opened this issue Jun 21, 2024 · 7 comments
Labels
area:config, bug, dbt:deps, execution:local, parsing:dbt_ls, triage-needed

Comments


rightx2 commented Jun 21, 2024

Astronomer Cosmos Version

Other Astronomer Cosmos version (please specify below)

If "Other Astronomer Cosmos version" selected, which one?

1.4.3

dbt-core version

1.7.16

Versions of dbt adapters

dbt-impala==1.4.3 (but I don't think this issue is related to the adapter)

LoadMode

DBT_LS

ExecutionMode

LOCAL

InvocationMode

None

airflow version

2.9.1

Operating System

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)"

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened?

dbt_vars can raise "This can happen when calling a macro that does not exist"

Relevant log output

'data_interval_end' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".

How to reproduce

I need to pass a variable, start_dt, to my dbt models using Airflow's data_interval_end template variable. Below is my configuration for the Cosmos DAG, and it worked like a charm.

from pathlib import Path

from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.constants import ExecutionMode, LoadMode, TestBehavior

project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt/data-dbt",
    project_name="data_dbt",
    partial_parse=True,
    dbt_vars={
        "start_dt": '{{ data_interval_end }}',
    },
)
profile_config = ProfileConfig(
    profile_name="data_dbt",
    target_name="dev",
    profiles_yml_filepath=Path("/opt/airflow/dbt/data-dbt", "profiles", "profiles.yml"),
)
render_config = RenderConfig(
    load_method=LoadMode.DBT_LS,
    emit_datasets=False,
    test_behavior=TestBehavior.AFTER_EACH,
    select=[
        "models/my_model.sql",
    ],
)
execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.LOCAL,
)
operator_args = {
    "install_deps": True,
}

dbt_op = DbtTaskGroup(
    group_id="dbt_group",
    project_config=project_config,
    profile_config=profile_config,
    render_config=render_config,
    execution_config=execution_config,
    operator_args=operator_args,
)

However, when I tried to call a method of the macro as shown below, it raised an error:

project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt/data-dbt",
    project_name="data_dbt",
    partial_parse=True,
    dbt_vars={
        "start_dt": '{{ data_interval_end.strftime("%Y-%M-%d") }}',
    },
)

dbt model:

select *
from source("my_source")
where date_time >= '{{ var("start_dt") }}'

error:

  'data_interval_end' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".

I think this is due to how yaml.dump works here (I think using json.dumps would work...).
Is there any way I can pass a variable that uses a macro method call through to my dbt models?
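
For reference, this is what each serialization of that dict looks like (a standalone sketch, not Cosmos code):

import json

import yaml  # PyYAML

dbt_vars = {"start_dt": '{{ data_interval_end.strftime("%Y-%M-%d") }}'}

# yaml.dump quotes the templated value with single quotes:
print(yaml.dump(dbt_vars), end="")
# start_dt: '{{ data_interval_end.strftime("%Y-%M-%d") }}'

# json.dumps escapes the inner double quotes instead:
print(json.dumps(dbt_vars))
# {"start_dt": "{{ data_interval_end.strftime(\"%Y-%M-%d\") }}"}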

Anything else :)?

Written above.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Contact Details

rightx2@gmail.com

@rightx2 rightx2 added bug Something isn't working triage-needed Items need to be reviewed / assigned to milestone labels Jun 21, 2024
@dosubot dosubot bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:deps Primarily related to dbt deps command or functionality execution:local Related to Local execution environment parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing labels Jun 21, 2024

dosubot bot commented Jun 21, 2024

Hello @rightx2! 👋 I'm here to help you with any bugs, questions, or contributions. Let me know how I can assist you!

The issue you're encountering with the data_interval_end variable being undefined when using dbt_vars in Cosmos might be related to how the variable is being templated and passed. The yaml.dump method is used to convert the dbt_vars dictionary to a YAML string, which might not handle Jinja templating as expected.

To address this, you can try using json.dumps instead of yaml.dump to ensure that the variables are correctly passed to dbt models. Here is an example of how you can modify your configuration:

import json
from pathlib import Path

from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.constants import ExecutionMode, LoadMode, TestBehavior

project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt/data-dbt",
    project_name="data_dbt",
    partial_parse=True,
    dbt_vars=json.dumps({
        "start_dt": '{{ data_interval_end }}',
    }),
)

profile_config = ProfileConfig(
    profile_name="data_dbt",
    target_name="dev",
    profiles_yml_filepath=Path("/opt/airflow/dbt/data-dbt", "profiles", "profiles.yml"),
)

render_config = RenderConfig(
    load_method=LoadMode.DBT_LS,
    emit_datasets=False,
    test_behavior=TestBehavior.AFTER_EACH,
    select=[
        "models/my_model.sql",
    ],
)

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.LOCAL,
)

operator_args = {
    "install_deps": True,
}

dbt_op = DbtTaskGroup(
    group_id="dbt_group",
    project_config=project_config,
    profile_config=profile_config,
    render_config=render_config,
    execution_config=execution_config,
    operator_args=operator_args,
)

This change should help ensure that the data_interval_end variable is correctly passed and templated within your dbt models. If the issue persists, it might be worth checking if there are any specific compatibility concerns between the versions of Cosmos and dbt-core you are using [1][2].


@rightx2 rightx2 changed the title [Bug] dbt_vars can raise "This can happen when calling a macro that does not exist" [Bug] dbt_vars raised "This can happen when calling a macro that does not exist" Jun 21, 2024

rightx2 (Author) commented Jun 21, 2024

The docs seem to say it works... looks strange.

dwreeves (Collaborator) commented Jun 21, 2024

Really important question: does this error occur during task execution or during DAG parsing/rendering?

I.e., does the error message appear on the Airflow homepage and cause the DAG to not load at all?

dwreeves (Collaborator) commented Jun 21, 2024

I'm going ahead with the assumption that this occurs during DAG rendering and not task execution, because I'm pretty confident I know what the problem is: I knew back in January that this problem would one day bite someone, and I have advocated for making changes to this precisely because of the issue you just ran into.

The problem:

Right now, dbt vars + env are strongly coupled across both rendering and execution, but they should be more loosely coupled, precisely because of what you are attempting to do.

In Airflow, {{ templated_variables }} are not normally resolved until after a DagRun is initiated. So what happens is when your DagRun initiates and the task runs, {{ data_interval_end.strftime("%Y-%M-%d") }} becomes (for example) "2024-06-21".

During rendering of the DAG, Airflow's Jinja2 templating is not applied at all. This means the string literal "{{ data_interval_end.strftime("%Y-%M-%d") }}" is passed to dbt as-is. Because dbt uses Jinja itself, dbt attempts to render that string in its own Jinja2 environment, which doesn't have the same variables as Airflow's Jinja environment.

The reason it doesn't raise an error when you do {{ data_interval_end }} is that Jinja2, by default, renders a variable that is not in the namespace as empty. {{ asdfjkl123456789 }} (i.e. gibberish) will not raise an error in Jinja2. However, when you attempt to call a method on an un-namespaced variable, that is where errors occur. E.g. {{ fake_variable }} works, but {{ fake_variable.fake_method() }} will raise an error.
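
A minimal standalone Jinja2 sketch of that behavior (plain Jinja2, outside dbt, so the exact error text differs from dbt's):

from jinja2 import Template
from jinja2.exceptions import UndefinedError

# An undefined variable renders silently as an empty string:
print(Template("start_dt = '{{ data_interval_end }}'").render())
# start_dt = ''

# Calling a method on an undefined variable raises an error, which dbt
# surfaces as "This can happen when calling a macro that does not exist."
try:
    Template('{{ data_interval_end.strftime("%Y-%M-%d") }}').render()
except UndefinedError as exc:
    print(exc)  # 'data_interval_end' is undefined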

How you can fix today:

You should look into using LoadMode.DBT_MANIFEST instead of LoadMode.DBT_LS.
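
A sketch of what that could look like, assuming a manifest.json has already been generated ahead of time (e.g. by running dbt parse/compile in CI; the path and the exact keyword arguments below are illustrative, not a confirmed recipe):

from pathlib import Path

from cosmos import ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt/data-dbt",
    # Hypothetical location; point this at wherever your pipeline publishes the manifest.
    manifest_path=Path("/opt/airflow/dbt/data-dbt/target/manifest.json"),
    project_name="data_dbt",
    dbt_vars={
        "start_dt": '{{ data_interval_end.strftime("%Y-%M-%d") }}',
    },
)
render_config = RenderConfig(load_method=LoadMode.DBT_MANIFEST)

With manifest parsing, dbt's Jinja never has to evaluate the unresolved Airflow template string at DAG-parse time; per the explanation above, the template is only resolved once the task actually runs.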

How Cosmos can fix:

As per my comment in January, vars and the env should be allowed to be decoupled. Errors should not be raised when a user attempts to set vars.

rightx2 (Author) commented Jun 21, 2024

Your assumption is right: it happened at rendering time. And the cause I suspected matches exactly what you described. I think I'll switch to another render method. Thanks!

dwreeves (Collaborator) commented:

One more note I didn't mention is that your use case is not atypical. I think injecting DagRun variables like the data interval end should be supported. It's very natural to want to do that. And it clearly is not supported right now. I think we should make this a more explicitly supported pattern. So keep doing what you're doing and don't be discouraged!

rightx2 (Author) commented Jun 22, 2024

Of course I will :)

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 21, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 28, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 28, 2024
@tatiana tatiana reopened this Nov 13, 2024