Raise exception when attempting to run a model that dbt can't find #682

Closed
tatiana opened this issue Nov 16, 2023 · 5 comments
Labels
area:execution - Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc
area:selector - Related to selector, like DAG selector, DBT selector, etc
dbt:run - Primarily related to dbt run command or functionality
execution:local - Related to Local execution environment
priority:medium - Medium priority issues are important issues that may have a workaround and medium impact

Comments

@tatiana
Collaborator

tatiana commented Nov 16, 2023

Context

Cosmos 1.2 considers a model run task successful even if dbt finds no models for that execution, as illustrated in:
#662 (comment)

The Airflow task succeeds, but the logs say:

[2023-11-16T12:32:37.420+0000] {subprocess.py:94} INFO - (astronomer-cosmos) - 12:32:37  The selection criterion 'customers_USA_v1' does not match any nodes

I'd expect us to be able to reproduce this by instantiating a DbtRunLocalOperator (https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/operators/local.py#L433C7-L433C26) with a select statement that doesn't match any dbt models.

Proposal

(I would love thoughts from the community!)

We should raise an exception if dbt's output indicates that the select statement doesn't match any dbt models (for example, "The selection criterion '...' does not match any nodes").
We could allow users to opt out of this behaviour via some configuration.
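
As a rough illustration of the proposal, the check could look something like the sketch below. It assumes the message wording from the log line quoted above, which may differ between dbt versions. The names NoMatchingNodesError, find_unmatched_criteria, check_dbt_output and fail_on_no_match are hypothetical and are not part of Cosmos or dbt.

```python
import re
from typing import List

# Pattern based on the log line quoted in this issue; the exact wording
# is an assumption and may vary between dbt versions.
NO_MATCH_PATTERN = re.compile(
    r"The selection criterion '([^']*)' does not match any nodes"
)


class NoMatchingNodesError(Exception):
    """Hypothetical exception; Cosmos would more likely raise an Airflow exception."""


def find_unmatched_criteria(dbt_output: str) -> List[str]:
    """Return every selection criterion that dbt reported as matching no nodes."""
    return NO_MATCH_PATTERN.findall(dbt_output)


def check_dbt_output(dbt_output: str, fail_on_no_match: bool = True) -> List[str]:
    """Raise if dbt reported unmatched criteria, unless the caller opted out."""
    unmatched = find_unmatched_criteria(dbt_output)
    if unmatched and fail_on_no_match:
        raise NoMatchingNodesError(
            f"dbt matched no nodes for: {', '.join(unmatched)}"
        )
    return unmatched
```

Returning the unmatched criteria when the check is disabled would let the caller log a warning instead of failing the task.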


dosubot bot commented Nov 16, 2023

🤖

Hello again, @tatiana! It's good to see you contributing to our project. I appreciate the detailed issue you've raised. Rest assured, I'm going to take a close look at it and get back to you with a comprehensive response soon.

@dosubot dosubot bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:selector Related to selector, like DAG selector, DBT selector, etc dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment priority:medium Medium priority issues are important issues that may have a workaround and medium impact labels Nov 16, 2023

dosubot bot commented Nov 16, 2023

🤖

Thank you for bringing this to our attention, Tatiana. I agree that it would be beneficial for the DbtRunLocalOperator to raise an exception when dbt can't find a model that matches the select statement.

Currently, the DbtRunLocalOperator does not specifically handle this case. It executes the dbt command and logs the output, but it does not parse the output to check for a message indicating that no models matched the select statement.

To implement your proposal, we could modify the run_command method in the DbtRunLocalOperator to parse the output of the dbt command and raise an exception if it contains the message "The selection criterion '...' does not match any nodes". Here is a rough sketch of how this could be done:

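from typing import List
from airflow.exceptions import AirflowException

def run_command(self, command: List[str]) -> None:
    # run_dbt_command is a hypothetical helper, not an existing Cosmos function
    result = run_dbt_command(command, self.env)
    self.log.info(result.output)  # log first so the dbt output stays visible if the task fails
    if "The selection criterion" in result.output and "does not match any nodes" in result.output:
        raise AirflowException("The selection criterion does not match any dbt models")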

This would cause the Airflow task to fail if dbt can't find a model that matches the select statement.

As for the configuration option to opt out of this behavior, we could add a new parameter to the DbtRunLocalOperator called raise_on_no_match. If raise_on_no_match is set to True, the operator would raise an exception when no models match the select statement. If raise_on_no_match is set to False, the operator would log a warning message but not raise an exception. Here is how this could be implemented:

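from typing import List
from airflow.exceptions import AirflowException

def run_command(self, command: List[str]) -> None:
    # run_dbt_command is a hypothetical helper; raise_on_no_match is the proposed new parameter
    result = run_dbt_command(command, self.env)
    # Log the dbt output first so it stays visible even if the task fails
    self.log.info(result.output)
    if "The selection criterion" in result.output and "does not match any nodes" in result.output:
        if self.raise_on_no_match:
            raise AirflowException("The selection criterion does not match any dbt models")
        self.log.warning("The selection criterion does not match any dbt models")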

Please note that this is a rough sketch and the actual implementation might differ. The changes would need to be thoroughly tested to ensure they work as expected and do not introduce any regressions.

I hope this helps! If you have any further questions or suggestions, please let me know.


@MrBones757
Contributor

What does dbt do when the select statement doesn't match anything?

If dbt fails in this situation, I would say we should fail by default and allow the user to turn it off. Likewise, if dbt doesn't fail, we should follow suit and add a toggle to enable failing.

@tatiana
Collaborator Author

tatiana commented Nov 16, 2023

@MrBones757 I really like your approach to this - but I feel in this particular case, things are not as clear.

dbt ls, for instance, doesn't return a non-zero return code even when there are errors (such as missing dependencies!); it merely prints the error to standard output.

Since dbt Core users use the command line, they always see the output, regardless of the OS return code or stderr, so the error is visible. I don't have access to dbt Cloud, but I'd expect such errors to be exposed in the UI if a model expected to run did not run.

In the case of Cosmos/Airflow, we need to make a call on which scenarios represent a task failure, since we cannot rely on dbt Core return codes or stderr.
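
To make that decision concrete, one option is to treat a run as failed either when the return code is non-zero or when the standard output contains a known failure message. The sketch below is only an illustration under that assumption: DbtResult, FAILURE_MARKERS and failure_reason are hypothetical names, and the single marker string is taken from the log line quoted earlier in this issue.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DbtResult:
    """Hypothetical container for the outcome of a dbt subprocess."""
    returncode: int
    stdout: str


# Messages that should count as a task failure even when dbt exits with 0.
# Only the message seen in this issue is listed; others would need to be added.
FAILURE_MARKERS: Tuple[str, ...] = ("does not match any nodes",)


def failure_reason(
    result: DbtResult, markers: Tuple[str, ...] = FAILURE_MARKERS
) -> Optional[str]:
    """Return a description of why the task should fail, or None if it should succeed."""
    # A non-zero return code is always treated as a failure.
    if result.returncode != 0:
        return f"dbt exited with return code {result.returncode}"
    # Otherwise scan stdout line by line for any known failure message.
    for line in result.stdout.splitlines():
        if any(marker in line for marker in markers):
            return line.strip()
    return None
```

An operator could call failure_reason after the dbt command finishes and raise an exception whenever it returns a value. Keeping the markers in one tuple makes it easy to pass an empty tuple as an opt-out.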


dosubot bot commented Feb 15, 2024

Hi, @tatiana,

I'm helping the Cosmos team manage their backlog and am marking this issue as stale. From what I understand, the issue you raised proposed raising an exception when attempting to run a model that dbt cannot find, with the ability for users to opt out of this behavior through configuration. The issue has been resolved by modifying the DbtRunLocalOperator to parse the output of the dbt command and raise an exception if no models are found. Additionally, a configuration option has been added to allow users to opt out of this behavior. The default behavior is now to fail if dbt fails to find a model, with an option to enable failing if dbt does not fail. This resolution addresses the concerns raised by you and ensures consistency in error visibility across different platforms.

Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your contribution! If you have any further questions or need assistance with anything else, feel free to reach out.

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 15, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 22, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 22, 2024