Fix Airflow DAG topology when test depends on multiple models #620

Closed
tatiana opened this issue Oct 23, 2023 · 2 comments
Labels
area:rendering (Related to rendering, like Jinja, Airflow tasks, etc)
priority:high (High priority issues are blocking or critical issues without a workaround and large impact)
Comments

@tatiana
Collaborator

tatiana commented Oct 23, 2023

Example DAGs:

  • model1 -> model2, with a test rc_test that checks that model1 and model2 have the same row count
  • rc_test depends on two models (model1 and model2) that do not depend on one another

Which of the following would make sense from a DAG topology perspective?
i) To have a single Airflow test task that depends on both models' Airflow run tasks.
ii) To have two Airflow test nodes, one for each model's run task. A node may not run every test for its model, but it would still be reported as a successful Airflow task, even if that was the only test run for that specific model.
iii) To have three Airflow test nodes: one running the tests that are exclusive to model1 (we could continue using the TaskGroup approach we have, in this sense), another running the tests exclusive to model2, and a third running the tests that depend on both models.

This is a follow-up to conversations that started in ticket #613 - and we still need to confirm the desired DAG topology. I believe that (iii) is the desired approach - but we can discuss the solution here before any implementations take place.
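
To make option (iii) concrete, here is a minimal sketch in plain Python of how tests could be grouped by the exact set of models they depend on. It is not how Cosmos renders tasks; the function names, the `<models>_test` task ID scheme, and the two single-model test names are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_tests_by_parents(test_parents):
    """Map each distinct set of parent models to the tests that depend on exactly that set."""
    groups = defaultdict(list)
    for test_name, parents in test_parents.items():
        groups[frozenset(parents)].append(test_name)
    return {key: sorted(tests) for key, tests in groups.items()}


def plan_test_tasks(test_parents):
    """Return (task_id, upstream_models, tests) tuples, one per test task in option (iii)."""
    plan = []
    for parents, tests in group_tests_by_parents(test_parents).items():
        upstream = sorted(parents)
        # Hypothetical naming scheme: join the upstream model names.
        task_id = "_".join(upstream) + "_test"
        plan.append((task_id, upstream, tests))
    return sorted(plan)


# Hypothetical project: one test per model plus rc_test, which needs both models.
example = {
    "not_null_model1_id": ["model1"],
    "unique_model2_id": ["model2"],
    "rc_test": ["model1", "model2"],
}

plan = plan_test_tasks(example)
for task_id, upstream, tests in plan:
    print(f"{task_id}: runs {tests} after {upstream}")
```

With this grouping, each single-model test task can stay next to its model's run task, while the multi-model task only starts once every model it depends on has run.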

tatiana added the priority:high label on Oct 23, 2023
tatiana added this to the 1.3.0 milestone on Oct 23, 2023
@david-mag
Contributor

david-mag commented Oct 25, 2023

If somebody looks into the DAG topology in general, they could also address ephemeral models and their impact on the DAG. Generally, ephemeral models never need to be run as a task, since they don't do anything. When removing them with an --exclude statement in Cosmos, however, the DAG might get disconnected in places.
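
One possible way to avoid the disconnection, sketched below in plain Python, is to bypass the skipped nodes rather than just dropping them: each remaining node is rewired to its nearest remaining ancestors. This is an illustration under assumptions, not Cosmos's implementation; `bypass_nodes`, the `upstream` mapping shape, and the model names are all hypothetical.

```python
def bypass_nodes(upstream, skipped):
    """Drop `skipped` nodes and rewire each kept node to its nearest kept ancestors.

    `upstream` maps a node name to the list of nodes it directly depends on.
    """

    def resolve(node, seen):
        resolved = set()
        for parent in upstream.get(node, ()):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in skipped:
                # Look through the skipped node to whatever it depends on.
                resolved |= resolve(parent, seen)
            else:
                resolved.add(parent)
        return resolved

    return {
        node: sorted(resolve(node, set()))
        for node in upstream
        if node not in skipped
    }


# Hypothetical project: stg -> eph (ephemeral) -> mart
graph = {"stg": [], "eph": ["stg"], "mart": ["eph"]}
rewired = bypass_nodes(graph, skipped={"eph"})
print(rewired)
```

In this example `mart` ends up depending directly on `stg`, so the path survives even though `eph` never becomes a task.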

tatiana added the area:rendering label on Nov 8, 2023
tatiana modified the milestones: 1.3.0, 1.4.0 on Dec 7, 2023
dosubot (bot) added the stale label (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) on Mar 9, 2024

dosubot bot commented Mar 9, 2024

Hi, @tatiana,

I'm helping the Cosmos team manage their backlog and am marking this issue as stale. From what I understand, you raised the issue to discuss the desired DAG topology for Airflow when a test depends on multiple models. Three options were proposed, and confirmation on the preferred approach was sought. In a later comment, david-mag suggested also considering the impact of ephemeral models on the DAG topology and the possible disconnection when they are removed with an --exclude statement in Cosmos.

Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!

dosubot (bot) closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 16, 2024
dosubot (bot) removed the stale label on Mar 16, 2024