Some of the cosmos DAGs not loading #1391

Open
1 task done
vemulagopal opened this issue Dec 16, 2024 · 3 comments
Labels
  • area:execution: Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc
  • area:performance: Related to performance, like memory usage, CPU usage, speed, etc
  • bug: Something isn't working
  • triage-needed: Items need to be reviewed / assigned to milestone

Comments

@vemulagopal

Astronomer Cosmos Version

1.6.0

dbt-core version

1.6.6

Versions of dbt adapters

dbt-snowflake==1.6.4

LoadMode

AUTOMATIC

ExecutionMode

DOCKER

InvocationMode

SUBPROCESS

airflow version

2.8.1

Operating System

Ubuntu 22.04.1 LTS (Jammy Jellyfish)

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened?

A bug happened! I have an issue with Cosmos: a few of the Cosmos DAGs are not loading in my local environment. When I checked the logs, the DAG processor was being killed with exit code -9. This happens for the Cosmos DAGs with a large number of tasks. I increased the memory of the Docker engine virtual machine (where Airflow is running) to 24 GB, with 4 CPUs, but the processor still exits. Could you please help me fix the issue?

{manager.py:1017} ERROR - Processor for /opt/airflow/dags/test_cosmos_dag.py exited with return code -9.

Here are my configurations:

scheduler:
  min_file_process_interval: 300
core:
  dagbag_import_timeout: 1600
  dag_file_processor_timeout: 1450
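
A note on the return code in the log line above: a negative return code reported for a child process means it was terminated by the signal with that number, so -9 corresponds to signal 9, which is SIGKILL on Linux. SIGKILL is commonly (though not exclusively) sent by the kernel's out-of-memory killer, which would fit the timeouts not helping. The following is a minimal sketch of how the code can be decoded, not anything from Airflow or Cosmos; the helper name `describe_return_code` is hypothetical, and the example assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_return_code(code):
    """Translate a subprocess-style return code into a readable message."""
    if code < 0:
        # Negative codes mean "terminated by signal number -code".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-code} ({name})"
    return f"exited normally with status {code}"


# Start a stand-in child process and kill it, imitating what happens
# when something outside the process sends SIGKILL.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.kill()  # on POSIX this sends SIGKILL
child.wait()
print(describe_return_code(child.returncode))
```

If the processor is being killed for memory reasons, the kernel log on the host (or the container's OOM status) is usually the place to confirm it.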

Relevant log output

How to reproduce

  1. Have an Airflow container with version 2.8.1
  2. Install astronomer-cosmos==1.6.0 in the container
  3. Run the container and deploy the Cosmos DAGs

Anything else :)?

A few of the DAGs work fine, but some complex DAGs do not load. I tried increasing the memory and tried different configuration parameters, but nothing worked.

Please let us know if a call is needed.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Contact Details

vemulagopal@gmail.com

@vemulagopal vemulagopal added bug Something isn't working triage-needed Items need to be reviewed / assigned to milestone labels Dec 16, 2024

dosubot bot commented Dec 16, 2024

I found a similar open issue regarding Cosmos DAGs not loading, where the DAG processor is getting killed with exit code -9. The suggested solutions include:

  1. Optimize DAG Processing: Use Cosmos's caching and partial parsing features to optimize DAG processing. Ensure caching is enabled and configured properly in your airflow.cfg or through environment variables.

  2. Increase Memory Allocation: Although you've already increased the memory to 24GB, consider further increasing it if possible, as large DAGs can be very memory-intensive.

  3. Adjust Airflow Configurations: Review and adjust your Airflow configurations. You've already increased dag_file_processor_timeout and dagbag_import_timeout, which is good. Consider looking into other settings like parallelism and max_active_tasks_per_dag.

  4. Profile and Monitor Resource Usage: Use monitoring tools to profile the memory usage of your Airflow instance to identify specific tasks or operations consuming excessive resources.

  5. Review DAG Complexity: Simplify the DAGs or break them into smaller, more manageable pieces if possible [1].
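
For point 4 in the list above, one rough way to see how much memory parsing a single DAG file needs is to run it in a child process and read the peak resident set size afterwards. This is only a sketch under assumptions: the helper name `peak_child_rss` is hypothetical, the `resource` module is POSIX-only, and `ru_maxrss` is reported in KiB on Linux (bytes on macOS). The value is a high-water mark over all child processes waited for so far, so it is best measured in a fresh interpreter.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion; return (return code, peak RSS of waited-for children)."""
    completed = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Stand-in workload that allocates a 50 MiB buffer. To measure a real DAG file,
# the command could be replaced with something like
# [sys.executable, "/opt/airflow/dags/test_cosmos_dag.py"].
returncode, peak = peak_child_rss(
    [sys.executable, "-c", "data = bytearray(50 * 1024 * 1024)"]
)
print(f"return code {returncode}, peak child RSS {peak} (KiB on Linux)")
```

Comparing this number for a DAG that loads with the one that gets killed may show whether memory is the limiting factor.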


@dosubot dosubot bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:performance Related to performance, like memory usage, CPU usage, speed, etc labels Dec 16, 2024
@tatiana
Collaborator

tatiana commented Dec 16, 2024

@vemulagopal Please, could you confirm the following:

  1. How large is your dbt project (how many nodes)?
  2. If you run dbt ls in the dbt project folder in your Airflow scheduler host, how long does it take?
  3. How many Airflow DAGs do you have referencing this dbt project?
  4. What are the values of the following configuration:
    • [core] dagbag_import_timeout
    • [core] dag_file_processor_timeout
    • [scheduler] min_file_process_interval
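
For question 2, the timing can be taken with a small wrapper instead of a stopwatch. This is a minimal sketch, not part of Cosmos; the helper name `time_command` and the project path in the comment are hypothetical.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run cmd and return (return code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,  # output is discarded; only timing matters here
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return completed.returncode, time.perf_counter() - start


# Stand-in command so the example runs anywhere. On the scheduler host this
# could be time_command(["dbt", "ls"], cwd="/path/to/dbt/project"), where the
# path is a placeholder for the real project folder.
returncode, elapsed = time_command([sys.executable, "-c", "pass"])
print(f"return code {returncode}, took {elapsed:.2f} s")
```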

@vemulagopal
Author

@tatiana, here are the answers to your questions. Basically, I am testing with 3 Cosmos DAGs. Two of them render the lineage without any issues, but the third, which is a complex DAG (more than 30 tasks), disappears. Please let me know if you need more information.

  1. How large is your dbt project (how many nodes)?
    Ans: Our dbt workloads run on AWS Batch, so when a job runs, it spins up a container, runs the dbt commands, and then shuts the container down.
  2. If you run dbt ls in the dbt project folder in your Airflow scheduler host, how long does it take?
    Ans: It takes around 20 seconds.
  3. How many Airflow DAGs do you have referencing this dbt project?
    Ans: Around 25 DAGs are dbt DAGs, but I am testing just 3 DAGs using Cosmos.
  4. What are the values of the following configuration:
    • [core] dagbag_import_timeout
    • [core] dag_file_processor_timeout
    • [scheduler] min_file_process_interval

scheduler:
  min_file_process_interval: 3000
core:
  dagbag_import_timeout: 3600
  dag_file_processor_timeout: 3450
  lazy_load_plugins: False

Thanks
Giridhar
