Scheduler Crashes When Passing Invalid Value to Argument in default_args #40187

Closed
1 of 2 tasks
hkc-8010 opened this issue Jun 12, 2024 · 1 comment · Fixed by #40269
Labels
area:core area:Scheduler including HA (high availability) scheduler good first issue kind:bug This is a clearly a bug

Comments

@hkc-8010

Apache Airflow version

2.9.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When an invalid value is passed to an argument in default_args (for example, the string "2" for max_active_tis_per_dag, which expects an integer), the scheduler crashes with the following error:

[2024-06-12T05:04:39.041+0000] {scheduler_command.py:54} ERROR - Exception when running scheduler job
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/cli/commands/scheduler_command.py", line 52, in _run_scheduler_job
    run_job(job=job_runner.job, execute_callable=job_runner._execute)
  File "/usr/local/lib/python3.11/site-packages/airflow/utils/session.py", line 79, in wrapper
    return func(*args, session=session, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/job.py", line 395, in run_job
    return execute_job(job, execute_callable=execute_callable)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/job.py", line 424, in execute_job
    ret = execute_callable()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/astronomer/airflow/version_check/plugin.py", line 30, in run_before
    fn(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 854, in _execute
    self._run_scheduler_loop()
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 986, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 1099, in _do_scheduling
    num_queued_tis = self._critical_section_enqueue_task_instances(session=session)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 677, in _critical_section_enqueue_task_instances
    queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 533, in _executable_task_instances_to_queued
    if current_task_concurrency >= task_concurrency_limit:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'int' and 'str'
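
For readers who want to see the failure in isolation, here is a minimal sketch of the comparison that the last frame of the traceback trips over. The variable names are copied from the traceback; the values and the helper function `exceeds_limit` are assumptions made for illustration and are not Airflow code.

```python
# Stand-in values (assumed): an integer count of running task instances
# and the string limit that came from default_args.
current_task_concurrency = 0
task_concurrency_limit = "2"  # should have been the integer 2


def exceeds_limit(current, limit):
    """Hypothetical helper mirroring the comparison shown in the traceback."""
    return current >= limit


try:
    exceeds_limit(current_task_concurrency, task_concurrency_limit)
except TypeError as exc:
    # Python refuses to order an int against a str.
    message = str(exc)
    print(message)
```

With an integer limit the same comparison works, which is why the problem only shows up once the scheduler reaches a task carrying the bad value.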

What you think should happen instead?

Instead of crashing the scheduler, Airflow should report an import error in the Airflow UI for the problematic DAG, which would make that DAG easy to identify. With the current behaviour it is difficult to tell which DAG caused the crash, especially when managing hundreds of DAGs. A small mistake in a single DAG can therefore cause an outage in a production environment.

How to reproduce

  • Define a DAG with default_args that includes an invalid value for max_active_tis_per_dag, such as:
default_args = {
    'max_active_tis_per_dag': "2",  # invalid value, should be an integer
}
  • Deploy and start the DAG.
  • Observe the scheduler crash with the TypeError.

Operating System

Linux ad25902d8cef 6.7.12-orbstack-00201-g2ddb8f197a46 #1 SMP Tue May 21 04:38:26 UTC 2024 aarch64 GNU/Linux

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else?

This problem occurs every time an invalid value is passed to an argument in default_args.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@hkc-8010 hkc-8010 added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jun 12, 2024
@vatsrahul1001 vatsrahul1001 added the area:Scheduler including HA (high availability) scheduler label Jun 12, 2024
@vatsrahul1001
Collaborator

vatsrahul1001 commented Jun 12, 2024

This is not a new issue in 2.9.2; I confirmed with @hkc-8010 that he can reproduce it in previous versions as well. We should handle invalid values gracefully instead of letting the scheduler crash.

@vatsrahul1001 vatsrahul1001 removed the needs-triage label for new issues that we didn't triage yet label Jun 12, 2024
@pankajkoti pankajkoti self-assigned this Jun 12, 2024
pankajkoti added a commit that referenced this issue Jun 21, 2024
…40269)

Validates that some commonly used arguments by DAG authors
conform to the expected types. If the provided values do not
match the expected types, a TypeError is raised, resulting in
DAG import errors that appear in the Airflow UI.

Closes: #40187
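
The idea in the commit message above can be sketched roughly as follows. This is only an illustration of type validation at DAG definition time, not the code merged in the referenced PR: the mapping `EXPECTED_ARG_TYPES`, the function `validate_arg_types`, and the choice of arguments are all hypothetical.

```python
# Hypothetical mapping of argument name -> expected type.
# The real set of validated arguments may differ.
EXPECTED_ARG_TYPES = {
    "max_active_tis_per_dag": int,
    "retries": int,
    "email_on_failure": bool,
}


def validate_arg_types(args: dict) -> None:
    """Raise TypeError if a known argument has an unexpected type.

    Raising while the DAG file is being parsed turns the mistake into
    an import error for that one DAG instead of a scheduler-wide crash.
    """
    for name, expected in EXPECTED_ARG_TYPES.items():
        value = args.get(name)
        if value is None:
            continue  # argument not set; nothing to check
        if not isinstance(value, expected):
            raise TypeError(
                f"{name!r} must be {expected.__name__}, "
                f"got {type(value).__name__}: {value!r}"
            )


# The default_args from the reproduction steps would be rejected up front.
try:
    validate_arg_types({"max_active_tis_per_dag": "2"})
except TypeError as exc:
    error_text = str(exc)
    print(error_text)
```

Failing at parse time keeps the error attached to the file that caused it, which is what the issue asks for.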