

@amoghrajesh
Contributor

closes: #49689

Problem

The start_time property of a DAG file processor subprocess (https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/dag_processing/processor.py#L315-L317) is calculated from psutil's boot_time: https://github.com/giampaolo/psutil/blob/d461f4c0f0aad1a039c7d8bb724a4c7288ef2f39/psutil/_pslinux.py#L1557

The problem appears to be in how we use it. When we read it as a property, it looks like, due to caching in https://github.com/giampaolo/psutil/blob/d461f4c0f0aad1a039c7d8bb724a4c7288ef2f39/psutil/__init__.py#L774-L784, the start_time of a subprocess is not updated when the system sleeps, which yields a start_time earlier than the real one. The duration is then calculated by comparing against time.time(), which keeps advancing, so the computed duration is inflated and the subprocess gets killed.
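To make the arithmetic concrete, here is a deterministic sketch under a simplified model (an assumption, not psutil's exact computation): after a sleep, the derived start timestamp ends up earlier than the real start by roughly the time the machine was suspended, and the duration is computed against wall-clock "now" as described above. The constants and function names are hypothetical.

```python
from typing import Tuple

# Hypothetical values, in seconds, chosen only to illustrate the arithmetic.
PROCESSOR_TIMEOUT = 50.0  # assumed kill threshold
REAL_RUNTIME = 10.0       # how long the subprocess has actually been running
SLEEP_DURATION = 3600.0   # time the machine spent suspended


def wall_clock_elapsed(now_wall: float, derived_start: float) -> float:
    # Pre-fix style: wall-clock "now" minus a start timestamp from another source.
    return now_wall - derived_start


def simulate(now_wall: float = 1_700_000_000.0) -> Tuple[float, bool]:
    real_start = now_wall - REAL_RUNTIME
    # Modelled skew: the derived start is earlier than the real start by the sleep time.
    derived_start = real_start - SLEEP_DURATION
    elapsed = wall_clock_elapsed(now_wall, derived_start)
    return elapsed, elapsed > PROCESSOR_TIMEOUT


if __name__ == "__main__":
    elapsed, killed = simulate()
    print(elapsed, killed)  # 3610.0 True
```

A subprocess that has really run for 10 seconds is reported as having run for over an hour, so a timeout check built on that figure would kill it.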

Switching to time.monotonic() gives a more reliable elapsed-time calculation, because the monotonic clock does not depend on the system wall clock and is not affected when that clock is changed or reset.
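A minimal sketch of the monotonic approach, assuming the start instant is recorded when the subprocess is spawned: both the start and "now" come from the same clock, so their difference is meaningful. `ElapsedTimer` is a hypothetical helper, not Airflow's code; the injectable `clock` exists only so the example can be run deterministically.

```python
import time
from typing import Callable, Optional


class ElapsedTimer:
    """Measure elapsed seconds using a single clock (time.monotonic by default).

    Recording the start and reading "now" from the same clock avoids mixing a
    derived start timestamp with time.time().
    """

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        start_time: Optional[float] = None,
    ) -> None:
        self._clock = clock
        # Record the start instant on the same clock that elapsed() will read.
        self.start_time = clock() if start_time is None else start_time

    def elapsed(self) -> float:
        return self._clock() - self.start_time

    def timed_out(self, timeout: float) -> bool:
        return self.elapsed() > timeout


if __name__ == "__main__":
    # Deterministic fake clock so the example output is predictable.
    ticks = iter([100.0, 110.0])
    timer = ElapsedTimer(clock=lambda: next(ticks))
    print(timer.elapsed())  # 10.0
```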

This change appears to fix the issue.



Member

@kaxil kaxil left a comment


Triggerer tests are failing:

FAILED airflow-core/tests/unit/jobs/test_triggerer_job.py::test_trigger_create_race_condition_38599 - TypeError: __init__() missing 1 required keyword-only argument: 'start_time'
FAILED airflow-core/tests/unit/jobs/test_triggerer_job.py::test_failed_trigger - TypeError: __init__() missing 1 required keyword-only argument: 'start_time'
XPASS airflow-core/tests/unit/jobs/test_scheduler_job.py::TestSchedulerJob::test_do_not_schedule_removed_task - This test does not verify anything; no time to fix; see notes below
XPASS airflow-core/tests/unit/jobs/test_triggerer_job.py::test_trigger_can_access_variables_connections_and_xcoms - We know that test is flaky and have no time to fix it before 3.0. We should fix it later. TODO: AIP-72
XPASS airflow-core/tests/unit/jobs/test_triggerer_job.py::test_trigger_can_fetch_trigger_dag_run_count_and_state_in_deferrable - We know that test is flaky and have no time to fix it before 3.0. We should fix it later. TODO: AIP-72
XPASS airflow-core/tests/unit/jobs/test_triggerer_job.py::test_trigger_can_fetch_dag_run_count_ti_count_in_deferrable - We know that test is flaky and have no time to fix it before 3.0. We should fix it later. TODO: AIP-72

You probably need:

diff --git a/airflow-core/tests/unit/jobs/test_triggerer_job.py b/airflow-core/tests/unit/jobs/test_triggerer_job.py
index d3c9ae6f27..a717b0bb83 100644
--- a/airflow-core/tests/unit/jobs/test_triggerer_job.py
+++ b/airflow-core/tests/unit/jobs/test_triggerer_job.py
@@ -174,6 +174,7 @@ def supervisor_builder(mocker, session):
             process=process,
             requests_fd=-1,
             capacity=10,
+            start_time=time.monotonic(),
         )
         # Mock the selector
         mock_selector = mocker.Mock(spec=selectors.DefaultSelector)

or make start_time optional and do

self.start_time = start_time or time.monotonic()
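The second option can be sketched as a constructor whose keyword-only start_time is optional and defaults to time.monotonic(). `HypotheticalSupervisor` and its `capacity` parameter are illustrative stand-ins, not Airflow's actual supervisor class.

```python
import time
from typing import Optional


class HypotheticalSupervisor:
    def __init__(self, *, capacity: int, start_time: Optional[float] = None) -> None:
        self.capacity = capacity
        # Default to "now" on the monotonic clock when a caller (e.g. a test
        # fixture) does not pass a start time. `is None` is used instead of
        # `or` so that an explicit 0.0 is kept rather than replaced.
        self.start_time = time.monotonic() if start_time is None else start_time
```

Omitting start_time no longer raises the `missing 1 required keyword-only argument` TypeError seen in the failing tests, while callers that pass it explicitly keep their value.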

@amoghrajesh
Contributor Author

Ah, I had made it optional but didn't assign a default. Pushing a fix.

@ashb
Member

ashb commented Apr 28, 2025

We should add some of the reasoning in a code comment too, otherwise someone might "optimise" it by switching back to the process creation time.
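Alongside the code comment, a small regression test can make the intent harder to undo by accident. In this sketch (hypothetical helper `elapsed_since`, standard-library `unittest.mock` only), time.time() is patched to jump forward by an hour and the test checks that the elapsed time measured with time.monotonic() stays small; a variant of the helper that read time.time() would fail it.

```python
import time
import unittest
from unittest import mock


def elapsed_since(start_time: float) -> float:
    # Elapsed seconds on the monotonic clock, matching how start_time is recorded.
    # Deliberately not time.time() or a process creation timestamp: those can
    # disagree with the monotonic start (e.g. after a system sleep).
    return time.monotonic() - start_time


class ElapsedTimeIgnoresWallClockTest(unittest.TestCase):
    def test_wall_clock_jump_does_not_inflate_elapsed(self) -> None:
        start = time.monotonic()
        # Simulate the wall clock jumping forward by an hour.
        with mock.patch("time.time", return_value=time.time() + 3600.0):
            elapsed = elapsed_since(start)
        self.assertLess(elapsed, 60.0)
```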

@kaxil kaxil force-pushed the dagprocessor-crashing-bug branch from e36b5e8 to f5d2783 Compare April 28, 2025 19:30
@kaxil kaxil added this to the Airflow 3.0.1 milestone Apr 28, 2025
@amoghrajesh amoghrajesh added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Apr 29, 2025
@amoghrajesh amoghrajesh merged commit af10644 into apache:main Apr 29, 2025
71 checks passed
@amoghrajesh amoghrajesh deleted the dagprocessor-crashing-bug branch April 29, 2025 05:10
github-actions bot pushed a commit that referenced this pull request Apr 29, 2025
… processes (#49868)

(cherry picked from commit af10644)

Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@github-actions

Backport successfully created: v3-0-test

Status Branch Result
v3-0-test PR Link

github-actions bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Apr 29, 2025
… processes (apache#49868)

(cherry picked from commit af10644)

Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
amoghrajesh added a commit that referenced this pull request Apr 29, 2025
… processes (#49868) (#49925)

(cherry picked from commit af10644)

Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
mvfc pushed a commit to mvfc/airflow that referenced this pull request Apr 29, 2025
…pache#49868)

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
jroachgolf84 pushed a commit to jroachgolf84/airflow that referenced this pull request Apr 30, 2025
…pache#49868)

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

Labels

area:DAG-processing area:task-sdk backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch


Development

Successfully merging this pull request may close these issues.

Dag processor gets SIGKILL signal and all DAGs are removed from UI

3 participants