Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Jul 29, 2025

When we reach timeut we kill all the hanging containers already and after the pool has been terminated, we will print all the logs.

However, when the pool had not yet been fully executing (i.e the containers were hanging and some tasks were not started) - without terminating the pool that would kill running containers and the remaining tasks would start new ones.

This PR changes the timeout handler to terminate the pool before attempting to kill all the containers.

This PR changes the timeout handler to terminate the pool before
attempting to kill all the containers.

It also turned out that exit handling by the main thread monitorint
the tests in this case would hang rather than print logs:

  • it was waiting in a loop to wait for all task to complete (which
    would never happen)

  • it was trying to retrieve result from ApplyResult without timeout
    where it would hang for ever for terminated tasks

This PR introduces a separate path to handle timeout, which does
not wait for those two and handles timeout immediately. It also
refactors the whole "end of tests" method splitting it into several
methods to make it easier to reason and read.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added area:dev-tools backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch labels Jul 29, 2025
@potiuk
Copy link
Member Author

potiuk commented Jul 29, 2025

The origin of this PR - when trying to diagnode Sqlallchemy 2 CI #52233 it turned out that when things timed-out for all tests, the log output has not been printed .

@potiuk potiuk force-pushed the better-timeout-handling branch from 85291ef to ff08b2b Compare July 29, 2025 07:52
@gopidesupavan
Copy link
Member

Nice Thanks for the update :) LGTM

aritra24

This comment was marked as resolved.

@potiuk potiuk force-pushed the better-timeout-handling branch 2 times, most recently from 0b167d6 to ad1d84a Compare July 29, 2025 11:49
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@bugraoz93
Copy link
Contributor

Great!

When we reach timeut we kill all the hanging containers already
and after the pool has been terminated, we will print all the logs.

However, when the pool had not yet been fully executing (i.e the
containers were hanging and some tasks were not started) - without
terminating the pool that would kill running containers and the
remaining tasks would start new ones.

This PR changes the timeout handler to terminate the pool before
attempting to kill all the containers.

It also turned out that exit handling by the main thread monitorint
the tests in this case would hang rather than print logs:

* it was waiting in a loop to wait for all task to complete (which
  would never happen)

* it was trying to retrieve result from ApplyResult without timeout
  where it would hang for ever for terminated tasks

This PR introduces a separate path to handle timeout, which does
not wait for those two and handles timeout immediately. It also
refactors the whole "end of tests" method splitting it into several
methods to make it easier to reason and read.
@potiuk potiuk force-pushed the better-timeout-handling branch from ad1d84a to 0364bd0 Compare August 1, 2025 11:21
@potiuk potiuk merged commit e8d424e into apache:main Aug 1, 2025
103 checks passed
@potiuk potiuk deleted the better-timeout-handling branch August 1, 2025 12:06
@github-actions
Copy link

github-actions bot commented Aug 1, 2025

Backport failed to create: v3-0-test. View the failure log Run details

Status Branch Result
v3-0-test Commit Link

You can attempt to backport this manually by running:

cherry_picker e8d424e v3-0-test

This should apply the commit to the v3-0-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 7, 2025
)

When we reach timeut we kill all the hanging containers already
and after the pool has been terminated, we will print all the logs.

However, when the pool had not yet been fully executing (i.e the
containers were hanging and some tasks were not started) - without
terminating the pool that would kill running containers and the
remaining tasks would start new ones.

This PR changes the timeout handler to terminate the pool before
attempting to kill all the containers.

It also turned out that exit handling by the main thread monitorint
the tests in this case would hang rather than print logs:

* it was waiting in a loop to wait for all task to complete (which
  would never happen)

* it was trying to retrieve result from ApplyResult without timeout
  where it would hang for ever for terminated tasks

This PR introduces a separate path to handle timeout, which does
not wait for those two and handles timeout immediately. It also
refactors the whole "end of tests" method splitting it into several
methods to make it easier to reason and read.
fweilun pushed a commit to fweilun/airflow that referenced this pull request Aug 11, 2025
)

When we reach timeut we kill all the hanging containers already
and after the pool has been terminated, we will print all the logs.

However, when the pool had not yet been fully executing (i.e the
containers were hanging and some tasks were not started) - without
terminating the pool that would kill running containers and the
remaining tasks would start new ones.

This PR changes the timeout handler to terminate the pool before
attempting to kill all the containers.

It also turned out that exit handling by the main thread monitorint
the tests in this case would hang rather than print logs:

* it was waiting in a loop to wait for all task to complete (which
  would never happen)

* it was trying to retrieve result from ApplyResult without timeout
  where it would hang for ever for terminated tasks

This PR introduces a separate path to handle timeout, which does
not wait for those two and handles timeout immediately. It also
refactors the whole "end of tests" method splitting it into several
methods to make it easier to reason and read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants