Terminate pool when timeout is reached for parallel tests. #53860

potiuk · 2025-07-29T07:41:49Z

When we reach the timeout, we already kill all the hanging containers, and after the pool has been terminated we print all the logs.

However, when the pool had not yet started all of its tasks (i.e. some containers were hanging and some tasks had not been started), killing the running containers without terminating the pool only freed the workers, and the remaining tasks would then start new containers.

This PR changes the timeout handler to terminate the pool before attempting to kill all the containers.
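
The ordering issue described above can be illustrated with a minimal sketch. It is not the code from `parallel.py`: it uses `multiprocessing.pool.ThreadPool` as a stand-in for the process pool so that it runs anywhere, and `kill_all_containers`, `hanging_task`, `queued_task` and `on_timeout` are hypothetical names. Killing the "container" releases the hung worker, so the pool has to be terminated first or the queued task would start.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def run_demo():
    started = []                        # tasks that actually began running
    container_killed = threading.Event()
    first_running = threading.Event()

    def hanging_task(name):
        started.append(name)
        first_running.set()
        # Simulates a hanging container; bounded so the sketch cannot hang.
        container_killed.wait(timeout=5)
        return name

    def queued_task(name):
        started.append(name)
        return name

    def kill_all_containers():
        # Hypothetical stand-in for killing the docker containers.
        container_killed.set()

    def on_timeout(pool):
        # Terminate first, so queued work is discarded before the
        # kill frees up the worker that was stuck.
        pool.terminate()
        kill_all_containers()

    pool = ThreadPool(processes=1)      # one worker: the second task must queue
    pool.apply_async(hanging_task, ("task-1",))
    pool.apply_async(queued_task, ("task-2",))
    first_running.wait(timeout=5)
    time.sleep(0.1)                     # let the pool queue task-2 internally
    on_timeout(pool)
    pool.join()
    return started
```

With the two calls in `on_timeout` swapped, the worker would be released while the pool is still running and would pick up `task-2`, which corresponds to the "remaining tasks would start new ones" behaviour.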

It also turned out that exit handling by the main thread monitoring
the tests would hang in this case rather than print logs:

* it was waiting in a loop for all tasks to complete (which
  would never happen)
* it was trying to retrieve the result from ApplyResult without a timeout,
  where it would hang forever for terminated tasks

This PR introduces a separate path to handle the timeout, which does
not wait for those two and handles the timeout immediately. It also
refactors the whole "end of tests" method, splitting it into several
methods to make it easier to read and reason about.
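
A rough sketch of what such a non-blocking timeout path could look like, under stated assumptions: `collect_outcomes` and its parameters are hypothetical names rather than the functions added in this PR, and `ThreadPool` again stands in for the process pool. The point is that after a timeout the code checks `ready()` and passes a timeout to `get()` instead of calling `get()` unconditionally.

```python
import multiprocessing
import threading
from multiprocessing.pool import ThreadPool


def collect_outcomes(results, timed_out, per_result_timeout=1.0):
    """Return (status, value) pairs without ever blocking forever."""
    outcomes = []
    for result in results:
        if timed_out and not result.ready():
            # The task was terminated or never started: do not wait for it.
            outcomes.append(("not finished", None))
            continue
        try:
            outcomes.append(("ok", result.get(timeout=per_result_timeout)))
        except multiprocessing.TimeoutError:
            outcomes.append(("not finished", None))
        except Exception as exc:  # the task itself raised
            outcomes.append(("failed", exc))
    return outcomes


def run_demo():
    never = threading.Event()
    pool = ThreadPool(processes=2)
    done = pool.apply_async(pow, (2, 5))
    # Stand-in for a hanging test; bounded at 5 seconds for safety.
    stuck = pool.apply_async(never.wait, (5,))
    done.wait(timeout=5)        # make sure the quick task has finished
    pool.terminate()            # the timeout handler terminates the pool
    outcomes = collect_outcomes([done, stuck], timed_out=True)
    never.set()                 # release the stand-in worker thread
    return outcomes
```

Calling `stuck.get()` with no timeout after `pool.terminate()` would block indefinitely, because nothing will ever set that result.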

potiuk · 2025-07-29T07:46:33Z

The origin of this PR: when trying to diagnose Sqlalchemy 2 CI #52233, it turned out that when all tests timed out, the log output was not printed.

gopidesupavan · 2025-07-29T08:14:52Z

Nice, thanks for the update :) LGTM

dev/breeze/src/airflow_breeze/utils/parallel.py

jscheffl

Cool!

bugraoz93 · 2025-07-30T15:20:27Z

Great!

github-actions · 2025-08-01T12:06:58Z

Backport failed to create: v3-0-test. View the failure log Run details

Status	Branch	Result
❌	v3-0-test

You can attempt to backport this manually by running:

cherry_picker e8d424e v3-0-test

This should apply the commit to the v3-0-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

potiuk requested review from amoghrajesh, ashb, gopidesupavan and jedcunningham as code owners July 29, 2025 07:41

boring-cyborg bot added area:dev-tools backport-to-v3-1-test labels Jul 29, 2025

potiuk requested review from aritra24, bugraoz93 and jscheffl July 29, 2025 07:47

potiuk force-pushed the better-timeout-handling branch from 85291ef to ff08b2b July 29, 2025 07:52

gopidesupavan approved these changes Jul 29, 2025

aritra24 approved these changes Jul 29, 2025

dev/breeze/src/airflow_breeze/utils/parallel.py (outdated, resolved)

potiuk force-pushed the better-timeout-handling branch 2 times, most recently from 0b167d6 to ad1d84a July 29, 2025 11:49

jscheffl approved these changes Jul 29, 2025

bugraoz93 approved these changes Jul 30, 2025

potiuk force-pushed the better-timeout-handling branch from ad1d84a to 0364bd0 August 1, 2025 11:21

potiuk merged commit e8d424e into apache:main Aug 1, 2025
103 checks passed

potiuk deleted the better-timeout-handling branch August 1, 2025 12:06


5 participants
