Terminate pool when timeout is reached for parallel tests. #53860
The origin of this PR: when trying to diagnose SQLAlchemy 2 CI #52233, it turned out that when all tests timed out, the log output was not printed.
Nice! Thanks for the update :) LGTM
jscheffl
left a comment
Cool!
Great!
When we reach the timeout, we already kill all the hanging containers, and after the pool has been terminated we print all the logs. However, when the pool had not yet fully executed (i.e. the containers were hanging and some tasks had not been started), killing the running containers without terminating the pool meant that the remaining tasks would start new ones.
This PR changes the timeout handler to terminate the pool before attempting to kill all the containers.
It also turned out that exit handling by the main thread monitoring the tests would in this case hang rather than print logs:
* it was waiting in a loop for all tasks to complete (which would never happen)
* it was trying to retrieve the result from ApplyResult without a timeout, where it would hang forever for terminated tasks
This PR introduces a separate path to handle the timeout, which does not wait for those two and handles the timeout immediately. It also refactors the whole "end of tests" method, splitting it into several methods to make it easier to read and reason about.
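The timeout path described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `run_with_timeout`, `collect`, `hanging_task`, `quick_task` and the `release` event are hypothetical names, and `ThreadPool` stands in for the process pool so the example runs anywhere (the `release` event plays the role of killing the hanging containers). It only illustrates the ordering (terminate the pool first, then kill what is hanging) and the rule of never calling `ApplyResult.get()` without a timeout on a terminated task.

```python
import multiprocessing
import threading
import time
from multiprocessing.pool import ApplyResult, ThreadPool

# Hypothetical stand-in for a hanging test container; set() "kills" it.
release = threading.Event()


def hanging_task(name: str) -> str:
    # Blocks until released (bounded, so the worker thread never leaks forever).
    release.wait(timeout=30)
    return f"{name}: finished late"


def quick_task(name: str) -> str:
    return f"{name}: ok"


def collect(results: dict[str, ApplyResult], timed_out: bool) -> dict[str, str]:
    outcome = {}
    for name, result in results.items():
        if timed_out and not result.ready():
            # After terminate() this result will never arrive;
            # a bare result.get() would block forever.
            outcome[name] = "terminated"
            continue
        try:
            outcome[name] = result.get(timeout=1)
        except multiprocessing.TimeoutError:
            outcome[name] = "terminated"
    return outcome


def run_with_timeout(timeout: float) -> dict[str, str]:
    # One worker, three tasks: the third is still queued when the second hangs.
    pool = ThreadPool(processes=1)
    results = {
        "first": pool.apply_async(quick_task, ("first",)),
        "second": pool.apply_async(hanging_task, ("second",)),
        "third": pool.apply_async(quick_task, ("third",)),
    }
    deadline = time.monotonic() + timeout
    timed_out = False
    while not all(r.ready() for r in results.values()):
        if time.monotonic() >= deadline:
            timed_out = True
            break
        time.sleep(0.05)
    if timed_out:
        pool.terminate()  # 1. stop the pool so queued tasks never start
        release.set()     # 2. only then "kill" whatever is still hanging
    else:
        pool.close()
    pool.join()
    return collect(results, timed_out)


if __name__ == "__main__":
    print(run_with_timeout(timeout=0.5))
```

If the two steps in the timeout branch were swapped, the queued third task would be picked up by the worker as soon as the hanging one was released, which mirrors the "remaining tasks would start new containers" problem. Note that `ThreadPool.terminate()` does not kill running threads, whereas a process-based `Pool.terminate()` stops its worker processes.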
Backport failed to create: v3-0-test.
You can attempt to backport this manually by running: cherry_picker e8d424e v3-0-test
This should apply the commit to the v3-0-test branch and leave the commit in a conflict state. After you have resolved the conflicts, you can continue the backport process by running: cherry_picker --continue