Don't crash scheduler if exec config has old k8s objects #24117

Merged: 1 commit, Jun 15, 2022

dstandish
Contributor

@dstandish dstandish commented Jun 2, 2022

From time to time, k8s library objects change their attributes. If an executor config is stored with an old version of the library and unpickled with a new version, we can get attribute errors that crash the scheduler (see #23727).
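
To make the failure mode concrete, here is a minimal sketch that reproduces it with plain `pickle` and no Kubernetes dependency. Everything in it is an assumption for illustration: `fake_k8s_models`, `Container`, and the `install` helper are hypothetical stand-ins, not the real kubernetes client classes. The point is only that unpickling restores the old instance `__dict__` without calling the new `__init__`, so attributes added in a newer class version are missing.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a third-party models module (not the real k8s client).
fake_lib = types.ModuleType("fake_k8s_models")
sys.modules[fake_lib.__name__] = fake_lib


def install(cls):
    """Publish cls as fake_k8s_models.Container so pickle resolves it by name."""
    cls.__module__ = fake_lib.__name__
    cls.__name__ = cls.__qualname__ = "Container"
    fake_lib.Container = cls
    return cls


@install
class OldContainer:
    """'Old library version': the object only knows about a name."""

    def __init__(self, name):
        self._name = name


# Store the object while the old class definition is installed.
stored = pickle.dumps(fake_lib.Container("base"))


@install
class NewContainer:
    """'New library version': adds a private attribute read by a property."""

    def __init__(self, name, startup_probe=None):
        self._name = name
        self._startup_probe = startup_probe

    @property
    def startup_probe(self):
        return self._startup_probe


# Unpickling succeeds: pickle looks the class up by name and restores the old
# __dict__ without calling NewContainer.__init__.
restored = pickle.loads(stored)

error_message = None
try:
    restored.startup_probe
except AttributeError as exc:
    # The attribute added in the "new version" was never set on this instance.
    error_message = str(exc)

print(error_message)
```

Note that the load itself does not fail; the error only appears later, when some code path touches the missing attribute.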

Here we update handling so that we fail the task but don't crash the scheduler.
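
The general idea of failing one task instead of the whole loop can be sketched as follows. This is not the change merged in this PR and does not use Airflow's API: `Task`, `StaleConfig`, `build_pod_spec`, and `schedule` are hypothetical names invented for the example, and the real scheduler and executor code is considerably more involved.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("scheduler_sketch")


@dataclass
class Task:
    """Hypothetical task record; not an Airflow class."""

    key: str
    executor_config: object = None
    state: str = "queued"


class StaleConfig:
    """Mimics an object unpickled from an older library version."""

    @property
    def startup_probe(self):
        # _startup_probe was never set on this instance, so this raises AttributeError.
        return self._startup_probe


def build_pod_spec(config):
    """Hypothetical conversion step that reads attributes off the config."""
    if config is None:
        return {}
    return {"startup_probe": config.startup_probe}


def schedule(tasks):
    """Launch every task whose config is usable; fail the others individually."""
    launched = []
    for task in tasks:
        try:
            build_pod_spec(task.executor_config)
        except Exception as exc:
            # Contain the error: mark this task failed and keep the loop alive.
            log.error("Could not read executor config for %s: %s", task.key, exc)
            task.state = "failed"
            continue
        task.state = "running"
        launched.append(task.key)
    return launched


tasks = [Task("bad", StaleConfig()), Task("good")]
launched = schedule(tasks)
print(launched, [t.state for t in tasks])
```

The design choice here is that the `try` wraps only the per-task work, so one unreadable config cannot take down processing of the remaining tasks.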

resolves #23727

@boring-cyborg boring-cyborg bot added the provider:cncf-kubernetes and area:Scheduler labels Jun 2, 2022
@dstandish dstandish marked this pull request as ready for review June 3, 2022 18:36
@dstandish dstandish force-pushed the fix-incompat-executor-config branch from 67a0860 to ece6e96 Compare June 3, 2022 18:37
@dstandish dstandish changed the title Handle incompatible pickled executor config after k8s upgrade Don't crash scheduler of exec config has old k8s objects Jun 3, 2022
@dstandish dstandish changed the title Don't crash scheduler of exec config has old k8s objects Don't crash scheduler uf exec config has old k8s objects Jun 3, 2022
@dstandish dstandish changed the title Don't crash scheduler uf exec config has old k8s objects Don't crash scheduler if exec config has old k8s objects Jun 3, 2022
@dstandish dstandish force-pushed the fix-incompat-executor-config branch from ece6e96 to 8594b96 Compare June 3, 2022 19:06
@ashb ashb added this to the Airflow 2.3.3 milestone Jun 13, 2022
@github-actions github-actions bot added the full tests needed label Jun 14, 2022
@github-actions

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@dstandish dstandish merged commit 0c41f43 into apache:main Jun 15, 2022
@dstandish dstandish deleted the fix-incompat-executor-config branch June 15, 2022 04:30
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 15, 2022
ephraimbuddy pushed a commit to astronomer/airflow that referenced this pull request Jun 16, 2022
(cherry picked from commit 0c41f43)
ephraimbuddy pushed a commit that referenced this pull request Jun 30, 2022
(cherry picked from commit 0c41f43)
@ephraimbuddy ephraimbuddy added the type:bug-fix label Jun 30, 2022
Labels
area:Scheduler - including HA (high availability) scheduler
full tests needed - We need to run full set of tests for this PR to merge
provider:cncf-kubernetes - Kubernetes provider related issues
type:bug-fix - Changelog: Bug Fixes
Development

Successfully merging this pull request may close these issues.

Airflow 2.3 scheduler error: 'V1Container' object has no attribute '_startup_probe'
6 participants