Fix KeyError when k8s omits optional reason field in container waiting status #60527 #60805

Merged
jscheffl merged 1 commit into apache:main from anishgirianish:fix/60527-optional-keys-handeling
Feb 5, 2026

@anishgirianish
Contributor


Closes #60527

The k8s API spec defines reason and message as optional fields in ContainerStateWaiting, but the code assumed they always exist. This caused KeyError: 'reason' crashes in the job watcher, leaving execution slots locked until scheduler restart.

Fixed by using .get() for safe access to these optional fields.

Added test case for missing reason scenario.
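The `.get()` approach described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `waiting_reason_and_message` is hypothetical, and the input is assumed to be a plain dict shaped like a serialized container status (`state.waiting.reason` / `state.waiting.message`).

```python
from typing import Any, Optional, Tuple


def waiting_reason_and_message(
    container_status: dict[str, Any],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (reason, message) for a waiting container, or None for absent fields.

    Hypothetical helper: both fields are optional in ContainerStateWaiting,
    so they are read with .get() instead of direct indexing.
    """
    # "state" or "waiting" may be missing or None; fall back to an empty dict.
    state = container_status.get("state") or {}
    waiting = state.get("waiting") or {}
    # waiting["reason"] would raise KeyError when the key is omitted.
    return waiting.get("reason"), waiting.get("message")


if __name__ == "__main__":
    # Status where the API omitted both optional fields.
    print(waiting_reason_and_message({"state": {"waiting": {}}}))
    # Status with both fields present.
    print(
        waiting_reason_and_message(
            {"state": {"waiting": {"reason": "ImagePullBackOff", "message": "Back-off pulling image"}}}
        )
    )
```

Callers then need to handle `None` explicitly, for example by skipping reason-based checks when no reason was reported.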


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@anishgirianish force-pushed the fix/60527-optional-keys-handeling branch from 359ea64 to fb5c8d9 January 20, 2026 04:43
@anishgirianish changed the title from "#57061 (fix): fix race condition task failure update" to "Fix KeyError when k8s omits optional reason field in container waiting status #60527" Jan 20, 2026
@dthauvin

dthauvin commented Feb 5, 2026

Hi @anishgirianish, any clue when this fix will be attached to a milestone and released?

@anishgirianish
Contributor Author

Hi @dthauvin, thanks for your interest in this fix! For questions about milestones and release timelines, the maintainers would have a better idea. @jedcunningham @hussein-awala, would you be able to help with this? And whenever you have a moment, I'd really appreciate your feedback on this PR as well. Thank you!

Contributor

@Nataneljpwd Nataneljpwd left a comment


Looks great!

I would maybe add a code comment linking to the spec where it states that the field is optional, so that no one changes it back later on.

@eladkal eladkal requested a review from jscheffl February 5, 2026 18:35
Contributor

@jscheffl jscheffl left a comment


Cool! Thanks for the fix!

(And I'd really love it if the generated K8s API bindings used fully typed objects, so that these type problems could be found directly by static type checks...)

@jscheffl jscheffl merged commit c8f4466 into apache:main Feb 5, 2026
104 checks passed
jhgoebbert pushed a commit to jhgoebbert/airflow_Owen-CH-Leung that referenced this pull request Feb 8, 2026
Ratasa143 pushed a commit to Ratasa143/airflow that referenced this pull request Feb 15, 2026

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Scheduler does not release slots correctly when kubernetes sends no reason for container waiting status

4 participants