-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Enhanced Pod Failure Logging in KubernetesExecutor #54115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhanced Pod Failure Logging in KubernetesExecutor #54115
Conversation
6db2f66 to
4991b20
Compare
hterik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 👍
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...iders/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
Outdated
Show resolved
Hide resolved
9d554d6 to
2ffc67e
Compare
hterik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor improvements suggested. Otherwise looks ok to me.
Someone else with more knowledge and authority should take a look as well.
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_types.py
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_types.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Outdated
Show resolved
Hide resolved
2d2e741 to
1725912
Compare
|
@hterik Thanks for the review! |
hterik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_types.py
Show resolved
Hide resolved
1725912 to
6a153c0
Compare
hterik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpicks from me only, otherwise looks good.
Someone else need to approve.
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_types.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Show resolved
Hide resolved
|
Thanks again for the review ! 🤩 |
jason810496
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for the PR!
Only a question regarding where to call collect_pod_failure_details(pod).
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/executors/test_kubernetes_executor.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Outdated
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Show resolved
Hide resolved
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py
Outdated
Show resolved
Hide resolved
Add detailed logging of pod and container failure information when tasks fail, including pod phase, reasons, messages, and container states for better debugging. - Extract pod status (phase, reason, message) on task failure - Extract container state info (terminated/waiting reasons & messages) - Add exception handling for Kubernetes API failures - Only execute additional logging for FAILED task states
Move failure analysis to watcher thread, add detailed container status logging, and include task keys for better log searchability.
- Add FailureDetails TypedDict for type-safe failure information - Extract collect_pod_failure_details() function for cleaner separation of concerns - Create _analyze_init_containers() and _analyze_main_containers() helper functions
…modular design 1. Introduce FailureDetails TypedDict 2. Implement collect_pod_failure_details Function 3. Enhanced Failure Logging
… failure analysis - Pass logger as parameter to collect_pod_failure_details for consistent logging context - Move failure details collection logic to only execute for Failed status pods for better performance - Update test imports to handle timezone module location changes between Airflow versions
6da99f2 to
9fad3af
Compare
|
Thanks for the views! 🤩 |
* feat: Enhanced pod failure logging in KubernetesExecutor Add detailed logging of pod and container failure information when tasks fail, including pod phase, reasons, messages, and container states for better debugging. - Extract pod status (phase, reason, message) on task failure - Extract container state info (terminated/waiting reasons & messages) - Add exception handling for Kubernetes API failures - Only execute additional logging for FAILED task states * feat(k8s-executor): improve pod failure diagnostics and reduce API calls Move failure analysis to watcher thread, add detailed container status logging, and include task keys for better log searchability. * refactor: Extract pod failure analysis to dedicated TypedDict functions - Add FailureDetails TypedDict for type-safe failure information - Extract collect_pod_failure_details() function for cleaner separation of concerns - Create _analyze_init_containers() and _analyze_main_containers() helper functions * feat(kubernetes): enhance pod failure diagnostics with TypedDict and modular design 1. Introduce FailureDetails TypedDict 2. Implement collect_pod_failure_details Function 3. Enhanced Failure Logging * fix(kubernetes): improve type safety with Literal types for container analysis * feat: enhance pod failure logging in KubernetesExecutor with detailed failure analysis - Pass logger as parameter to collect_pod_failure_details for consistent logging context - Move failure details collection logic to only execute for Failed status pods for better performance - Update test imports to handle timezone module location changes between Airflow versions
Description
Add detailed logging of pod and container failure information when tasks fail, including pod phase, reasons, messages, and container states for better debugging.
Problem Statement
When tasks fail in KubernetesExecutor, operators often lack sufficient information to quickly diagnose the root cause of failures. The current implementation provides minimal failure context, making troubleshooting time-consuming and inefficient.
Solution
Introduced comprehensive logging for pod and container failure states, capturing detailed error information to streamline debugging.
Changes Made
Core Implementation
_change_statemethod inKubernetesExecutorkube_client.read_namespaced_pod()Closes: #37548
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.