
Conversation

@AutomationDev85 (Contributor)

Overview

The idea is to make the Kubernetes Pod's events visible in the log during the Pod's start phase. The KubernetesPodOperator starts the Pod, pulls the events in parallel, and writes new events to the log.
This lets the user see what is happening in the background in Kubernetes.

Details of change:

  • await_pod_start cyclically polls for new Kubernetes events and writes them to the log in parallel with the Pod's start phase (see the sketch below).
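
A minimal sketch of the polling idea (hypothetical helper, not the actual provider code), assuming the official kubernetes Python client:

```python
from kubernetes import client

def log_new_pod_events(core_v1: client.CoreV1Api, pod_name: str,
                       namespace: str, seen_uids: set[str], log) -> None:
    """List the Pod's events and log any not seen on a previous poll."""
    selector = f"involvedObject.name={pod_name}"
    events = core_v1.list_namespaced_event(namespace, field_selector=selector)
    for event in events.items:
        if event.metadata.uid not in seen_uids:
            seen_uids.add(event.metadata.uid)
            log.info("Pod event: %s - %s", event.reason, event.message)
```

Calling a helper like this in a loop while the Pod is still pending gives the "events in parallel to the start phase" behavior described above.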

@boring-cyborg bot added the area:providers and provider:cncf-kubernetes labels on May 5, 2025
@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 69410d7 to 1f155d4 on May 12, 2025 10:54
@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 1f155d4 to bafbd0a on May 12, 2025 12:59
@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch 4 times, most recently from 780d7e3 to 12483fe on May 28, 2025 12:20
@jason810496 (Member) left a comment


Thanks for the PR.
IMO, I would prefer to keep await_pod_start as it was originally while still supporting this feature, to minimize the change.

@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch 2 times, most recently from f6e1260 to e906bb2 on June 2, 2025 07:27
@jscheffl added the log exception label on Jun 3, 2025
@jscheffl (Contributor) commented on Jun 3, 2025

Setting the label "log exception" since no new caplog usage is added; only the existing ones are still present.

@jscheffl force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 432511d to 68b5e4b on June 3, 2025 11:42
@jscheffl (Contributor) left a comment


Looks good to me. Let's get CI green, and then I assume we can merge.
I would wait another 48 hours in case somebody else wants to review (or object).

AutomationDev85 and others added 4 commits on June 10, 2025 14:46
@jscheffl force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 1b81091 to bc3c171 on June 10, 2025 12:46
@jscheffl (Contributor)

Rebasing... if green I'd merge now

@jscheffl merged commit 46b8aeb into apache:main on Jun 10, 2025 (77 checks passed)
@ashb (Member) commented on Jul 16, 2025

@jscheffl @AutomationDev85 Uh-oh. This is making use of asyncio in a normal sync worker, and it's causing KPO to break.

I'm double-checking versions; this was reported by a user:

File "/usr/local/lib/python3.11/site-packages/airflow/sdk/execution_time/task_runner.py", line 867 in run
File "/usr/local/lib/python3.11/site-packages/airflow/sdk/execution_time/task_runner.py", line 1159 in _execute_task
File "/usr/local/lib/python3.11/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 640 in execute
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 721 in execute_sync
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 1055 in cleanup
RuntimeError: There is no current event loop in thread 'MainThread'.
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 680 in execute_sync
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 607 in await_pod_start
File "/usr/local/lib/python3.11/asyncio/events.py", line 681 in get_event_loop

schedule_timeout=self.schedule_timeout_seconds,
startup_timeout=self.startup_timeout_seconds,
check_interval=self.startup_check_interval_seconds,
loop = asyncio.get_event_loop()
Member


Specifically this: this is running in a normal sync worker where there is no running event loop, and this raises an exception.
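
One hedged way to reproduce that exact RuntimeError on Python 3.10/3.11, assuming something in the worker process has explicitly unset the current loop:

```python
import asyncio

# If anything in the process has unset the current event loop
# (e.g. a runner calling set_event_loop(None) after cleanup) ...
asyncio.set_event_loop(None)

# ... a later call from plain sync code fails with:
# RuntimeError: There is no current event loop in thread 'MainThread'.
loop = asyncio.get_event_loop()
```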

Member


Hmmm, per the docs:

If there is no running event loop set, the function will return the result of the get_event_loop_policy().get_event_loop() call.

@ashb (Member) commented on Jul 16, 2025


Oh, I wonder if it's related to this, and some user code turning that into an Error.

Deprecated since version 3.12: Deprecation warning is emitted if there is no current event loop. In some future Python release this will become an error.

Trying to confirm.
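
A hedged sketch of that suspicion, assuming Python 3.12+ (where the DeprecationWarning is emitted) and user or framework code that escalates warnings to errors:

```python
import asyncio
import warnings

# Escalate DeprecationWarning to an error, as user code might do.
warnings.simplefilter("error", DeprecationWarning)

try:
    # With no current event loop, Python 3.12+ emits a DeprecationWarning
    # here, which the filter above turns into an exception.
    asyncio.get_event_loop()
except DeprecationWarning as exc:
    print(f"escalated as suspected: {exc!r}")
```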

Contributor


I am not sure what the right way to do parallel execution in a sync task is, then. For us it has been working like this in production for a year. But maybe our environment is not representative?

The challenge is to parse and follow logs and events in parallel. There is no K8s API that delivers both concurrently, and flipping back and forth is very inefficient if you want to listen to the log stream. Therefore we took the async approach (see the sketch below).

Do you have more details on where and how it is "breaking"? Which environment?

Note that we also initially attempted to run another thread instead of using asyncio, but this was blocked by Celery as well and is probably also not advised.
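
A minimal sketch of that async approach (hypothetical coroutine names, not the provider's actual code): a single event loop drives both followers concurrently.

```python
import asyncio

async def follow_logs(pod_name: str) -> None:
    """Placeholder: stream the container's log lines."""
    await asyncio.sleep(0)  # real code would read the log stream here

async def follow_events(pod_name: str) -> None:
    """Placeholder: poll the Pod's events periodically."""
    await asyncio.sleep(0)  # real code would list and log new events here

async def monitor_pod(pod_name: str) -> None:
    # Run both followers concurrently on one loop, since no single
    # Kubernetes API delivers logs and events together.
    await asyncio.gather(follow_logs(pod_name), follow_events(pod_name))
```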

Member


I don't have many details -- it's second-hand through one of the Astronomer customers.

I think this is something turning the deprecation warning into an error -- I think the fix/workaround is to swap the manual loop control to asyncio.run, as per this in the docs:

Application developers should typically use the high-level asyncio functions, such as asyncio.run(), and should rarely need to reference the loop object or call its methods. This section is intended mostly for authors of lower-level code, libraries, and frameworks, who need finer control over the event loop behavior.
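
A before/after sketch of that swap (hypothetical coroutine name; the commented-out lines mirror the pattern flagged in the stack trace):

```python
import asyncio

async def watch_pod_start() -> None:
    """Placeholder for the async pod-start watcher."""
    await asyncio.sleep(0)

# Before: manual loop control, which fails when no current loop is set.
# loop = asyncio.get_event_loop()
# loop.run_until_complete(watch_pod_start())

# After: asyncio.run() creates and tears down its own loop, so the sync
# worker does not need a pre-existing current event loop.
asyncio.run(watch_pod_start())
```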

Contributor


Can we have a full report, including a way to reproduce, as a GH issue on this? It would also be great to include a test to prevent a regression.

Member


Trying to see what I can get. All I have right now is the stacktrace I put in a comment above.

)
assert not k.do_xcom_push

@pytest.mark.asyncio
Member


I don't think you should need this -- it's likely needed to make asyncio.get_event_loop() pass, but a worker doesn't have it, and if we swap it to asyncio.run() it won't need this marker. I.e., adding this mark was working around the test failing in a way that is not representative of how the KPO actually runs?
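
A minimal sketch of that point (hypothetical wrapper and test, not the provider's actual test code): if the sync path drives the loop itself via asyncio.run(), the test stays a plain sync test and needs no asyncio marker.

```python
import asyncio

def await_pod_start_sync() -> str:
    """Hypothetical sync wrapper that drives the async watcher itself."""
    async def _watch() -> str:
        return "Running"
    return asyncio.run(_watch())

def test_await_pod_start_sync():
    # Plain sync test -- no pytest.mark.asyncio needed, which matches
    # how the operator actually executes on a sync worker.
    assert await_pod_start_sync() == "Running"
```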

