
Conversation

@AutomationDev85 (Contributor)

Overview

The idea is to make the Kubernetes Pod's events visible in the log during the Pod's start phase. The KubernetesPodOperator starts the Pod, pulls the events in parallel, and writes new events to the log.
This lets the user see what is happening in the background in Kubernetes.

Details of change:

  • await_pod_start cyclically polls for new Kubernetes events and writes them to the log in parallel with the Pod's start phase (see the sketch below).
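
A minimal sketch of the polling idea (hypothetical helper, not the actual provider code), assuming the official kubernetes Python client:

```python
from kubernetes import client

def log_new_pod_events(core_v1: client.CoreV1Api, pod_name: str,
                       namespace: str, seen_uids: set[str], log) -> None:
    """List the Pod's events and log any not seen on a previous poll."""
    selector = f"involvedObject.name={pod_name}"
    events = core_v1.list_namespaced_event(namespace, field_selector=selector)
    for event in events.items:
        if event.metadata.uid not in seen_uids:
            seen_uids.add(event.metadata.uid)
            log.info("Pod event: %s - %s", event.reason, event.message)
```

Calling a helper like this in a loop while the Pod is still pending gives the "events in parallel to the start phase" behavior described above.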

@boring-cyborg bot added the area:providers and provider:cncf-kubernetes labels on May 5, 2025
@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 69410d7 to 1f155d4 on May 12, 2025 10:54
@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 1f155d4 to bafbd0a on May 12, 2025 12:59
@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch 4 times, most recently from 780d7e3 to 12483fe on May 28, 2025 12:20
@jason810496 (Member) left a comment


Thanks for the PR.
IMO, I would prefer to keep await_pod_start as it was originally while still supporting this feature, to minimize the change.

@AutomationDev85 force-pushed the feature/enable-events-read-kubernetes-pod-operator branch 2 times, most recently from f6e1260 to e906bb2 on June 2, 2025 07:27
@jscheffl added the log exception label on Jun 3, 2025
@jscheffl (Contributor) commented on Jun 3, 2025

Setting the label "log exception" since no new caplog usage is added; only the existing ones are still present.

@jscheffl force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 432511d to 68b5e4b on June 3, 2025 11:42
@jscheffl (Contributor) left a comment


Looks good to me. Let's get CI green, and then I assume we can merge.
I would wait another 48 hours in case somebody else wants to review (or object).

AutomationDev85 and others added 4 commits on June 10, 2025 14:46
@jscheffl force-pushed the feature/enable-events-read-kubernetes-pod-operator branch from 1b81091 to bc3c171 on June 10, 2025 12:46
@jscheffl (Contributor)

Rebasing... if green I'd merge now

@jscheffl merged commit 46b8aeb into apache:main on Jun 10, 2025 (77 checks passed)
@ashb (Member) commented on Jul 16, 2025

@jscheffl @AutomationDev85 Uh-oh. This is making use of asyncio in a normal sync worker, and it's causing KPO to break.

I'm double-checking versions; this was reported by a user:

File "/usr/local/lib/python3.11/site-packages/airflow/sdk/execution_time/task_runner.py", line 867 in run
File "/usr/local/lib/python3.11/site-packages/airflow/sdk/execution_time/task_runner.py", line 1159 in _execute_task
File "/usr/local/lib/python3.11/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 640 in execute
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 721 in execute_sync
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 1055 in cleanup
RuntimeError: There is no current event loop in thread 'MainThread'.
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 680 in execute_sync
File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 607 in await_pod_start
File "/usr/local/lib/python3.11/asyncio/events.py", line 681 in get_event_loop

schedule_timeout=self.schedule_timeout_seconds,
startup_timeout=self.startup_timeout_seconds,
check_interval=self.startup_check_interval_seconds,
loop = asyncio.get_event_loop()
Member


Specifically this: this is running in a normal sync worker where there is no running event loop, and this raises an exception.
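
One hedged way to reproduce that exact RuntimeError on Python 3.10/3.11, assuming something in the worker process has explicitly unset the current loop:

```python
import asyncio

# If anything in the process has unset the current event loop
# (e.g. a runner calling set_event_loop(None) after cleanup) ...
asyncio.set_event_loop(None)

# ... a later call from plain sync code fails with:
# RuntimeError: There is no current event loop in thread 'MainThread'.
loop = asyncio.get_event_loop()
```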

Member


Hmmm, per the docs:

If there is no running event loop set, the function will return the result of the get_event_loop_policy().get_event_loop() call.

@ashb (Member) commented on Jul 16, 2025


Oh, I wonder if it's related to this, and some user code turning that into an Error.

Deprecated since version 3.12: Deprecation warning is emitted if there is no current event loop. In some future Python release this will become an error.

Trying to confirm.
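
A hedged sketch of that suspicion, assuming Python 3.12+ (where the DeprecationWarning is emitted) and user or framework code that escalates warnings to errors:

```python
import asyncio
import warnings

# Escalate DeprecationWarning to an error, as user code might do.
warnings.simplefilter("error", DeprecationWarning)

try:
    # With no current event loop, Python 3.12+ emits a DeprecationWarning
    # here, which the filter above turns into an exception.
    asyncio.get_event_loop()
except DeprecationWarning as exc:
    print(f"escalated as suspected: {exc!r}")
```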

Contributor


I am not sure what the right way to do parallel execution in a sync task is, then. For us it has been working like this in production for a year. But maybe our environment is not representative?

The challenge is to parse and follow logs and events in parallel. There is no K8s API that delivers both concurrently, and flipping back and forth is very inefficient if you want to listen to the log stream. Therefore we took the async approach (see the sketch below).

Do you have more details on where and how it is "breaking"? Which environment?

Note that we also initially attempted to run another thread instead of using asyncio, but this was blocked by Celery as well and is probably also not advised.
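
A minimal sketch of that async approach (hypothetical coroutine names, not the provider's actual code): a single event loop drives both followers concurrently.

```python
import asyncio

async def follow_logs(pod_name: str) -> None:
    """Placeholder: stream the container's log lines."""
    await asyncio.sleep(0)  # real code would read the log stream here

async def follow_events(pod_name: str) -> None:
    """Placeholder: poll the Pod's events periodically."""
    await asyncio.sleep(0)  # real code would list and log new events here

async def monitor_pod(pod_name: str) -> None:
    # Run both followers concurrently on one loop, since no single
    # Kubernetes API delivers logs and events together.
    await asyncio.gather(follow_logs(pod_name), follow_events(pod_name))
```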

Member


I don't have many details -- it's second-hand through one of the Astronomer customers.

I think this is something turning the deprecation warning into an error -- I think the fix/workaround is to swap the manual loop control to asyncio.run, as per this in the docs:

Application developers should typically use the high-level asyncio functions, such as asyncio.run(), and should rarely need to reference the loop object or call its methods. This section is intended mostly for authors of lower-level code, libraries, and frameworks, who need finer control over the event loop behavior.
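
A before/after sketch of that swap (hypothetical coroutine name; the commented-out lines mirror the pattern flagged in the stack trace):

```python
import asyncio

async def watch_pod_start() -> None:
    """Placeholder for the async pod-start watcher."""
    await asyncio.sleep(0)

# Before: manual loop control, which fails when no current loop is set.
# loop = asyncio.get_event_loop()
# loop.run_until_complete(watch_pod_start())

# After: asyncio.run() creates and tears down its own loop, so the sync
# worker does not need a pre-existing current event loop.
asyncio.run(watch_pod_start())
```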

Contributor


Can we have a full report, including a way to reproduce, as a GH issue on this? It would also be great to include a test to prevent a regression.

Member


Trying to see what I can get. All I have right now is the stacktrace I put in a comment above.

)
assert not k.do_xcom_push

@pytest.mark.asyncio
Member


I don't think you should need this -- it's likely needed to make asyncio.get_event_loop() pass, but a worker doesn't have it, and if we swap it to asyncio.run() it won't need this marker. I.e., adding this mark was working around the test failing in a way that is not representative of how the KPO actually runs?
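
A minimal sketch of that point (hypothetical wrapper and test, not the provider's actual test code): if the sync path drives the loop itself via asyncio.run(), the test stays a plain sync test and needs no asyncio marker.

```python
import asyncio

def await_pod_start_sync() -> str:
    """Hypothetical sync wrapper that drives the async watcher itself."""
    async def _watch() -> str:
        return "Running"
    return asyncio.run(_watch())

def test_await_pod_start_sync():
    # Plain sync test -- no pytest.mark.asyncio needed, which matches
    # how the operator actually executes on a sync worker.
    assert await_pod_start_sync() == "Running"
```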

