KubernetesPodOperator new callbacks and allow multiple callbacks #44357
base: main
Changes from all commits: 9828750, f9b5434, 9ac5625, 28a5e8a, 08a4a4b, 9847da9, b888ba2, 689d0ae, bd5cfeb, 078bc94
@@ -300,7 +300,7 @@ class PodManager(LoggingMixin):
     def __init__(
         self,
         kube_client: client.CoreV1Api,
-        callbacks: type[KubernetesPodOperatorCallback] | None = None,
+        callbacks: list[type[KubernetesPodOperatorCallback]] | None = None,
     ):
         """
         Create the launcher.
@@ -311,7 +311,7 @@ def __init__(
         super().__init__()
         self._client = kube_client
         self._watch = watch.Watch()
-        self._callbacks = callbacks
+        self._callbacks = callbacks or []

     def run_pod_async(self, pod: V1Pod, **kwargs) -> V1Pod:
         """Run POD asynchronously."""
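For illustration, a minimal sketch of what the changed `PodManager` signature accepts, assuming only the two constructor arguments shown above; `AuditCallback` and `MetricsCallback` are hypothetical no-op subclasses:

```python
from kubernetes import client, config

from airflow.providers.cncf.kubernetes.callbacks import KubernetesPodOperatorCallback
from airflow.providers.cncf.kubernetes.utils.pod_manager import PodManager


class AuditCallback(KubernetesPodOperatorCallback):
    """Hypothetical callback; inherits the base class's no-op hooks."""


class MetricsCallback(KubernetesPodOperatorCallback):
    """Hypothetical callback."""


config.load_kube_config()  # or load_incluster_config() when running in-cluster
pod_manager = PodManager(
    kube_client=client.CoreV1Api(),
    # Previously a single callback class; with this change, a list (or None).
    callbacks=[AuditCallback, MetricsCallback],
)
```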
@@ -446,8 +446,8 @@ def consume_logs(*, since_time: DateTime | None = None) -> tuple[DateTime | None
                         progress_callback_lines.append(line)
                     else:  # previous log line is complete
                         for line in progress_callback_lines:
-                            if self._callbacks:
-                                self._callbacks.progress_callback(
+                            for callback in self._callbacks:
+                                callback.progress_callback(
                                     line=line, client=self._client, mode=ExecutionMode.SYNC
                                 )
                         if message_to_log is not None:

Comment thread on lines +449 to 451:

- Seems ok when callbacks are running in SYNC mode. What about async?
- Callbacks aren't really implemented for async operation at the moment, unfortunately. #35714 (comment)
- I see. Thanks!
@@ -465,8 +465,8 @@ def consume_logs(*, since_time: DateTime | None = None) -> tuple[DateTime | None
             finally:
                 # log the last line and update the last_captured_timestamp
                 for line in progress_callback_lines:
-                    if self._callbacks:
-                        self._callbacks.progress_callback(
+                    for callback in self._callbacks:
+                        callback.progress_callback(
                             line=line, client=self._client, mode=ExecutionMode.SYNC
                         )
                 if message_to_log is not None:
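For context on what these dispatch loops invoke: a hedged sketch of a callback that implements the `progress_callback` hook and how it might be attached to the operator, assuming this PR lets the operator's `callbacks` argument take a list (the class and pod details are illustrative):

```python
from airflow.providers.cncf.kubernetes.callbacks import ExecutionMode, KubernetesPodOperatorCallback
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator


class LogLineCallback(KubernetesPodOperatorCallback):
    @staticmethod
    def progress_callback(*, line: str, client, mode: str, **kwargs) -> None:
        # As noted in the thread above, callbacks currently only fire in SYNC mode.
        if mode == ExecutionMode.SYNC:
            print(f"container log: {line}")


task = KubernetesPodOperator(
    task_id="pod_with_callbacks",
    name="pod-with-callbacks",
    image="alpine:3.20",
    cmds=["sh", "-c", "echo hello"],
    # With this PR, multiple callback classes can be supplied as a list.
    callbacks=[LogLineCallback],
)
```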
How can we send the completed pod here? That would require some tracking and filtering to send one. Why can't this callback's role be achieved by `on_pod_completion`?
The existing code makes a call to find the pod if the callbacks are attached:

airflow/providers/src/airflow/providers/cncf/kubernetes/operators/pod.py, line 614 in c77c7f0

Honestly I'd prefer if I was sending a stale reference and it was the responsibility of the callback to get an updated pod if it needs one, since we're sending the client too, especially since it's possible a callback might not implement the `on_pod_completion` method. So to maintain existing behaviour I'm getting it once. An alternative might be to check whether `on_pod_completion` is implemented in the callback and get an updated pod for each callback call, but again this assumes we need an updated pod, which might not be the case.
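A hedged sketch of the alternative described in the comment above, where the callback refreshes a possibly stale pod reference itself using the client it receives (the hook signature follows the existing keyword-only convention; the class is illustrative):

```python
from airflow.providers.cncf.kubernetes.callbacks import KubernetesPodOperatorCallback


class RefreshingCompletionCallback(KubernetesPodOperatorCallback):
    @staticmethod
    def on_pod_completion(*, pod, client, mode, **kwargs) -> None:
        # `pod` may be stale by now; re-read it if up-to-date status matters.
        fresh_pod = client.read_namespaced_pod(
            name=pod.metadata.name, namespace=pod.metadata.namespace
        )
        print(f"pod {fresh_pod.metadata.name} phase: {fresh_pod.status.phase}")
```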
As for the need for `on_pod_wrapup`: my thought here was that you might have multiple callbacks running before a sidecar container is killed. Rather than attaching multiple sidecars to the container, you could have a class that attaches a single sidecar and kills it in `on_pod_wrapup`; any subclasses of it could pull whatever files they need or run any commands in the pod during the `on_pod_completion` callback.
I am not quite sure if I understand you well here. Are you talking about a specific case of a running sidecar?
Sorry if I'm not being clear. The way I've been using this so far is that I have one class that extends `KubernetesPodOperatorCallback` and handles inserting a sidecar in `on_pod_manifest_created`, killing the sidecar in `on_pod_wrapup`, and some code to ensure that the sidecar is only added/killed once. Extending off that, I have a bunch of other classes that are responsible for doing the actual work with the sidecar: in my case I pull DBT artifacts in the `on_pod_completion` method, then in `on_pod_wrapup` they call `super().on_pod_wrapup()` before extracting dataset events from the DBT artifacts, and a separate callback uploads them to S3.
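A hedged sketch of the sidecar pattern described above. `on_pod_manifest_created` and `on_pod_wrapup` are the hooks this PR introduces; their signatures are assumed here to follow the existing keyword-only convention, and the sidecar definition, artifact handling, and upload step are placeholders:

```python
from kubernetes.client import V1Container

from airflow.providers.cncf.kubernetes.callbacks import KubernetesPodOperatorCallback

SIDECAR_NAME = "artifact-sidecar"  # hypothetical


class SidecarCallback(KubernetesPodOperatorCallback):
    """Attaches a single shared sidecar and tears it down exactly once."""

    @staticmethod
    def on_pod_manifest_created(*, pod, **kwargs) -> None:
        # Guard so that stacked subclasses only add the sidecar once.
        if not any(c.name == SIDECAR_NAME for c in pod.spec.containers):
            pod.spec.containers.append(
                V1Container(name=SIDECAR_NAME, image="alpine:3.20", command=["sleep", "infinity"])
            )

    @staticmethod
    def on_pod_wrapup(*, pod, client, **kwargs) -> None:
        # Placeholder: kill the sidecar here (exactly once), after subclasses
        # have pulled whatever they need from it in on_pod_completion.
        ...


class DbtArtifactsCallback(SidecarCallback):
    """Pulls DBT artifacts out of the sidecar before it is killed."""

    @staticmethod
    def on_pod_completion(*, pod, client, mode, **kwargs) -> None:
        # Placeholder: copy DBT artifacts out of the still-running sidecar.
        ...

    @staticmethod
    def on_pod_wrapup(*, pod, client, **kwargs) -> None:
        # Let the base class kill the sidecar first, then post-process.
        SidecarCallback.on_pod_wrapup(pod=pod, client=client, **kwargs)
        # Placeholder: extract dataset events from the DBT artifacts and upload them to S3.
```

Here the subclass calls the parent hook explicitly rather than via `super()`, since the hooks in this sketch are plain static methods; otherwise the layering follows the description above.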