KubernetesPodOperator new callbacks and allow multiple callbacks #44357

Open · wants to merge 10 commits into main

Conversation

johnhoran (Author):

I would like to support multiple callbacks in the KubernetesPodOperator, and to add two new callbacks. on_manifest_finalization would let a callback make changes to the manifest just before it is turned into a pod. on_pod_wrapup would run after the calls to on_pod_completion but just before the pod is deleted.

Adding both of these, plus allowing multiple callbacks, makes it possible to do things in the KubernetesPodOperator akin to how it handles XComs, but in a modular way. My use case: I run DBT in the KubernetesPodOperator, and I want to do several things with the DBT artifacts after the DBT job has run. I could use on_manifest_finalization to insert an alpine sidecar with the same volumes mounted, with the same intent as the XCom sidecar: the sidecar keeps the volumes alive while files are extracted from them. on_pod_wrapup would then let me insert a single sidecar and have multiple callbacks run their on_pod_completion before on_pod_wrapup kills the sidecar.
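To make the intent concrete, here is a rough sketch of a callback using the proposed hooks. This is only a sketch: the exact on_manifest_finalization parameters aren't shown in this thread, so the keyword-only pod argument is an assumption modelled on the existing callback signatures, and the sidecar details are illustrative.

from kubernetes.client import models as k8s

from airflow.providers.cncf.kubernetes.callbacks import KubernetesPodOperatorCallback


class DbtArtifactsCallback(KubernetesPodOperatorCallback):
    """Illustrative callback: keep pod volumes alive via a sidecar while artifacts are pulled."""

    @staticmethod
    def on_manifest_finalization(*, pod: k8s.V1Pod, **kwargs) -> None:
        # Assumed hook: mutate the manifest just before it becomes a pod.
        # Insert an alpine sidecar that mounts the same volumes as the main
        # container, mirroring what the XCom sidecar does.
        pod.spec.containers.append(
            k8s.V1Container(
                name="artifact-sidecar",
                image="alpine",
                command=["sh", "-c", "trap : TERM INT; sleep infinity & wait"],
                volume_mounts=pod.spec.containers[0].volume_mounts,
            )
        )

    @staticmethod
    def on_pod_completion(*, pod: k8s.V1Pod, **kwargs) -> None:
        # Pull the DBT artifacts out of the still-running sidecar here.
        ...

    @staticmethod
    def on_pod_wrapup(*, pod: k8s.V1Pod, **kwargs) -> None:
        # Every registered callback's on_pod_completion has run by now, so
        # it is safe to terminate the sidecar before the pod is deleted.
        ...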

amoghrajesh (Contributor) left a comment:

Good start!
Can you add a few test cases covering scenarios with more than one callback? For example (a rough sketch of one such test follows this list):

  1. A few sync callbacks
  2. A few async callbacks
  3. A combination of both, asserting the invocation order
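A rough shape one such test might take (the dispatch loop below is a stand-in for the operator's internal fan-out, not the PR's actual test harness):

from unittest import mock


def test_callbacks_invoked_in_registration_order():
    calls = []
    cb1, cb2 = mock.MagicMock(), mock.MagicMock()
    cb1.on_pod_completion.side_effect = lambda **kwargs: calls.append("cb1")
    cb2.on_pod_completion.side_effect = lambda **kwargs: calls.append("cb2")

    # Stand-in for the operator iterating over its registered callbacks.
    for cb in (cb1, cb2):
        cb.on_pod_completion(pod=None, client=None, mode="sync")

    assert calls == ["cb1", "cb2"]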

@@ -50,7 +54,27 @@ def on_sync_client_creation(*, client: k8s.CoreV1Api, **kwargs) -> None:
        pass

    @staticmethod
    def on_pod_creation(*, pod: k8s.V1Pod, client: client_type, mode: str, **kwargs) -> None:
    def on_manifest_finalization(
amoghrajesh (Contributor):

The name here sounds a little difficult to understand. Can we do better? I have a proposal: how about on_pod_manifest_created?

johnhoran (Author):

Yep, sounds good to me. If you have a suggestion for on_pod_wrapup too, I'd take it; I struggled with that one.

amoghrajesh (Contributor):

That one kind of seems OK, but it would do better with a renaming!

Comment on lines +126 to +142
    def on_pod_wrapup(
        *,
        pod: k8s.V1Pod,
        client: client_type,
        mode: str,
        operator: KubernetesPodOperator,
        context: Context,
        **kwargs,
    ) -> None:
        """
        Invoke this callback after all pod completion callbacks but before the pod is deleted.

        :param pod: the completed pod.
        :param client: the Kubernetes client that can be used in the callback.
        :param mode: the current execution mode, it's one of (`sync`, `async`).
        :param operator: the KubernetesPodOperator that ran the pod.
        :param context: the Airflow context of the current task run.
        """
        pass
amoghrajesh (Contributor):

How can we send the completed pod here? That would require some tracking and filtering to pick the right one. Also, why can't this callback's role be achieved by on_pod_completion?

johnhoran (Author):

The existing code already makes a call to find the pod when callbacks are attached:

pod=self.find_pod(self.pod.metadata.namespace, context=context),

Honestly, I'd prefer to send a stale reference and make it the callback's responsibility to fetch an updated pod if it needs one, since we're sending the client too; it's also possible that a callback doesn't implement the on_pod_completion method at all. So, to maintain the existing behaviour, I fetch the pod once. An alternative would be to check whether on_pod_completion is implemented on each callback and fetch an updated pod for each call, but again that assumes we need an updated pod, which might not be the case.
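Schematically, the fetch-once dispatch looks something like this inside the operator's cleanup path (a sketch of the behaviour described above, not the PR's actual code; the operator and context parameters assume the extended signatures from this PR):

# The pod is looked up a single time; every callback then receives the same
# (possibly stale) reference plus the client, so a callback that needs fresh
# state can re-fetch the pod itself.
pod = self.find_pod(self.pod.metadata.namespace, context=context)
for callback in self._callbacks:
    callback.on_pod_completion(
        pod=pod,
        client=self.client,
        mode=ExecutionMode.SYNC,
        operator=self,
        context=context,
    )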

johnhoran (Author):

As for the need for on_pod_wrapup: my thought was that you might have multiple callbacks that need to run before a sidecar container is killed. Rather than attaching multiple sidecars to the pod, you could have a class that attaches a single sidecar and kills it in on_pod_wrapup; any subclasses of it could pull whatever files they need, or run commands in the pod, during their on_pod_completion callback.

amoghrajesh (Contributor):

I am not quite sure I understand you here. Are you talking about a specific case of a running sidecar?

johnhoran (Author) · Dec 16, 2024:

Sorry if I'm not being clear. The way I've been using this so far: I have one class that extends KubernetesPodOperatorCallback and handles inserting the sidecar in on_pod_manifest_created, killing it in on_pod_wrapup, plus some code to ensure the sidecar is only added/killed once. Extending that, I have a number of other classes responsible for doing the actual work with the sidecar. In my case, one pulls DBT artifacts in the on_pod_completion method; its on_pod_wrapup calls super().on_pod_wrapup() and then extracts dataset events from the DBT artifacts, and a separate callback uploads them to S3.
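A minimal sketch of that hierarchy (the helper functions are hypothetical, the once-only guard is simplified, and the hook names assume the on_pod_manifest_created rename discussed above):

from airflow.providers.cncf.kubernetes.callbacks import KubernetesPodOperatorCallback


class SidecarCallback(KubernetesPodOperatorCallback):
    """Owns the sidecar lifecycle, guarding against double insert/kill."""

    _sidecar_active = False  # class-level flag so the sidecar is added/killed once

    @classmethod
    def on_pod_manifest_created(cls, *, pod, **kwargs):
        if not SidecarCallback._sidecar_active:
            SidecarCallback._sidecar_active = True
            insert_sidecar(pod)  # hypothetical helper

    @classmethod
    def on_pod_wrapup(cls, *, pod, client, **kwargs):
        if SidecarCallback._sidecar_active:
            SidecarCallback._sidecar_active = False
            kill_sidecar(pod, client)  # hypothetical helper


class DbtCallback(SidecarCallback):
    @classmethod
    def on_pod_completion(cls, *, pod, client, **kwargs):
        pull_dbt_artifacts(pod, client)  # hypothetical helper

    @classmethod
    def on_pod_wrapup(cls, *, pod, client, **kwargs):
        super().on_pod_wrapup(pod=pod, client=client, **kwargs)
        publish_dataset_events()  # hypothetical helper; the sidecar is gone by now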

Comment on lines +449 to 451
for callback in self._callbacks:
    callback.progress_callback(
        line=line, client=self._client, mode=ExecutionMode.SYNC
    )
amoghrajesh (Contributor):

Seems OK when callbacks are running in SYNC mode, but what about async? That would probably require some more thinking.

johnhoran (Author):

Callbacks aren't really implemented for async operation at the moment, unfortunately: #35714 (comment).

amoghrajesh (Contributor):

I see, thanks! In that case, this will do.

Labels: area:providers, provider:cncf-kubernetes