Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hanging KPO on deferrable task with do_xcom_push #37300

Merged

Conversation

vchiapaikeo
Copy link
Contributor

@vchiapaikeo vchiapaikeo commented Feb 10, 2024

closes: #37298

This ensures that deferrable KPO tasks always reach a terminal state when both deferrable and do_xcom_push are True.

Sample DAG:

from airflow import DAG

from airflow.providers.google.cloud.operators.kubernetes_engine import (
    GKEStartPodOperator,
)


DEFAULT_TASK_ARGS = {
    "owner": "gcp-data-platform",
    "start_date": "2021-04-20",
    "retries": 0,
    "retry_delay": 60,
}

with DAG(
    dag_id="test_gke_op",
    schedule_interval="@daily",
    max_active_runs=1,
    max_active_tasks=5,
    catchup=False,
    default_args=DEFAULT_TASK_ARGS,
) as dag:

    _ = GKEStartPodOperator(
        task_id="whoami",
        name="whoami",
        cmds=["gcloud"],
        arguments=["auth", "list"],
        image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
        project_id="redacted-project-id",
        namespace="airflow-default",
        location="us-central1",
        cluster_name="airflow-gke-cluster",
        service_account_name="default",
        deferrable=True,
        do_xcom_push=True,
    )

    _ = GKEStartPodOperator(
        task_id="fail",
        name="fail",
        cmds=["bash"],
        arguments=["-xc", "sleep 2 && exit 1"],
        image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
        project_id="redacted-project-id",
        namespace="airflow-default",
        location="us-central1",
        cluster_name="airflow-gke-cluster",
        service_account_name="default",
        deferrable=True,
        do_xcom_push=True,
    )

Task does not hang in running state and completes with expected failure -

image

NOTE: if using ADC credentials, this PR needs to be reverted which breaks ADC auth flow: #37081


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@vchiapaikeo vchiapaikeo force-pushed the vchiapaikeo/fix-failed-kpo-deferred-v1 branch from 4b99603 to d375618 Compare February 10, 2024 02:32
Copy link
Collaborator

@dirrao dirrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Catch. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:cncf-kubernetes Kubernetes provider related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On failed KPO with deferrable=True, do_xcom_push=True - task never completes due to hanging xcom container
5 participants