
SparkKubernetesOperator reattach_on_restart logic doesn't work #41211

@andallo

Description

Apache Airflow Provider(s)

cncf-kubernetes

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes 8.3.1

Apache Airflow version

2.9.2

Operating System

Debian GNU/Linux 12 (bookworm)

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

When the reattach_on_restart option is enabled, SparkKubernetesOperator tries to find an already launched driver pod by the labels dag_id, task_id, and run_id. However, the operator does not add these labels to the driver, so there is no guarantee it will find the driver even when one exists. The operator can only find an already launched driver if those labels were explicitly set on the driver in the operator's own parameters.
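
For illustration, the lookup the operator performs is roughly equivalent to the following sketch with the Kubernetes Python client (not the provider's actual code; the spark-role=driver label is an assumption based on spark-operator conventions):

```python
# Rough sketch (not the provider's actual code) of the kind of label-based
# lookup reattach_on_restart relies on, using the Kubernetes Python client.
# The spark-role=driver label is an assumption here; in practice Airflow also
# sanitizes run_id into a valid label value.
from kubernetes import client, config


def find_driver_pod(namespace: str, dag_id: str, task_id: str, run_id: str):
    """Return the first driver pod carrying the Airflow task labels, or None."""
    config.load_kube_config()  # use config.load_incluster_config() inside the cluster
    core_v1 = client.CoreV1Api()

    label_selector = f"dag_id={dag_id},task_id={task_id},run_id={run_id},spark-role=driver"
    pods = core_v1.list_namespaced_pod(namespace, label_selector=label_selector)

    # If the driver was created without these labels, nothing matches and the
    # operator cannot reattach to the running SparkApplication.
    return pods.items[0] if pods.items else None
```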

What you think should happen instead

SparkKubernetesOperator should add the dag_id, task_id, and run_id labels to the SparkApplication specification for both the driver and the executor. The specification comes from the application_file or template_spec parameter and then becomes the template_body parameter. Adding the labels to template_body is easy, because the operator has a context that holds the values for all of these labels.
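
A minimal sketch of what that could look like, assuming template_body is the parsed SparkApplication manifest and context is the Airflow task context (the helper name is illustrative, not an existing provider API):

```python
# Minimal sketch of the proposed fix, assuming template_body is the parsed
# SparkApplication manifest (a dict) and context is the Airflow task context.
# The helper name is illustrative; the spec.driver.labels / spec.executor.labels
# fields follow the spark-operator CRD.
def inject_airflow_labels(template_body: dict, context: dict) -> dict:
    labels = {
        "dag_id": context["dag"].dag_id,
        "task_id": context["task"].task_id,
        # run_id values usually need sanitizing to be valid Kubernetes label values
        "run_id": context["run_id"],
    }
    spec = template_body.setdefault("spec", {})
    for role in ("driver", "executor"):
        spec.setdefault(role, {}).setdefault("labels", {}).update(labels)
    return template_body
```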

How to reproduce

Start a SparkApplication using SparkKubernetesOperator. Do not specify the dag_id, task_id, or run_id labels in the driver and executor parameters (for example in the application_file parameter). The task pod that submits the SparkApplication will then carry those labels, but the driver and executor pods will not.

That is a problem for the reattach_on_restart logic, because it searches for the driver by those labels.
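
For example, a DAG along these lines reproduces it (the DAG id, namespace, connection id, and manifest file name are placeholders):

```python
# Minimal reproduction sketch. spark_pi.yaml stands for any ordinary
# SparkApplication manifest that defines no dag_id/task_id/run_id labels.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

with DAG("spark_reattach_repro", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    submit_spark_app = SparkKubernetesOperator(
        task_id="submit_spark_app",
        namespace="spark",
        application_file="spark_pi.yaml",
        kubernetes_conn_id="kubernetes_default",
        # On a retry/restart the operator searches for an existing driver pod by
        # the dag_id/task_id/run_id labels, which the driver never received.
        reattach_on_restart=True,
    )
```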

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
