@stephen-bracken
Contributor

@stephen-bracken stephen-bracken commented Jul 15, 2025

Fix KubernetesJobOperator.get_or_create_pod() sometimes creating duplicate pods.

(re-raised from #52885)

During execute(), the KubernetesJobOperator attempts to find the pod associated with the Job object using self.get_or_create_pod(). If Kubernetes is slow, the Job will not yet have created a pod when this method is called, so the underlying find_pod() method returns None and a duplicate headless pod is created for the task.

This PR removes references to the get_or_create_pod() method in favour of KubernetesJobOperator.get_pod() to prevent creating headless pods.
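
A minimal sketch of the race described above, using a toy in-memory cluster rather than the real provider code. FakeCluster, its lag parameter, the pod names, and wait_for_job_pod() are all hypothetical names made up for illustration; only get_or_create_pod() and find_pod() mirror method names mentioned in this PR, and their bodies here are simplified assumptions. Whether the actual fix retries the lookup like this is not stated here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeCluster:
    """Hypothetical stand-in for a namespace; not a real Kubernetes client."""

    lag: int  # number of lookups that happen before the Job's pod exists
    pods: list = field(default_factory=list)
    lookups: int = 0

    def find_pod(self) -> Optional[str]:
        """Return the Job's pod, or None while the Job controller is still lagging."""
        self.lookups += 1
        if self.lookups > self.lag and "job-pod" not in self.pods:
            # The (simulated) Job controller has caught up and created its pod.
            self.pods.append("job-pod")
        return "job-pod" if "job-pod" in self.pods else None

    def create_pod(self, name: str) -> str:
        self.pods.append(name)
        return name


def get_or_create_pod(cluster: FakeCluster) -> str:
    """Simplified assumption of the old flow: create a pod if none is found."""
    pod = cluster.find_pod()
    if pod is None:
        # The Job's own pod is merely late, so this creates a second, headless pod.
        return cluster.create_pod("headless-pod")
    return pod


def wait_for_job_pod(cluster: FakeCluster, max_attempts: int = 5) -> str:
    """Hypothetical alternative: only look the pod up, never create one."""
    for _ in range(max_attempts):
        pod = cluster.find_pod()
        if pod is not None:
            return pod
    raise TimeoutError("Job did not create a pod in time")
```

With FakeCluster(lag=1), get_or_create_pod() creates "headless-pod" on the first lookup, and the Job's own pod appears on the next one, leaving two pods for one task. wait_for_job_pod() on the same setup ends with only the Job's pod.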



@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Jul 15, 2025
@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch 2 times, most recently from 0e04769 to a9ad704 Compare July 15, 2025 15:00
@stephen-bracken stephen-bracken marked this pull request as ready for review July 15, 2025 15:00
@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch 4 times, most recently from d6d3fa4 to 8b43bca Compare July 17, 2025 14:15
Contributor

@shahar1 shahar1 left a comment

Overall LGTM - got some minor comments.
Also, I'll be happy for an additional review :)

@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch 13 times, most recently from 30445de to f123b1a Compare July 23, 2025 20:06
@stephen-bracken
Contributor Author

In reference to #49899 - I think a change is still necessary to avoid creating multiple pods when parallelism is not set. Do we want to change the control flow to use get_pods() in all circumstances?

@shahar1
Contributor

shahar1 commented Jul 26, 2025

> In reference to #49899 - I think a change is still necessary to avoid creating multiple pods when parallelism is not set. Do we want to change the control flow to use get_pods() in all circumstances?

You could start with handling only the case where parallelism=False, and later we could simplify the logic if it becomes necessary. Please rebase/merge changes from the main branch, adjust the logic and fix tests appropriately.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 10, 2025
@github-actions github-actions bot closed this Sep 15, 2025
@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch from d561015 to b0f5289 Compare January 2, 2026 15:58
@shahar1 shahar1 changed the title remove references to KubernetesJobOperator.get_or_create_pod() to fix creating duplicate pods Remove references to KubernetesJobOperator.get_or_create_pod() to fix creating duplicate pods Jan 3, 2026
@shahar1
Contributor

shahar1 commented Jan 3, 2026

LGTM. I'm checking with the other contributors that handling the parallelism = 0 case makes sense:
https://apache-airflow.slack.com/archives/C06K9Q5G2UA/p1767429130216899

If no strong objections are given in the next 1-2 days, or there are additional approvals for this PR by then - I'm ok with merging it as-is.

@jscheffl
Contributor

jscheffl commented Jan 3, 2026

@stephen-bracken / @shahar1 Given the concerns raised in https://apache-airflow.slack.com/archives/C06K9Q5G2UA/p1767429130216899, one thing to consider: I'd propose adding a newsfragment so that the change is highlighted in the provider's changelog.

TLDR: Requesting a newsfragment to highlight this interface change.

UPDATE: Argh, providers have no newsfragments. Add it to the changelog instead, as in https://github.com/apache/airflow/pull/59143/changes#diff-24cff4e7b7926b95f4efef45da9f9d6f43b237b5143990b1554113251cb2c12eR30 for example, so that it is included in the next providers wave.

@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch 2 times, most recently from 8befa0f to 5a29826 Compare January 5, 2026 10:03
@stephen-bracken
Contributor Author

> @stephen-bracken / @shahar1 Given the concerns raised in https://apache-airflow.slack.com/archives/C06K9Q5G2UA/p1767429130216899, one thing to consider: I'd propose adding a newsfragment so that the change is highlighted in the provider's changelog.
>
> TLDR: Requesting a newsfragment to highlight this interface change.
>
> UPDATE: Argh, providers have no newsfragments. Add it to the changelog instead, as in https://github.com/apache/airflow/pull/59143/changes#diff-24cff4e7b7926b95f4efef45da9f9d6f43b237b5143990b1554113251cb2c12eR30 for example, so that it is included in the next providers wave.

@jscheffl Thanks, added a changelog note.

@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch from 5a29826 to 50261b1 Compare January 5, 2026 10:36
@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch from 50261b1 to 6b3dc30 Compare January 5, 2026 14:08
@shahar1 shahar1 marked this pull request as draft January 10, 2026 10:32
@shahar1
Contributor

shahar1 commented Jan 10, 2026

Converting the PR to draft at the author's request so that they can make some changes. Please do not merge.

Contributor

@shahar1 shahar1 left a comment

See previous comment

@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch 2 times, most recently from faae3ae to d1265cd Compare January 10, 2026 10:40
@shahar1 shahar1 marked this pull request as ready for review January 10, 2026 10:41
@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch from d1265cd to 522f30f Compare January 10, 2026 11:11
@stephen-bracken stephen-bracken force-pushed the stephen-bracken/cncf-kubernetes-job-fix branch from 522f30f to 28333ae Compare January 10, 2026 12:02
@shahar1 shahar1 changed the title Remove references to KubernetesJobOperator.get_or_create_pod() to fix creating duplicate pods Fix duplicate pod creation in KubernetesJobOperator Jan 10, 2026
@shahar1 shahar1 merged commit c2bb38f into apache:main Jan 10, 2026
104 checks passed
@boring-cyborg

boring-cyborg bot commented Jan 10, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@shahar1
Contributor

shahar1 commented Jan 10, 2026

Great job @stephen-bracken !
