@Crowiant
Contributor

@Crowiant Crowiant commented Dec 5, 2025

A transient error can occur inside the Kubernetes provider when two identical pods are spawned but pod.status.start_time is not set yet. To prevent this, an additional check of creation_timestamp was added. It is more convenient to compare identical pods by the time at which etcd receives information about the pod than by the time at which the scheduler sends information to the kubelet.


@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Dec 5, 2025
@potiuk potiuk force-pushed the fix-kubernetes-pod-operator-start-time branch from a4f703d to 951ae12 Compare December 7, 2025 21:21
@Crowiant Crowiant force-pushed the fix-kubernetes-pod-operator-start-time branch from 951ae12 to 5994234 Compare December 8, 2025 12:17
@Crowiant
Contributor Author

Hello @potiuk @shahar1 could you please review the PR? Thank you!

@potiuk
Member

potiuk commented Dec 10, 2025

Not sure why you are pinging me - that's not my area of expertise. Generally, when you ping to attract a maintainer's review, we prefer that you ping "in general", e.g. "Hey, can I get a review?". When I answer your ping with "this is not my area of expertise", it significantly decreases your chances of getting help from another maintainer, because they will see that I was involved (and they will not even see that I responded "not my area of interest").

But .. you made your choice.

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses a transient error in the Kubernetes provider that occurs when duplicate pods are spawned but their pod.status.start_time parameter is not yet set. The fix adds a fallback mechanism to use creation_timestamp from pod metadata when start_time is unavailable.

Key changes:

  • Modified _get_most_recent_pod_index method to fallback to creation_timestamp when start_time is None
  • Added two new test cases to verify the fallback behavior
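The fallback described above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the provider's actual code: `most_recent_pod_index` and `make_pod` are hypothetical names, pods are modelled with `SimpleNamespace` instead of `V1Pod`, and raising `ValueError` when neither timestamp is available is an assumption made for the sketch.

```python
from __future__ import annotations

import datetime as dt
from types import SimpleNamespace


def make_pod(start_time, creation_timestamp):
    """Build a stand-in for a pod object (hypothetical helper, not a V1Pod)."""
    return SimpleNamespace(
        status=SimpleNamespace(start_time=start_time),
        metadata=SimpleNamespace(creation_timestamp=creation_timestamp),
    )


def most_recent_pod_index(pods: list) -> int:
    """Return the index of the newest pod.

    Prefers status.start_time and falls back to
    metadata.creation_timestamp when start_time is None.
    """
    timestamps = []
    for pod in pods:
        ts = pod.status.start_time
        if ts is None:
            # start_time is not populated yet, so use the creation time instead.
            ts = pod.metadata.creation_timestamp
        timestamps.append(ts)

    known = [ts for ts in timestamps if ts is not None]
    if not known:
        # Assumption for this sketch: fail loudly if no timestamp exists at all.
        raise ValueError("no pod has a start_time or creation_timestamp")
    return timestamps.index(max(known))
```

With one pod that has already started and a second, later pod whose start_time is still None, the function picks the second pod because its creation_timestamp is newer.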

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py | Added fallback logic to use creation_timestamp when start_time is None in the _get_most_recent_pod_index method |
| providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py | Added two test cases: one testing fallback to creation_timestamp when start_time is None, and another testing behavior when both are None |

Contributor

@shahar1 shahar1 left a comment

Not my area of expertise either (yet) - I've assigned Copilot for an initial review, so please address its commentary as well.
However, my 2 cents, if I understand these changes correctly - the if block makes the comparison based on a creation_timestamp, which is clearly not the start_time. Therefore, assigning the creation_timestamp to pod_start_times in the if statement is rather misleading.
In that case, why not make the entire comparison based on creation_timestamp, replacing start_time completely? Alternatively, if comparison based on start_time is useful for most cases, maybe you could introduce a flag so that users can configure the comparison method?
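For illustration, the first alternative (comparing on creation_timestamp alone) could look something like the sketch below. `newest_by_creation` is a hypothetical name, pods are modelled with `SimpleNamespace` rather than `V1Pod`, and the sketch assumes every pod has a creation_timestamp set.

```python
from __future__ import annotations

import datetime as dt
from types import SimpleNamespace


def newest_by_creation(pods: list) -> int:
    """Return the index of the pod with the latest metadata.creation_timestamp.

    status.start_time is ignored entirely, so an unset start_time cannot
    cause a failure. On ties, the first pod with the latest timestamp wins.
    """
    return max(range(len(pods)), key=lambda i: pods[i].metadata.creation_timestamp)
```

This keeps a single comparison key, at the cost of no longer reflecting when the pod actually started running.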

I'll be happy for an additional review from k8s owners.

@Crowiant
Contributor Author

Crowiant commented Dec 15, 2025

Hello @potiuk , @shahar1 !

Thank you for the comments.

I tagged you because I saw that Jarek previously participated in Kubernetes provider PRs approvals, and Shahar approved my previous PR (#49899).

While I understand your point, please note that the auto-assigned reviewers haven't reviewed any PRs in the last 6 months. I tagged you because the standard process didn't seem to be working.

Since I mostly contribute to the Google provider, I assumed tagging active maintainers was the right move to unblock the PR. I wasn't trying to be disruptive, just trying to get the code reviewed. I apologize if I didn't follow the specific protocol for the Kubernetes provider. I'll keep this in mind for next time.

@potiuk
Member

potiuk commented Dec 16, 2025

> While I understand your point, please note that the auto-assigned reviewers haven't reviewed any PRs in the last 6 months. I tagged you because the standard process didn't seem to be working.

I am not sure where you got those statistics from, but maybe try pinging those auto-assigned reviewers. You have not tried it yet, and it seems that they are best suited. Maybe they don't realise you are waiting for their review?

@Crowiant
Contributor Author

Hello @potiuk! Yes, my mistake: I checked manually and missed some of the PRs reviewed by the code owners. I apologize for the false statement. Thank you for pointing it out. I will ping them in a separate comment.

@Crowiant
Contributor Author

Hello @hussein-awala @jedcunningham could you please help with the review of the PR? Thank you!

@Crowiant
Contributor Author

Crowiant commented Jan 8, 2026

> Not my area of expertise either (yet) - I've assigned Copilot for an initial review, so please address its commentary as well. However, my 2 cents, if I understand these changes correctly - the if block makes the comparison based on a creation_timestamp, which is clearly not the start_time. Therefore, assigning the creation_timestamp to pod_start_times in the if statement is rather misleading. In that case, why not make the entire comparison based on creation_timestamp, replacing start_time completely? Alternatively, if comparison based on start_time is useful for most cases, maybe you could introduce a flag so that users can configure the comparison method?
>
> I'll be happy for an additional review from k8s owners.

Hello @shahar1, thank you for your comment! My intention was to extend the current logic, not to change it completely.
In my opinion, if start_time happens to be None, it should not lead to an error that the user has to resolve manually. The most recent pod can be determined from creation_timestamp instead. I don't see this as a separate mode that a user should have to configure in the operator, because in the end the code only determines which of the identical pods is the most recent. WDYT?

Also, if possible, @jscheffl could you please help with the review of the PR? Or suggest someone with the expertise to review and approve it?

@Crowiant Crowiant force-pushed the fix-kubernetes-pod-operator-start-time branch from 5994234 to 6c3d442 Compare January 8, 2026 21:35
@shahar1
Contributor

shahar1 commented Jan 10, 2026

>> Not my area of expertise either (yet) - I've assigned Copilot for an initial review, so please address its commentary as well. However, my 2 cents, if I understand these changes correctly - the if block makes the comparison based on a creation_timestamp, which is clearly not the start_time. Therefore, assigning the creation_timestamp to pod_start_times in the if statement is rather misleading. In that case, why not make the entire comparison based on creation_timestamp, replacing start_time completely? Alternatively, if comparison based on start_time is useful for most cases, maybe you could introduce a flag so that users can configure the comparison method?
>> I'll be happy for an additional review from k8s owners.
>
> Hello @shahar1, thank you for your comment! My intention was to extend the current logic, not to change it completely. In my opinion, if start_time happens to be None, it should not lead to an error that the user has to resolve manually. The most recent pod can be determined from creation_timestamp instead. I don't see this as a separate mode that a user should have to configure in the operator, because in the end the code only determines which of the identical pods is the most recent. WDYT?

Luckily, I recently started working on a CKA certification, so I feel more comfortable reviewing this PR. Please avoid tagging other committers/PMC members from now on - let's try settling this one between us. If anyone else wants to review or comment, they are more than welcome, of course. For future PRs, please be aware of this matter - as Jarek said, arbitrarily tagging maintainers might "taint" your current and future PRs from review (k8s pun intended). For better visibility in future PRs, I recommend sending a message in the #contributors Slack channel.

Now, let's get to what needs to be done:

  1. I closed irrelevant/nitpicking comments by Copilot - I left only the one which I find relevant (unreachable else statement).
  2. I understand the fallback logic, but it should be made clearly visible to the user (probably via log.info), and also documented briefly in the function's docstring.
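As a rough illustration of the second point, a docstring and a log.info call for the fallback might look like the sketch below. It is not the merged code: the function name, logger name, and log wording are hypothetical, pods are modelled with `SimpleNamespace`, and the sketch assumes creation_timestamp is always set.

```python
from __future__ import annotations

import datetime as dt
import logging
from types import SimpleNamespace

log = logging.getLogger("pod_selection_sketch")  # hypothetical logger name


def most_recent_pod_index(pods: list) -> int:
    """Return the index of the most recent pod in ``pods``.

    Pods are compared by ``status.start_time``. If a pod has no
    ``start_time`` yet, its ``metadata.creation_timestamp`` is used
    instead, and the substitution is reported via ``log.info``.
    """
    timestamps = []
    for pod in pods:
        ts = pod.status.start_time
        if ts is None:
            # Assumption: creation_timestamp is always populated.
            ts = pod.metadata.creation_timestamp
            log.info(
                "Pod %s has no start_time; using creation_timestamp %s instead",
                pod.metadata.name,
                ts,
            )
        timestamps.append(ts)
    return timestamps.index(max(timestamps))
```

Logging at the point of substitution tells the user which pod was ranked by its creation time rather than its start time.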

Please take care of the two comments above, and from my perspective it will be good to merge.
Thank you and good luck!

@Crowiant Crowiant force-pushed the fix-kubernetes-pod-operator-start-time branch from 6c3d442 to 397d339 Compare January 12, 2026 14:37
@shahar1 shahar1 force-pushed the fix-kubernetes-pod-operator-start-time branch from 5de9379 to 2fae66e Compare January 12, 2026 18:07
@shahar1
Contributor

shahar1 commented Jan 12, 2026

LGTM, improved the phrasing of the log message and rebased.
Will merge after all checks are green.

@shahar1 shahar1 merged commit b62ca70 into apache:main Jan 12, 2026
104 checks passed
Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues
