-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Prevent transient error in case when Pod start_time parameter is None #59097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent transient error in case when Pod start_time parameter is None #59097
Conversation
a4f703d to
951ae12
Compare
951ae12 to
5994234
Compare
|
Not sure why you are calling me - that's not my area of expertise. Generally what we prefer is when you are pinging to attract maintainer to review - ping "in general" "Hey can I get a review" - generally speaking when I answer to your ping now that "this is not my area of expertise" - this will significantly decrease your chances of getting help from some other maintainer, because they will see I was involved (and they will not even see that I responded "not my area of interest". But .. you made your choice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a transient error in the Kubernetes provider that occurs when duplicate pods are spawned but their pod.status.start_time parameter is not yet set. The fix adds a fallback mechanism to use creation_timestamp from pod metadata when start_time is unavailable.
Key changes:
- Modified
_get_most_recent_pod_indexmethod to fallback tocreation_timestampwhenstart_timeis None - Added two new test cases to verify the fallback behavior
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py | Added fallback logic to use creation_timestamp when start_time is None in the _get_most_recent_pod_index method |
| providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py | Added two test cases: one testing fallback to creation_timestamp when start_time is None, and another testing behavior when both are None |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Outdated
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py
Show resolved
Hide resolved
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/operators/test_pod.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not my area of expertise either (yet) - I've assigned Copilot for an initial review, so please address its commentary as well.
However, my 2 cents, if I understand these changes correctly - the if block makes the comparison based on a creation_timestamp, which is clearly not the start_time. Therefore, assigning the creation_timestamp to pod_start_times in the if statement is rather misleading.
In that case, why not making the entire comparison based on creation_timestamp, replacing the start_time completely? Alternatively, if comparison based on start_time is useful for most cases, maybe you could introduce a flag for the user so they'll configure the comparison method?
I'll be happy for an additional review from k8s owners.
|
Thank you for the comments. I tagged you because I saw that Jarek previously participated in Kubernetes provider PRs approvals, and Shahar approved my previous PR (#49899). While I understand your point, please note that the auto-assigned reviewers haven't reviewed any PRs in the last 6 months. I tagged you because the standard process didn't seem to be working. Since I mostly contribute to the Google provider, I assumed tagging active maintainers was the right move to unblock the PR. I wasn't trying to be disruptive, just trying to get the code reviewed. So I apologize if I didn't follow the specific protocol for Kubernetes provider. I'll keep this in mind for next time. |
I am not sure where you get that statistics from, but maybe try to ping those auto-assigned reviewers. You have not tried it yet and seems that they are best suited. Maybe they don't realise you wait for their review? |
|
Hello @potiuk ! Yes, my mistake, I checked manually and missed some of the PR´s reviewed by the code owners. I apologize for the false statement. Thank you for pointing it out. I will ping them in a separate comment. |
|
Hello @hussein-awala @jedcunningham could you please help with the review of the PR? Thank you! |
Hello @shahar1 thank you for your comment! My thought was about extending current logic not changing it completely. Also, if it is possible @jscheffl could you please help with the review of the PR? Or advise someone who has expertise to review and approve this PR? |
5994234 to
6c3d442
Compare
Luckily I've started recently working on a CKA certification, so I feel more comfortable reviewing this PR. Please avoid tagging other commiters/PMC from now - let's try settling this one between us. If anyone else wants to review or comment, they are more than welcome, of course. For future PRs, please be aware of matter - as Jarek said, arbitrarily tagging maintainers might "taint" your current and future PRs from review (k8s pun intended). For better visibility in future PRs, I recommend sending a message in the Now, let's get to what needs to be done:
Please take care of the two comments above, and from my prespective it will be good to merge. |
6c3d442 to
397d339
Compare
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
Show resolved
Hide resolved
5de9379 to
2fae66e
Compare
|
LGTM, improved the phrasing of the log message and rebased. |
There is a chance of a transient error inside the Kubernetes provider when two identical pods are spawned but the pod.status.start_time parameter is not set yet. To prevent this situation, an additional check of creation_timestamp was added. It is a more convenient step to compare identical pods based on the time when etcd receives information about the pod rather than when the scheduler sends information to the kubelet.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.