fix: pod with a restart policy of Never or OnFailure stuck at 'Progressing' (#15317) #709

Open · wants to merge 6 commits into base: master

Conversation

RoelofKuijpers commented Apr 9, 2025

This change extends the health check for pods.
Previously, pods with a restart policy of Never or OnFailure were assumed to be hooks with a finite lifetime, so they were reported as Progressing instead of Healthy. That assumption does not hold when the pod is managed by an operator (e.g., the Flink operator) and therefore has a restart policy of Never.
We introduce a new annotation whose presence is checked while the pod is Running. When the annotation is present, the restart-policy logic is skipped.
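
As a rough illustration of the case described above, an operator-managed pod might look like the following. This is only a sketch: the annotation key `example.com/ignore-restart-policy`, the pod name, and the image are placeholders invented for this example, because the description does not state the actual key introduced by the change.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-managed-example        # hypothetical name
  annotations:
    # Placeholder key for illustration only, NOT the real annotation from this PR.
    example.com/ignore-restart-policy: "true"
spec:
  # Set by the operator; the pod is long-running, not a finite-lived hook.
  restartPolicy: Never
  containers:
    - name: main
      image: example.com/workload:1.0   # placeholder image
```

With such an annotation present, the health check would skip the restart-policy branch and assess the Running pod like any other long-running pod.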

codecov bot commented Apr 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.68%. Comparing base (8849c3f) to head (61ddfd1).
Report is 41 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #709      +/-   ##
==========================================
- Coverage   54.26%   53.68%   -0.58%     
==========================================
  Files          64       64              
  Lines        6164     6480     +316     
==========================================
+ Hits         3345     3479     +134     
- Misses       2549     2725     +176     
- Partials      270      276       +6     

drewhemm commented Apr 9, 2025

This looks like a good approach to the problem.

drewhemm commented Apr 9, 2025

The pod manifest needs the following to pass the tests:

  • Compute and storage resources defined
  • The alpine tag needs to use something other than latest, e.g. 3.21
  • Add automountServiceAccountToken: false to the pod spec, as per the Kubernetes docs
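
A pod manifest that meets these three points might look roughly like this. The pod name, command, restart policy, and resource values are illustrative assumptions, not taken from the PR's test data.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod                       # hypothetical name
spec:
  restartPolicy: Never                 # assumed, matching the case this PR targets
  automountServiceAccountToken: false  # no API token mounted into the pod
  containers:
    - name: main
      image: alpine:3.21               # pinned tag instead of latest
      command: ["sleep", "3600"]       # assumed command to keep the pod Running
      resources:
        limits:                        # illustrative values
          cpu: "100m"
          memory: "64Mi"
          ephemeral-storage: "100Mi"
```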

RoelofKuijpers (Author) commented

@drewhemm I have made the changes you suggested so that the Quality Gate passes.

drewhemm commented Apr 9, 2025

Cool, looks like the last blocking issue is the commit sign off.

@RoelofKuijpers RoelofKuijpers force-pushed the 15317 branch 2 times, most recently from 444a326 to 84a1039 Compare April 9, 2025 11:41

drewhemm commented Apr 9, 2025

SonarQube has flagged a non-blocking issue. It is probably best to resolve it as follows:

resources:
  requests:
    ephemeral-storage: "100Mi"

sivchari (Contributor) left a comment

LGTM. left nits.

RoelofKuijpers and others added 6 commits April 21, 2025 18:32
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
…sses the checks of the Quality Gate

Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
improve code readability

Co-authored-by: sivchari <shibuuuu5@gmail.com>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
@RoelofKuijpers RoelofKuijpers requested a review from sivchari April 21, 2025 17:28
3 participants