Additional operator edge case fixes #2007

Merged
merged 4 commits into from
Aug 9, 2024

Conversation

ikreymer
Member

@ikreymer ikreymer commented Aug 8, 2024

Fix a few edge-case situations:

  • Restart evicted pods that have reached the terminal Failed state with reason Evicted by recreating them. These pods are not retried automatically, so they need to be recreated (eviction usually happens due to memory pressure on the node).
  • Don't treat containers in ContainerCreating as running. Although this state is usually brief, it's possible for containers to get stuck there, and excluding it also improves the accuracy of exec seconds tracking.
  • Consolidate state transitions for running states: the state is set either to running or to one of pending-wait/generate-wacz/uploading-wacz, and switching to any of these states is allowed from each other or from waiting_capacity.
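
For illustration, the first two checks above can be sketched roughly as follows. This is a minimal sketch, not the operator code from this PR: it assumes pods arrive as plain dicts shaped like Kubernetes Pod objects, and the function names `evicted_restart_reason` and `is_container_running` are hypothetical.

```python
from typing import Any, Optional


def evicted_restart_reason(pod: dict[str, Any]) -> Optional[str]:
    """Return a restart reason if the pod is terminally Failed with reason Evicted.

    Hypothetical helper: the returned string is only used for logging here.
    """
    status = pod.get("status", {})
    if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
        return "evicted"
    return None


def is_container_running(pod: dict[str, Any]) -> bool:
    """Return True only if at least one container reports a 'running' state.

    A container still in ContainerCreating reports a 'waiting' state with that
    reason, so it is not counted as running.
    """
    status = pod.get("status", {})
    for cstatus in status.get("containerStatuses", []):
        if "running" in cstatus.get("state", {}):
            return True
    return False


# Example pod statuses (made-up data for illustration)
evicted_pod = {"status": {"phase": "Failed", "reason": "Evicted"}}
creating_pod = {
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {"state": {"waiting": {"reason": "ContainerCreating"}}}
        ],
    }
}
running_pod = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {"state": {"running": {"startedAt": "2024-08-08T19:00:00Z"}}}
        ],
    }
}
```

With a check like this, a pod stuck in ContainerCreating would not accrue exec seconds until a container actually reports a running state.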

log when restarting evicted pods
ease 'running' transition to allow switching from any other running/waiting state
…l in that phase

under certain conditions, wait until actually running
… of other

running states 'pending-wait', 'generate-wacz', 'uploading-wacz', if appropriate
@ikreymer ikreymer requested a review from tw4l August 8, 2024 19:46
- return and print restart reason
- don't print state transition if not actually changing states
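
As a rough sketch of the eased transition rule and the quieter logging described in these commit notes (not the actual operator code): the state names come from the notes above, while the `transition` function, the grouping into sets, and the "complete" state used in the example are assumptions for illustration.

```python
# Hypothetical grouping of states, based on the names in the commit notes
RUNNING_STATES = frozenset(
    {"running", "pending-wait", "generate-wacz", "uploading-wacz"}
)
WAITING_STATES = frozenset({"waiting_capacity"})


def transition(current: str, desired: str) -> str:
    """Return the resulting state, printing only when the state really changes."""
    if desired == current:
        # Not actually changing states: nothing to print
        return current

    allowed_from = RUNNING_STATES | WAITING_STATES
    if desired in RUNNING_STATES and current in allowed_from:
        print(f"state: {current} -> {desired}")
        return desired

    # Any other transition is outside the scope of this sketch: keep the state
    return current


# Example usage
state = transition("waiting_capacity", "running")
state = transition(state, "generate-wacz")
state = transition(state, "uploading-wacz")
```

Treating all running-type states as one group keeps the rule symmetric: any of them can follow any other, and each can be entered from waiting_capacity.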
@ikreymer ikreymer merged commit 4ec7cf8 into main Aug 9, 2024
4 checks passed
@ikreymer ikreymer deleted the restart-evicted branch August 9, 2024 20:12