This repository has been archived by the owner on Feb 9, 2024. It is now read-only.

add temporary failures as a reason to retry kubernetes failures #2651

Merged
knisbet merged 2 commits into master from kevin/master/2650-retry-transient-errors
Sep 28, 2021

Conversation

knisbet
Contributor

@knisbet knisbet commented Sep 24, 2021

Description

Fix missing retries on transient errors in Kubernetes operations.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Linked tickets and other PRs

Updates #2650

TODOs

  • Self-review the change

Implementation

This is a relatively small, self-contained change. As reported by Customer S, they see a higher-than-expected failure rate in their CI upgrade testing. We got a stack trace from them (excerpt in #2650) that shows we're not retrying on "connection refused" errors.

I didn't do any manual testing on this one, as it would likely be tricky to produce the right network error at the right time. Manual testing would also only cover that one specific error, whereas it looks like many temporary failures are missing from the retry logic. The worst case is that we retry a few times on something we shouldn't.

The "connection refused" error present in the stack trace should be covered by the existing utils.IsTransientClusterError, and I threw in various Kubernetes transient errors for good measure. https://github.com/gravitational/gravity/blob/master/lib/utils/error.go#L106-L132
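
For readers unfamiliar with the pattern, here is a rough, standard-library-only sketch of "retry only when the error is classified as transient". It is not the code from this PR: `isTransient`, `retryTransient` and `simulatedRefused` are hypothetical names, and `isTransient` only stands in for a check like utils.IsTransientClusterError, whose actual logic is not reproduced here. A real retry loop would normally also wait between attempts.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// isTransient is a hypothetical stand-in for a classifier such as
// utils.IsTransientClusterError. This sketch recognizes only
// "connection refused"; the real check covers more cases.
func isTransient(err error) bool {
	// errors.Is unwraps *net.OpError and *os.SyscallError down to the errno.
	return errors.Is(err, syscall.ECONNREFUSED)
}

// retryTransient calls op up to attempts times. It returns immediately on
// success or on a non-transient error, and retries only transient ones.
// No backoff is applied here; that is a simplification of this sketch.
func retryTransient(attempts int, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !isTransient(err) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// simulatedRefused builds an error shaped like a failed TCP dial,
// so the sketch can run without a network.
func simulatedRefused() error {
	return &net.OpError{
		Op:  "dial",
		Net: "tcp",
		Err: os.NewSyscallError("connect", syscall.ECONNREFUSED),
	}
}

func main() {
	calls := 0
	err := retryTransient(5, func() error {
		calls++
		if calls < 3 {
			return simulatedRefused() // fail the first two attempts
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

In this sketch the worst case mentioned above is bounded by `attempts`: an error wrongly classified as transient costs a few extra calls before the wrapped error is returned.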

Testing done

Only looking for robotest results. See justification above.

Additional information

@knisbet knisbet requested review from a team, wadells and bernardjkim September 24, 2021 15:19
@knisbet knisbet merged commit 03d0169 into master Sep 28, 2021
@knisbet knisbet deleted the kevin/master/2650-retry-transient-errors branch September 28, 2021 14:05
knisbet pushed a commit that referenced this pull request Sep 28, 2021
knisbet pushed a commit that referenced this pull request Sep 29, 2021