Retry with exponential backoff when fetching artifacts #289

Merged
stefanprodan merged 1 commit into main from http-retry on Feb 27, 2021

@stefanprodan stefanprodan commented Feb 26, 2021

This PR implements retries with exponential backoff when fetching artifacts from source-controller. By default, the controller makes 10 attempts within a 3.5-minute window; the maximum number of retries can be configured using the --http-retry command-line argument.

This mitigates the alert spam (i/o timeout errors) when source-controller becomes unavailable for a short period, e.g. after an upgrade.

Example error log:

{"level":"error","ts":"2021-02-26T12:03:19.710+0200","logger":"controller.kustomization","msg":"Reconciliation failed after 3m35.317406763s, next try in 5m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"podinfo","namespace":"flux-system","revision":"master/e43ebfa5bf4b87c46f2e1db495eb571cd398e2f7","error":"failed to download artifact, error: GET http://localhost:8080/gitrepository/flux-system/podinfo/e43ebfa5bf4b87c46f2e1db495eb571cd398e2f7.tar.gz giving up after 10 attempt(s): Get \"http://localhost:8080/gitrepository/flux-system/podinfo/e43ebfa5bf4b87c46f2e1db495eb571cd398e2f7.tar.gz\": dial tcp [::1]:8080: connect: connection refused"}

Ref: fluxcd/flux2#661 fluxcd/notification-controller#76

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
@stefanprodan stefanprodan added the enhancement label Feb 26, 2021

@hiddeco hiddeco left a comment

LGTM, thank you @stefanprodan

@stefanprodan stefanprodan merged commit f2c986a into main Feb 27, 2021
@stefanprodan stefanprodan deleted the http-retry branch February 27, 2021 09:35
hiddeco added a commit to fluxcd/flux2 that referenced this pull request Mar 3, 2021
This check becomes obsolete as soon as
fluxcd/kustomize-controller#289 has been
released, but prevents the bootstrap process from erroring on
`failed to download artifact from [..] connect: connection refused`
errors when the source-controller is taking a longer time to boot.

Signed-off-by: Hidde Beydals <hello@hidde.co>