agent: Continue to retry indefinitely #3599

Merged (1 commit) on Apr 9, 2024


nemunaire

When the Woodpecker server is unreachable for a long period (e.g. during an update, maintenance, or an agent connection issue), the agent tries to reconnect continuously, with no delay between attempts. This produces several GB of logs in a short time.

Here is a sample line, repeated indefinitely:

{"level":"error","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp x.x.x.x:xxx: connect: connection refused\"","time":"2024-04-07T17:29:59Z","message":"grpc error: done(): code: Unavailable"}

It appears that, after a certain amount of time, the backoff package returns backoff.Stop (-1) instead of a valid delay. This signals that no more retries should be made, as shown in the example. But the agent code doesn't handle that case and uses -1 as the next delay.
This leads to continuous retries with no delay between them, which creates a huge amount of logs.

MaxElapsedTime defaults to 15 minutes. Once that time has passed, NextBackOff returns backoff.Stop (-1) instead of MaxInterval.
This commit sets MaxElapsedTime to 0 so that NextBackOff never returns Stop.

MaxElapsedTime defaults to 15 minutes. Once that time has passed,
NextBackOff returns backoff.Stop (-1) instead of MaxInterval. This leads
to continuous retries with no delay and creates a huge amount of logs.
@qwerty287 qwerty287 added bug Something isn't working agent labels Apr 8, 2024
@qwerty287 qwerty287 added this to the 2.5.0 milestone Apr 8, 2024
@lafriks lafriks merged commit 8e45ddd into woodpecker-ci:main Apr 9, 2024
6 of 7 checks passed

lafriks commented Apr 9, 2024

Failure unrelated to this PR

@woodpecker-bot woodpecker-bot mentioned this pull request Apr 9, 2024
1 task