
grpc_retry backoff overflow #747

Merged

Conversation

JacobSMoller
Contributor

Changes

  • Ensure that the exponential backoff does not overflow, which produced unexpected backoff values for retry counts higher than 37.
  • Also add an exponential backoff with a max bound so it doesn't just keep growing for ~290 years; this is useful for retry scenarios that should run "indefinitely" in the hope that a self-healing service, e.g. a service in Kubernetes, comes back up.
  • Suggestion to add an `if a == 0` check to exponentBase2, since any nonzero integer raised to the power of 0 is mathematically 1, not 0, and to add a boundary that avoids overflowing the int64 that durations use (a rough sketch of both ideas follows this list).
  • Another suggestion could be to use math.Exp2.
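
A rough sketch of what the `a == 0` guard, the overflow boundary, and the max-bound backoff could look like; the identifiers below are illustrative and not necessarily the ones used in the merged change:

```go
package backoff

import "time"

// maxBackoffExponent caps the shift so the multiplier still fits in the int64
// behind time.Duration (2^62 nanoseconds is already on the order of 146 years).
const maxBackoffExponent = 62

// exponentBase2 returns 2^a, clamped so it can be used as a time.Duration
// multiplier without overflowing.
func exponentBase2(a uint) uint64 {
	if a == 0 {
		return 1 // 2^0 is 1, not 0
	}
	if a > maxBackoffExponent {
		a = maxBackoffExponent
	}
	return 1 << a
}

// backoffExponentialWithMaxBound clamps the exponential delay to maxBound, so an
// "indefinite" retry loop settles at a steady interval instead of growing without limit.
func backoffExponentialWithMaxBound(base, maxBound time.Duration, attempt uint) time.Duration {
	d := base * time.Duration(exponentBase2(attempt))
	if d < 0 || d > maxBound { // d < 0 catches wrap-around from multiplication overflow
		return maxBound
	}
	return d
}
```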

Using math.Exp2 does add a bit of overhead, but not a lot:

BenchmarkPowerCalculations/BitShift-16         	1000000000	         0.2438 ns/op	       0 B/op	       0 allocs/op
BenchmarkPowerCalculations/Pow-16              	19786604	        59.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkPowerCalculations/Exp2-16             	559875931	         2.116 ns/op	       0 B/op	       0 allocs/op
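
For reference, a minimal harness along these lines could produce numbers like the ones above; the actual benchmark code isn't part of this excerpt, so the exponent and names are only illustrative:

```go
package backoff_test

import (
	"math"
	"testing"
)

// sink keeps the compiler from optimizing the computations away.
var sink float64

// BenchmarkPowerCalculations compares three ways of computing 2^n for the
// backoff multiplier: a plain bit shift, math.Pow, and math.Exp2.
func BenchmarkPowerCalculations(b *testing.B) {
	const exp = 10

	b.Run("BitShift", func(b *testing.B) {
		var r uint64
		for i := 0; i < b.N; i++ {
			r = 1 << exp
		}
		sink = float64(r)
	})

	b.Run("Pow", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = math.Pow(2, exp)
		}
	})

	b.Run("Exp2", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = math.Exp2(exp)
		}
	})
}
```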

Verification

Added unit tests

@JacobSMoller
Contributor Author

>> ensuring copyright headers
cannot set copyright headers: you have unstaged changes.
M	README.md
Please commit or stash them.

Not sure what I'm doing wrong :(

@johanbrandhorst
Collaborator

This isn't your fault, our CI has been broken like this for a while 💀

Collaborator

@johanbrandhorst left a comment


Nice, I expect it was weird to have this actually overflow 😂. I don't think most users actually hit 37 exponentially increasing retries 😱.

@johanbrandhorst merged commit d75a1b8 into grpc-ecosystem:main on Feb 11, 2025
6 of 7 checks passed
@JacobSMoller
Contributor Author

JacobSMoller commented Feb 11, 2025

Yeah, I also don't expect it to ever really happen again. But it did happen, we noticed we got OOM, and I had to look into it.

And then us adding a maxBound of 10 sec made it a little more realistic to reach a retry count that high, of course.
