Fix Retry time in Exporter #7004

Open

ayushtkn opened this issue Jan 9, 2025 · 4 comments
Labels
Feature Request Suggest an idea for this project

Comments

@ayushtkn

ayushtkn commented Jan 9, 2025

Describe the bug

After the initial wait, subsequent wait times are randomized within an upper bound, leading to sporadic behavior that deviates from the expected retry pattern.

Related discussion:
#3936 (comment)

Steps to reproduce

  • Trigger a retry scenario.
  • Note the actual wait times between retries.

What did you expect to see?

The wait times should closely follow the calculated values, with a small and predictable jitter (e.g., 0.2).

What did you see instead?
The wait times are highly variable, appearing random and exceeding the expected jitter tolerance.
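To make the difference concrete, here is a minimal sketch (not the exporter's actual code; the initial backoff and multiplier are assumptions) contrasting the observed full-range randomization with the expected bounded jitter:

```java
import java.util.concurrent.ThreadLocalRandom;

class BackoffJitterIllustration {
  // Assumed parameters, for illustration only.
  static final long INITIAL_BACKOFF_MS = 1_000;
  static final double MULTIPLIER = 1.5;

  // Observed behavior: the wait is drawn uniformly from [0, target),
  // so any given attempt can wait almost nothing.
  static long observedWaitMs(int attempt) {
    long target = (long) (INITIAL_BACKOFF_MS * Math.pow(MULTIPLIER, attempt));
    return ThreadLocalRandom.current().nextLong(target);
  }

  // Expected behavior: the wait stays close to the calculated value,
  // within a +-20% jitter band.
  static long expectedWaitMs(int attempt) {
    long target = (long) (INITIAL_BACKOFF_MS * Math.pow(MULTIPLIER, attempt));
    return (long) (target * ThreadLocalRandom.current().nextDouble(0.8, 1.2));
  }
}
```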

What version and what artifacts are you using?
All

Environment
Should be All

Additional context
For reference, see this related gRPC proposal:
grpc/proposal#452

@ayushtkn ayushtkn added the Bug Something isn't working label Jan 9, 2025
@jack-berg
Member

from the expected retry pattern.

For what it's worth, our exponential backoff algorithm's alignment with gRPC's is a coincidence. It's now clear that for the purposes of OTLP, clients need to support "exponential backoff with jitter", but there's no common definition of what the algorithm is or what the parameters are.

So in the absence of some standard we're required / encouraged to follow, we need to evaluate changes to the algorithm on the merits of the changes.

exceeding the expected jitter tolerance.

What's the expected jitter tolerance? The gRPC proposal indicates that it's ±0.2, but why 0.2 instead of 0.3?

@jack-berg jack-berg added Feature Request Suggest an idea for this project and removed Bug Something isn't working labels Jan 17, 2025
@ayushtkn
Author

ayushtkn commented Jan 19, 2025

There is a similar discussion around the implementation we are currently using:
grpc/grpc-go#7514

By definition, with exponential backoff each retry after the initial backoff should wait longer than the previous one. But since we pick a random value between 0 and the calculated backoff, the retries can be exhausted far too quickly in some scenarios.

If the 0.2 jitter is debatable, then maybe we can make it configurable.
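As a hypothetical illustration: with a 1 s initial backoff, a 1.5× multiplier, and five backoff intervals, the calculated waits are roughly 1, 1.5, 2.25, 3.4, and 5.1 s (about 13 s in total). Drawing each wait uniformly from [0, target) cuts the expected total to roughly half of that, and in an unlucky run all of the retries can fire within a small fraction of a second.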

@jack-berg
Member

I think I agree with changing the logic so the range is random(target * .8, target * 1.2). It always felt strange to me to take a random value between 0 and the target.

And as you mention, if needed, we can always add an additional parameter to make 0.2 configurable.

But what's weird to me is why they don't put a bound on it corresponding to the maxBackoff. Like, why not Math.min(random(target * .8, target * 1.2), maxBackoff)? The algorithm results in really unintuitive semantics for maxBackoff, which could have been easily avoided. Like, why is the actual maximum backoff 1.2 * maxBackoff instead of just maxBackoff?
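For concreteness, a minimal sketch of that capped variant (illustrative only, not what either the gRPC proposal or the exporter currently implements; the parameter values are assumptions):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

class CappedJitterBackoff {
  // Assumed parameters, for illustration only.
  private static final long INITIAL_BACKOFF_NANOS = Duration.ofSeconds(1).toNanos();
  private static final long MAX_BACKOFF_NANOS = Duration.ofSeconds(30).toNanos();
  private static final double MULTIPLIER = 1.5;

  static long nextBackoffNanos(int attempt) {
    // Exponentially growing target for this attempt.
    double target = INITIAL_BACKOFF_NANOS * Math.pow(MULTIPLIER, attempt);
    // +-20% jitter around the target...
    double jittered = ThreadLocalRandom.current().nextDouble(target * 0.8, target * 1.2);
    // ...capped so the effective maximum really is maxBackoff.
    return (long) Math.min(jittered, MAX_BACKOFF_NANOS);
  }
}
```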

The debate I'm having now is: is this issue with maxBackoff a good enough reason to diverge from gRPC? Whether or not the algorithm is perfect, it's nice to point to some sort of standard as the basis for our algorithm.

@trask
Member

trask commented Jan 23, 2025

it's nice to point to some sort of standard as the basis for our algorithm

+1

why is the actual maximum backoff 1.2 * maxBackoff instead of just maxBackoff?

maybe it's good to have some jitter in the maxBackoff? e.g. in case retries continue even once the backoff hits the maximum, so you don't end up pinging at exactly 30-second intervals

🤷‍♂
