Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix client backoff #537

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Fix client backoff #537

wants to merge 2 commits into from

Conversation

fin3ss3g0d
Copy link

The code here fails to set a minimum value for the backoff based on this project.

image

This means if the server is down, the backoff minimum for every chisel user is set to a 100ms value by default according to the screenshot above and will result in extremely short backoffs (default behavior is shown below).

image

Specifically, the sequence will always be 100ms, 200ms, 400ms, 800ms, and 1.6s (if the server is down). This technically gives the user about 2 seconds to get the server back up before the client exits entirely, and it is noisy over the wire to send this many packets this fast. I hope you can see how this can be problematic based on use case. The updated behavior with this commit can be seen below and gives users more time to get the server back up and running:

image

@jpillora
Copy link
Owner

jpillora commented Oct 24, 2024 via email

@fin3ss3g0d
Copy link
Author

fin3ss3g0d commented Oct 24, 2024

Respectfully, I never said retrying was the issue or that the client would stop retrying unintentionally... the title of the PR is Fix client backoff because of the way you have configured the backoff for the client and is about backoff time. There should be more time in between the first 5 retries and the user should be able to control the time with a new flag... The first 5 retries will all take place within ~2 seconds which doesn't make sense from my standpoint. You are not giving the user enough time in between each retry in order to get the server back up in case it goes down. With your current logic, the backoff will eventually get above 2 seconds if you go above 5 retries, but it would be better to increase the minimum value of the backoff and allow the user to control it. This way, not as many retries are required and we can still get the wait time desired.

You always set Min of the Backoff to 100ms here and don't give the user the option to set it, so the first 5 backoffs will always be 100ms, 200ms, 400ms, 800ms, and 1.6s if the server goes down. This also means that if the user sets --max-retry-count to a value between 0 - 5 and the server goes down, the client will retry for ~2 seconds before exiting and the user will lose their chisel connection due to the extremely short wait times. Original:

image

After fix (and --min-retry-interval set to 1m):

image

You can do your own testing to verify what I am reporting. Start a server, connect a client, kill the server with the client still connected and you'll see the first 5 backoffs will always be 100ms, 200ms, 400ms, 800ms, 1.6s, etc. with no ability to customize.

I would classify this as a bug due to the unintentional nature of this behavior. You don't document this anywhere and I'm pretty sure a lot of people are unaware unless they take a look at the code surrounding the backoff logic. If you don't plan on fixing, you should document it is a known issue so people at least have the heads up. The logic might work for a development branch or testing, but when it comes to production version of the code, the user should be able to customize this behavior relating to the backoff.

@liojacqs
Copy link

liojacqs commented Nov 4, 2024

If not exposed as a client flag, I believe enabling Jitter by default would be a good thing as well:
b := &backoff.Backoff{Jitter: true, Max: c.config.MaxRetryInterval, Min: c.config.MinRetryInterval}

@fin3ss3g0d
Copy link
Author

If not exposed as a client flag, I believe enabling Jitter by default would be a good thing as well: b := &backoff.Backoff{Jitter: true, Max: c.config.MaxRetryInterval, Min: c.config.MinRetryInterval}

I'll add this to another commit @liojacqs soon 😄

@fin3ss3g0d
Copy link
Author

@liojacqs added the jitter option as a bool flag. @jpillora any updated thoughts so far on this PR?

@liojacqs
Copy link

liojacqs commented Nov 8, 2024

Just one thing on the first commit, for the default minimum retry inteval:
image

This code makes it impossible to have <1s retry interval other than 100ms, and I would personnaly add more flexibility, like:

	if c.MinRetryInterval <= 0 {
		c.MinRetryInterval = 100 * time.Millisecond
	}

That way the default will remain 100ms, but we can also set something like 1ms/250ms/600ms....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants