swarm: better backoff logic #1554

Open
Stebalien opened this issue Oct 18, 2017 · 7 comments

Comments

@Stebalien
Member

  1. We should try to distinguish between local failures and remote failures. At the very least, we should be resetting our backoffs when new links/routes come online.
  2. We should probably be backing off on a per-multiaddr basis, not a per-peer basis (unless we establish a connection to the peer and it tells us to go away; that would need a new protocol, related to "disconnect" protocol/message #238).
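
To illustrate both points, here is a minimal sketch of a backoff table keyed by address instead of by peer, with a reset hook for when a new local link or route comes up. Everything in it is hypothetical (`addrBackoff`, `Fail`, `Blocked`, `Reset`, `defaultBackoff`), plain strings stand in for multiaddrs, and it is not the swarm's actual API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultBackoff is an arbitrary value chosen for this sketch.
const defaultBackoff = 5 * time.Second

// addrBackoff tracks dial backoff per address string rather than per peer.
// Hypothetical type; plain strings stand in for multiaddrs.
type addrBackoff struct {
	mu      sync.Mutex
	entries map[string]time.Time // address -> earliest time to retry
}

func newAddrBackoff() *addrBackoff {
	return &addrBackoff{entries: make(map[string]time.Time)}
}

// Fail records a failed dial to addr and blocks it for d from now.
func (b *addrBackoff) Fail(addr string, d time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries[addr] = time.Now().Add(d)
}

// Blocked reports whether addr is still backing off.
func (b *addrBackoff) Blocked(addr string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	until, ok := b.entries[addr]
	return ok && time.Now().Before(until)
}

// Reset forgets every backoff. A caller could invoke it when a new local
// link or route comes online, since earlier failures may have been local.
func (b *addrBackoff) Reset() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries = make(map[string]time.Time)
}

func main() {
	b := newAddrBackoff()
	tcp := "/ip4/203.0.113.7/tcp/4001"
	quic := "/ip4/203.0.113.7/udp/4001/quic"

	b.Fail(tcp, defaultBackoff)
	fmt.Println("tcp blocked:", b.Blocked(tcp))
	fmt.Println("quic blocked:", b.Blocked(quic))

	b.Reset() // e.g. a new network interface appeared
	fmt.Println("tcp blocked after reset:", b.Blocked(tcp))
}
```

With this shape, a failure on one address of a peer leaves its other addresses dialable, and a local network change clears the table in one step.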

Came up in: libp2p/go-libp2p-kad-dht#96

@mishto

mishto commented Jan 24, 2018

Can we expose baseBackoffTime and maxBackoffTime? The default values are arbitrary and different applications may want different settings.

@Stebalien
Member Author

Fair enough. Also, it looks like our backoffs aren't actually exponential...
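
For reference, a rough sketch of what a configurable and genuinely exponential schedule could look like. `BackoffConfig`, its fields, `Delay`, and the default values are all made up for illustration; only the idea of exposing a base and a maximum comes from the request above.

```go
package main

import (
	"fmt"
	"time"
)

// BackoffConfig is a hypothetical options struct. Base and Max play the
// role of the baseBackoffTime and maxBackoffTime values mentioned above.
type BackoffConfig struct {
	Base time.Duration
	Max  time.Duration
}

// defaultConfig uses arbitrary values picked for this sketch.
var defaultConfig = BackoffConfig{Base: 5 * time.Second, Max: 5 * time.Minute}

// Delay returns Base doubled once per previous failure, capped at Max.
func (c BackoffConfig) Delay(tries int) time.Duration {
	d := c.Base
	// Stop doubling as soon as the cap is reached so d cannot grow unbounded.
	for i := 0; i < tries && d < c.Max; i++ {
		d *= 2
	}
	if d > c.Max {
		d = c.Max
	}
	return d
}

func main() {
	for tries := 0; tries <= 7; tries++ {
		fmt.Println(tries, defaultConfig.Delay(tries))
	}
}
```

An application that wants faster retries would pass its own `BackoffConfig` instead of relying on the defaults.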

@Stebalien
Member Author

This will be fixed in a large refactor/simplification that's coming down the pipe.

@Stebalien
Member Author

Note to self: Refund backoff "tries" after a period of time. Currently, if we go to max-backoff, wait an hour, and then fail a single dial, we'll wait the max backoff again. We should, instead, notice that an hour has passed and forget all the previous failures.

Code:

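	now := time.Now()
	// Refund only once more than BackoffBase has elapsed since the backoff
	// expired; otherwise the Sqrt argument would be negative (NaN).
	if sinceLast := now.Sub(bp.until); sinceLast > BackoffBase {
		// Refund backoff time at the same rate.
		refund := int(math.Sqrt(float64((sinceLast - BackoffBase) / BackoffCoef)))
		if refund < bp.tries {
			bp.tries -= refund
		} else {
			bp.tries = 0
		}
	}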

Not going to do this now because we have so many other changes in the pipeline and we may want to discuss this.
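
For anyone who wants to play with the numbers, here is a self-contained version of the refund idea as a pure function. It assumes the backoff grows as `BackoffBase + BackoffCoef*tries^2`, which is what the square root in the snippet above suggests; the constant values and the `refundTries` helper are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Assumed values for illustration only; the thread does not state them.
const (
	BackoffBase = 5 * time.Second
	BackoffCoef = time.Second
)

// refundTries returns the failure count that remains after sinceLast has
// elapsed past the end of the last backoff window. It inverts an assumed
// schedule of BackoffBase + BackoffCoef*tries^2.
func refundTries(tries int, sinceLast time.Duration) int {
	if sinceLast <= BackoffBase {
		return tries // too soon to refund anything
	}
	refund := int(math.Sqrt(float64((sinceLast - BackoffBase) / BackoffCoef)))
	if refund >= tries {
		return 0
	}
	return tries - refund
}

func main() {
	fmt.Println(refundTries(8, 30*time.Second))
	fmt.Println(refundTries(17, time.Hour))
	fmt.Println(refundTries(4, 3*time.Second))
}
```

Under these assumed constants, eight recorded failures followed by 30 quiet seconds leave three tries on the books, and an hour of quiet forgets all of them.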

@mishto

mishto commented Jan 29, 2018

Sounds good, thanks.

@Stebalien
Member Author

Working through all the different backoff cases:

  • Backoff trying to find a peer.
    • This definitely belongs down in the DHT, or as a wrapper around the DHT.
  • Backoff a port/ip because a TCP dial failed.
    • This could happen inside the transport or inside the swarm itself.
      • If it happens inside the transport, we'd need a shared backoff module for backing off dialing multiaddrs with certain prefixes.
      • If it happens inside the swarm, we'd need some way to report the backoff to the swarm. We'd probably do this by returning a special error.
  • Backoff an IP when we get a "no route to IP" error.
    • Same as above.
  • Backoff a port/ip/peer triple when we end up dialing the wrong peer.
    • Same as above.
  • Backoff a peer/transport when we fail to negotiate a muxer/security transport.
    • This is an interesting case. Really, we want to back off the entire peer for all transports using the upgrader. This is a case where applying the backoff from within the transport is really the only solution that makes sense (as the transport knows what sub-transports it uses).
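
The "special error" option from the list above could look roughly like this: the transport wraps its failure in an error that carries a backoff hint, and the swarm extracts it with `errors.As`. `BackoffError`, its fields, `dial`, and `backoffHint` are hypothetical names for this sketch and are not part of any libp2p interface.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BackoffError is a hypothetical error a transport could return to ask the
// swarm to back off every address under Prefix for the duration For.
type BackoffError struct {
	Prefix string // e.g. "/ip4/203.0.113.7" for a "no route" failure
	For    time.Duration
	Cause  error
}

func (e *BackoffError) Error() string {
	return fmt.Sprintf("back off %s for %s: %v", e.Prefix, e.For, e.Cause)
}

// Unwrap exposes the underlying failure to errors.Is and errors.As.
func (e *BackoffError) Unwrap() error { return e.Cause }

// dial stands in for a transport dial that always fails with "no route".
func dial(addr string) error {
	hint := &BackoffError{
		Prefix: "/ip4/203.0.113.7",
		For:    30 * time.Second,
		Cause:  errors.New("no route to host"),
	}
	return fmt.Errorf("dial %s: %w", addr, hint)
}

// backoffHint is what the swarm side might do with a dial error: look for
// a BackoffError anywhere in the chain and return its hint.
func backoffHint(err error) (prefix string, d time.Duration, ok bool) {
	var be *BackoffError
	if errors.As(err, &be) {
		return be.Prefix, be.For, true
	}
	return "", 0, false
}

func main() {
	err := dial("/ip4/203.0.113.7/tcp/4001")
	if prefix, d, ok := backoffHint(err); ok {
		fmt.Println("swarm should back off", prefix, "for", d)
	}
}
```

Because the hint names a prefix and not a full address, the same error type could cover the port/ip, bare IP, and port/ip/peer cases by varying how much of the address goes into `Prefix`.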

petar referenced this issue in libp2p/go-libp2p-core Mar 5, 2020
This tries to provide a simple-to-reason-about solution to the list of problems
in https://github.com/libp2p/go-libp2p-swarm/issues/37
@Stebalien
Member Author

Status: While @petar's patches are likely the right way to go in the future, they introduce quite a few new interfaces that'll need to be discussed. In the interest of getting a fast fix in, @willscott is implementing (#191) a dumb version that just backs off full addresses inside the swarm itself without changing core libp2p interfaces.

That gives us some breathing room.

marten-seemann changed the title from "Better backoff logic" to "swarm: better backoff logic" on May 25, 2022
marten-seemann transferred this issue from libp2p/go-libp2p-swarm on May 25, 2022