Skip to content
This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

feat: coalesce and queue connection event handling #565

Merged
merged 3 commits into from
Jun 13, 2022

Conversation

Stebalien
Copy link
Member

Motivation

This change handles connectedness and responsiveness events asynchronously, coalescing multiple state-changes into one. In the past, this area has caused quite a bit of goroutine buildup as we fall behind processing connect and disconnect events.

Design

The design is a simple state-machine with the following components:

  1. A record of the current and "new" state for each peer.

  2. A queue of peers that have "changed".

  3. A worker that applies state transitions.

  4. When an event (disconnect, connect, unresponsive, responsive) happens, we record the desired (new) state for that peer and enqueue the peer for further processing by the background worker.

  5. The background worker takes peers off the background task queue, applying state-changes.

Importantly:

  1. There's only one background worker. Unfortunately, these state-changes take global locks so there's no point in attempting to parallelize.
  2. Multiple state-changes will be coalesced and only the final state-transition will be applied. The current system will apply all state transitions, even if the state-transition is out of date.
    • For example, the current system will go through the process of marking a peer as "connected", even after the peer has disconnected. The new system will coalesce these events, and drop the peer without further processing.
  3. Recording state transitions is really cheap (doesn't even use a channel). OnMessage is called all the time, so this is pretty important.

One drawback to not using a channel is that we have a global lock which may end up getting contended. However, we only hold the lock for very short periods of time, so this should be fine (testing TBD).

@Stebalien Stebalien requested a review from aschmahmann May 31, 2022 23:41
@BigLep
Copy link

BigLep commented Jun 1, 2022

This addresses #547 right?

@BigLep BigLep linked an issue Jun 1, 2022 that may be closed by this pull request
@Stebalien
Copy link
Member Author

Yes. Ideally we would be using the event bus, not callback-based notifications, but this "fixes" the issue (even if it doesn't pay down all the technical debt).

Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the code looks good and is certainly much nicer then what was here before 🙏.

Left comments mostly around adding some comments and minor test fixes.

network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager.go Outdated Show resolved Hide resolved
testnet/network_test.go Outdated Show resolved Hide resolved
network/ipfs_impl_test.go Outdated Show resolved Hide resolved
network/connecteventmanager_test.go Show resolved Hide resolved
network/connecteventmanager_test.go Show resolved Hide resolved
network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager_test.go Show resolved Hide resolved
@Stebalien Stebalien force-pushed the feat/batched-connection-events branch from 1ae811f to 695488b Compare June 2, 2022 04:27
@Stebalien Stebalien requested a review from aschmahmann June 2, 2022 04:27
@Stebalien Stebalien merged commit a06a9ea into master Jun 13, 2022
@Stebalien Stebalien deleted the feat/batched-connection-events branch June 13, 2022 16:03
@aschmahmann
Copy link
Contributor

Note: while this PR has a number of failing tests preliminary production tests have shown it works and unfortunately this repo has a lot of flaky tests that we need to deal with such as ipfs/boxo#86

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Receive swarm notifications synchronously
3 participants