Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat connection limit errors as pending connection errors. #1546

Merged
merged 3 commits into from
Apr 9, 2020

Conversation

romanb
Copy link
Contributor

@romanb romanb commented Apr 8, 2020

Currently connection limit errors emitted by a Pool when the per-peer connection limit is reached are emitted as ConnectionErrors as soon as background task reports the connection as established, i.e. there is never a ConnectionEstablished event emitted in these cases. This is problematic for two reasons:

  1. The Network code itself actually relies on an outgoing connection attempt to finish either with PendingConnectionError or ConnectionEstablished, otherwise there may even be a leak of entries in self.dialing which never get removed.

  2. It actually was the intention that Network::ConnectionEstablished events are always paired up 1-1 with Network::ConnectionError events (though the latter should probably be renamed to ConnectionClosed). The Swarm relies on this behaviour and maps ConnectionError to SwarmEvent::ConnectionClosed, as well as invoking NetworkBehaviour::inject_connection_closed, which may consequently be invoked for connections for which inject_connection_established was never called.

In summary, the per-peer connection limit is currently not safe to use. These problems are addressed here by having the Pool emit a PendingConnectionError instead when the limit of established connections of a peer is hit, thus moving the ConnectionError::ConnectionLimit to PendingConnectionError::ConnectionLimit.

This was supposedly observed via an underflowing counter in a network behaviour that tracks the number of established connections via inject_connection_established / inject_connection_closed.

I didn't get around yet to write a proper test case for this scenario, which I still intend to do, possibly in a follow-up PR, but since the patch is relatively simple and may want to be released relatively quickly, I'm already opening the PR as-is.

@@ -149,7 +149,7 @@ where

/// The handler that was passed to `dial()`, if the
/// connection failed before the handler was consumed.
handler: THandler,
handler: Option<THandler>,
Copy link
Member

@tomaka tomaka Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this field altogether here, as it is confusing and I don't think it is useful in any way.
It is confusing in pool.rs as well, but it is more tolerable there as this is a more internal struct.

Alternatively, if we leave it, we should clarify exactly in which situations this is Some and in which this is None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing it sounds good to me, I never found it particularly useful.

@tomaka tomaka mentioned this pull request Apr 9, 2020
@tomaka tomaka merged commit d2eebf2 into libp2p:master Apr 9, 2020
@romanb romanb deleted the connection-limit-pending-error branch April 9, 2020 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants