
RFC: Expose KeepAlive information to NetworkBehaviour #1717

Closed

Conversation

mxinden (Member) commented Aug 20, 2020

The keep-alive mechanism allows `ProtocolsHandler`s to signal whether
they still have any use for the underlying connection. Given that
`ProtocolsHandler`s typically form a hierarchy of `ProtocolsHandler`s,
it is difficult to debug which `ProtocolsHandler` is keeping a
connection alive.

With this commit a `KeepAliveProtocolChange` event is emitted each time
the protocol that keeps a connection alive changes. These events are
injected into the top-level `NetworkBehaviour`. An implementor of
`NetworkBehaviour` can use these events to count the number of
connections kept alive per protocol.
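
As a rough illustration (all types here are hypothetical sketches of the idea, not the actual diff of this PR), such per-protocol counting could look like this:

```rust
use std::collections::HashMap;

/// Hypothetical event, modeled after the `KeepAliveProtocolChange`
/// described above: emitted whenever the protocol that keeps a
/// connection alive changes.
#[derive(Debug)]
struct KeepAliveProtocolChange {
    connection_id: u64,
    /// Protocol identifier, e.g. the bytes of "/ipfs/ping/1.0.0".
    protocol: Vec<u8>,
}

/// Sketch of the bookkeeping a `NetworkBehaviour` implementor could do.
#[derive(Default)]
struct KeepAliveStats {
    /// Which protocol currently keeps each connection alive.
    by_connection: HashMap<u64, Vec<u8>>,
}

impl KeepAliveStats {
    fn on_event(&mut self, event: KeepAliveProtocolChange) {
        self.by_connection.insert(event.connection_id, event.protocol);
    }

    /// Number of connections kept alive per protocol.
    fn connections_per_protocol(&self) -> HashMap<&[u8], usize> {
        let mut counts: HashMap<&[u8], usize> = HashMap::new();
        for protocol in self.by_connection.values() {
            *counts.entry(protocol.as_slice()).or_insert(0) += 1;
        }
        counts
    }
}
```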

Exposing this keep-alive information would help in debugging issues like
#1072, #1698, #1700 and #1398.

Request for comments

As you can tell from the outstanding TODOs, I am not opening this pull request to have it merged, but to collect feedback. More concretely, I would like to answer the following questions:

  1. Does anyone object to the general idea of exposing this information?

  2. Does anyone object to (or have an alternative to) the concrete solution of passing the protocol identifier via `KeepAlive` up to the `NetworkBehaviour`, as sketched below? Concerns that come to my mind are the additional complexity and the performance impact of passing around bytes.
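
For illustration, a minimal sketch of what carrying the protocol identifier in `KeepAlive` could look like; the `protocol` payloads are the hypothetical addition, while the `Yes`/`Until`/`No` shape follows the existing enum:

```rust
use std::time::Instant;

/// Sketch only: `KeepAlive` extended to carry the identifier of the
/// protocol responsible for the keep-alive. Passing these bytes
/// around is the performance concern raised in question 2.
enum KeepAlive {
    /// Keep the connection alive; `protocol` names the responsible protocol.
    Yes { protocol: Vec<u8> },
    /// Keep the connection alive until the given `Instant`.
    Until(Instant, Vec<u8>),
    /// The handler has no more use for the connection.
    No,
}
```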

Closes #1701
Closes #1478

romanb (Contributor) commented Aug 24, 2020

Does anyone object to (or have an alternative to) the concrete solution of passing the protocol identifier via KeepAlive up to the NetworkBehaviour? Concerns that come to my mind are the additional complexity and the performance impact of passing around bytes.

I think this approach is doing a bit too much and blurs boundaries. E.g. `KeepAliveProtocolChange` is dragged as an event constructor through libp2p-core even though libp2p-core currently has no notion of a connection keep-alive facility at all. I understand that this is just to communicate the "keep-alive protocol change" from the `ProtocolsHandler` to the `Swarm`. I think that may be achieved in a simpler way by setting `NodeHandlerWrapper::OutEvent` to a libp2p-swarm-specific event wrapper type, since that may allow tunneling the new event through `ConnectionHandlerEvent::Custom`.
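
For illustration, such a wrapper type could look roughly like this (names are hypothetical, not the actual crate API):

```rust
/// Sketch of a libp2p-swarm-specific wrapper event type. `TCustom`
/// stands in for the custom event type of the wrapped `ProtocolsHandler`.
enum WrapperOutEvent<TCustom> {
    /// An event of the wrapped handler, passed through unchanged.
    Handler(TCustom),
    /// Keep-alive bookkeeping emitted by the wrapper itself, so that
    /// libp2p-core never needs a notion of keep-alive.
    KeepAliveProtocolChange { protocol: Vec<u8> },
}
```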

However, I don't think this approach is in general a real step forward for the keep-alive handling. Let me outline an approach I think may be preferable, though I haven't thought through all the details yet:

  1. There is a general connection keep-alive `Duration` in the `Swarm` configuration which applies to all connections as soon as they no longer have any open substreams that keep the connection alive.
  2. Whether an open substream keeps a connection alive is configured via `SubstreamProtocol::connection_keep_alive` (and hence determined in `ProtocolsHandler::listen_protocol` respectively `ProtocolsHandlerEvent::OutboundSubstreamRequest`).
  3. The `NodeHandlerWrapper` in libp2p-swarm would need to track open substreams for which `SubstreamProtocol::connection_keep_alive` is true. It may track the `UpgradeInfo::protocol_info()`s alongside.
  4. There is no more `ProtocolsHandler::connection_keep_alive`. The `ProtocolsHandler` implementations alone determine whether a connection is kept alive, via the `SubstreamProtocol::connection_keep_alive` configuration and by opening/closing substreams.
  5. To know why a connection is kept alive, one looks at the open "keep-alive" substreams tracked by the `NodeHandlerWrapper` of that connection, which may include the protocol infos/names used on the substreams.

It would be my hope that an approach along these lines would make the keep-alive easier to reason about and avoid the pitfalls of correctly implementing `ProtocolsHandler::connection_keep_alive`. Instead, one is only concerned with the substream handling in a protocol implementation, and the keep-alive derives from that together with the globally configured connection keep-alive duration (which could also be scoped to a connection, of course). If such an approach works well in libp2p-swarm, and only then, we may consider whether such a keep-alive facility should move from libp2p-swarm to libp2p-core.
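
To make the shape of this proposal concrete, here is a minimal sketch; none of these types exist in libp2p as written, they only illustrate points 1-5 above:

```rust
use std::time::Duration;

/// Point 1: grace period applying once a connection has no more
/// open keep-alive substreams.
struct SwarmConfig {
    connection_keep_alive: Duration,
}

/// Point 2: per-substream-protocol configuration.
struct SubstreamProtocol {
    protocol_name: &'static [u8],
    /// Whether an open substream of this protocol keeps the
    /// connection alive.
    connection_keep_alive: bool,
}

/// Points 3 and 5: per-connection tracking inside the `NodeHandlerWrapper`.
#[derive(Default)]
struct KeepAliveSubstreams {
    /// Protocol names of the currently open keep-alive substreams.
    /// Inspecting this answers "why is this connection kept alive?".
    open: Vec<Vec<u8>>,
}

impl KeepAliveSubstreams {
    /// Point 4: the connection stays alive exactly while this is true,
    /// plus the configured grace period once it turns false.
    fn keeps_connection_alive(&self) -> bool {
        !self.open.is_empty()
    }
}
```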

Let me know what you think of such an approach. There may obviously be roadblocks I didn't take into account yet.

mxinden (Member, Author) commented Aug 31, 2020

I am sorry for the delay. Thank you @romanb for the detailed write-up!

Instead one is only concerned with the substream handling in a protocol implementation and the keep-alive derives from that together with the globally configured connection keep-alive duration (which could also be scoped to a connection, of course).

Agreed that the above would simplify the design.

There is a general connection keep-alive Duration in the Swarm configuration which applies to all connections as soon as they no longer have any open substreams that keep the connection alive.

That sounds reasonable. Currently a `ProtocolsHandler` can keep a connection alive even without having a single substream open. I am not aware of any protocols requiring this feature. Is anyone aware of a protocol that depends on this? I guess keeping a dummy substream open would be a cheap workaround in case there is such a protocol.

There is no more NetworkBehaviour::connection_keep_alive.

Just to make sure I am understanding this correctly, you are actually referring to ProtocolsHandler::connection_keep_alive, right?

To know why a connection is kept alive, one looks at the open "keep-alive" substreams tracked by the NodeHandlerWrapper of that connection, which may include the protocol infos/names used on the substreams.

How would this information make it from the NodeHandlerWrapper to the user (owner of the swarm)? "by having NodeHandlerWrapper::OutEvent set to a libp2p-swarm-specific event wrapper type"?

romanb (Contributor) commented Aug 31, 2020

Just to make sure I am understanding this correctly, you are actually referring to ProtocolsHandler::connection_keep_alive, right?

Yes, that was a typo that I now corrected, thanks!

How would this information make it from the NodeHandlerWrapper to the user (owner of the swarm)? "by having NodeHandlerWrapper::OutEvent set to a libp2p-swarm-specific event wrapper type"?

Probably. Since the connection (handler) is fully owned by the background task, connection (handler) events seem to be the only communication mechanism available.

mxinden (Member, Author) commented Sep 24, 2020

Update: I am currently testing out the above suggestions, though not with high priority.

mxinden (Member, Author) commented Sep 29, 2020

Within my fork, `NodeHandlerWrapper` can track newly created substreams with `keep_alive == true`, but it has no way to tell whether any of those substreams are still open. I have not yet been able to come up with anything other than wrapping each substream before passing it down to the `ProtocolsHandler` and waiting for the inner substream to close, which seems more of a hack than a proper solution.
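
For illustration, a minimal sketch of such a wrapper, assuming a `Drop`-based signal for "no longer open" (names are hypothetical; `S` stands in for the real negotiated substream type, and a real wrapper would also have to forward `AsyncRead`/`AsyncWrite` to `inner`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Sketch of the wrapping workaround described above.
struct TrackedSubstream<S> {
    inner: S,
    open_count: Arc<AtomicUsize>,
}

impl<S> TrackedSubstream<S> {
    fn new(inner: S, open_count: Arc<AtomicUsize>) -> Self {
        open_count.fetch_add(1, Ordering::SeqCst);
        TrackedSubstream { inner, open_count }
    }
}

impl<S> Drop for TrackedSubstream<S> {
    fn drop(&mut self) {
        // `NodeHandlerWrapper` can consult this counter to decide
        // whether any keep-alive substreams are still open.
        self.open_count.fetch_sub(1, Ordering::SeqCst);
    }
}
```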

mxinden (Member, Author) commented Feb 23, 2021

With the most recent comment in mind, I am closing this pull request. For anyone willing to continue this effort, I am happy to help.

mxinden closed this Feb 23, 2021