
RFC: Expose KeepAlive information to NetworkBehaviour #1717

Closed

Conversation

mxinden (Member) commented Aug 20, 2020

The keep-alive mechanism allows `ProtocolsHandler`s to signal whether
they still have any use for the underlying connection. Given that
`ProtocolsHandler`s typically form a hierarchy of `ProtocolsHandler`s,
it is difficult to debug which `ProtocolsHandler` is keeping a
connection alive.

With this commit a `KeepAliveProtocolChange` event is emitted each time
the protocol that keeps a connection alive changes. These events are
injected into the top-level `NetworkBehaviour`. An implementor of
`NetworkBehaviour` can use these events to count the number of
connections kept alive per protocol.
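
As a rough illustration (all types here are hypothetical sketches of the idea, not the actual diff of this PR), such per-protocol counting could look like this:

```rust
use std::collections::HashMap;

/// Hypothetical event, modeled after the `KeepAliveProtocolChange`
/// described above: emitted whenever the protocol that keeps a
/// connection alive changes.
#[derive(Debug)]
struct KeepAliveProtocolChange {
    connection_id: u64,
    /// Protocol identifier, e.g. the bytes of "/ipfs/ping/1.0.0".
    protocol: Vec<u8>,
}

/// Sketch of the bookkeeping a `NetworkBehaviour` implementor could do.
#[derive(Default)]
struct KeepAliveStats {
    /// Which protocol currently keeps each connection alive.
    by_connection: HashMap<u64, Vec<u8>>,
}

impl KeepAliveStats {
    fn on_event(&mut self, event: KeepAliveProtocolChange) {
        self.by_connection.insert(event.connection_id, event.protocol);
    }

    /// Number of connections kept alive per protocol.
    fn connections_per_protocol(&self) -> HashMap<&[u8], usize> {
        let mut counts: HashMap<&[u8], usize> = HashMap::new();
        for protocol in self.by_connection.values() {
            *counts.entry(protocol.as_slice()).or_insert(0) += 1;
        }
        counts
    }
}
```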

Exposing this keep-alive information would help in debugging issues like
#1072, #1698, #1700 and #1398.

Request for comments

As you can tell from the outstanding TODOs, I am not opening this pull request to have it merged, but to collect feedback. More concretely, I would like to answer the following questions:

  1. Does anyone object to the general idea of exposing this information?

  2. Does anyone object to (or have an alternative to) the concrete solution of passing the protocol identifier via `KeepAlive` up to the `NetworkBehaviour`, as sketched below? Concerns that come to my mind are the additional complexity and the performance impact of passing around bytes.
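
For illustration, a minimal sketch of what carrying the protocol identifier in `KeepAlive` could look like; the `protocol` payloads are the hypothetical addition, while the `Yes`/`Until`/`No` shape follows the existing enum:

```rust
use std::time::Instant;

/// Sketch only: `KeepAlive` extended to carry the identifier of the
/// protocol responsible for the keep-alive. Passing these bytes
/// around is the performance concern raised in question 2.
enum KeepAlive {
    /// Keep the connection alive; `protocol` names the responsible protocol.
    Yes { protocol: Vec<u8> },
    /// Keep the connection alive until the given `Instant`.
    Until(Instant, Vec<u8>),
    /// The handler has no more use for the connection.
    No,
}
```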

Closes #1701
Closes #1478

romanb (Contributor) commented Aug 24, 2020

Does anyone object to (or have an alternative to) the concrete solution of passing the protocol identifier via KeepAlive up to the NetworkBehaviour? Concerns that come to my mind are the additional complexity and the performance impact of passing around bytes.

I think this approach is doing a bit too much and blurs boundaries. E.g. `KeepAliveProtocolChange` is dragged as an event constructor through libp2p-core even though libp2p-core currently has no notion of a connection keep-alive facility at all. I understand that this is just to communicate the "keep-alive protocol change" from the `ProtocolsHandler` to the `Swarm`. I think that may be achieved in a simpler way by setting `NodeHandlerWrapper::OutEvent` to a libp2p-swarm-specific event wrapper type, since that may allow tunneling the new event through `ConnectionHandlerEvent::Custom`.
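
For illustration, such a wrapper type could look roughly like this (names are hypothetical, not the actual crate API):

```rust
/// Sketch of a libp2p-swarm-specific wrapper event type. `TCustom`
/// stands in for the custom event type of the wrapped `ProtocolsHandler`.
enum WrapperOutEvent<TCustom> {
    /// An event of the wrapped handler, passed through unchanged.
    Handler(TCustom),
    /// Keep-alive bookkeeping emitted by the wrapper itself, so that
    /// libp2p-core never needs a notion of keep-alive.
    KeepAliveProtocolChange { protocol: Vec<u8> },
}
```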

However, I don't think this approach is in general a real step forward for the keep-alive handling. Let me outline an approach I think may be preferable, though I haven't thought through all the details yet:

  1. There is a general connection keep-alive `Duration` in the `Swarm` configuration which applies to all connections as soon as they no longer have any open substreams that keep the connection alive.
  2. Whether an open substream keeps a connection alive is configured via `SubstreamProtocol::connection_keep_alive` (and hence determined in `ProtocolsHandler::listen_protocol` respectively `ProtocolsHandlerEvent::OutboundSubstreamRequest`).
  3. The `NodeHandlerWrapper` in libp2p-swarm would need to track open substreams for which `SubstreamProtocol::connection_keep_alive` is true. It may track the `UpgradeInfo::protocol_info()`s alongside.
  4. There is no more `ProtocolsHandler::connection_keep_alive`. The `ProtocolsHandler` implementations alone determine whether a connection is kept alive, via the `SubstreamProtocol::connection_keep_alive` configuration and by opening/closing substreams.
  5. To know why a connection is kept alive, one looks at the open "keep-alive" substreams tracked by the `NodeHandlerWrapper` of that connection, which may include the protocol infos/names used on the substreams.

It would be my hope that an approach along these lines would make the keep-alive easier to reason about and avoid the pitfalls of correctly implementing `ProtocolsHandler::connection_keep_alive`. Instead, one is only concerned with the substream handling in a protocol implementation, and the keep-alive derives from that together with the globally configured connection keep-alive duration (which could also be scoped to a connection, of course). If such an approach works well in libp2p-swarm, and only then, we may consider whether such a keep-alive facility should move from libp2p-swarm to libp2p-core.
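
To make the shape of this proposal concrete, here is a minimal sketch; none of these types exist in libp2p as written, they only illustrate points 1-5 above:

```rust
use std::time::Duration;

/// Point 1: grace period applying once a connection has no more
/// open keep-alive substreams.
struct SwarmConfig {
    connection_keep_alive: Duration,
}

/// Point 2: per-substream-protocol configuration.
struct SubstreamProtocol {
    protocol_name: &'static [u8],
    /// Whether an open substream of this protocol keeps the
    /// connection alive.
    connection_keep_alive: bool,
}

/// Points 3 and 5: per-connection tracking inside the `NodeHandlerWrapper`.
#[derive(Default)]
struct KeepAliveSubstreams {
    /// Protocol names of the currently open keep-alive substreams.
    /// Inspecting this answers "why is this connection kept alive?".
    open: Vec<Vec<u8>>,
}

impl KeepAliveSubstreams {
    /// Point 4: the connection stays alive exactly while this is true,
    /// plus the configured grace period once it turns false.
    fn keeps_connection_alive(&self) -> bool {
        !self.open.is_empty()
    }
}
```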

Let me know what you think of such an approach. There may obviously be roadblocks I didn't take into account yet.

mxinden (Member, Author) commented Aug 31, 2020

I am sorry for the delay. Thank you @romanb for the detailed write-up!

Instead one is only concerned with the substream handling in a protocol implementation and the keep-alive derives from that together with the globally configured connection keep-alive duration (which could also be scoped to a connection, of course).

Agreed that the above would simplify the design.

There is a general connection keep-alive Duration in the Swarm configuration which applies to all connections as soon as they no longer have any open substreams that keep the connection alive.

That sounds reasonable. Currently a `ProtocolsHandler` can keep a connection alive even without having a single substream open. I am not aware of any protocols requiring this feature. Is anyone aware of a protocol that depends on this? I guess keeping a dummy substream open would be a cheap workaround in case there is such a protocol.

There is no more NetworkBehaviour::connection_keep_alive.

Just to make sure I am understanding this correctly, you are actually referring to ProtocolsHandler::connection_keep_alive, right?

To know why a connection is kept alive, one looks at the open "keep-alive" substreams tracked by the NodeHandlerWrapper of that connection, which may include the protocol infos/names used on the substreams.

How would this information make it from the NodeHandlerWrapper to the user (owner of the swarm)? "by having NodeHandlerWrapper::OutEvent set to a libp2p-swarm-specific event wrapper type"?

romanb (Contributor) commented Aug 31, 2020

Just to make sure I am understanding this correctly, you are actually referring to ProtocolsHandler::connection_keep_alive, right?

Yes, that was a typo that I now corrected, thanks!

How would this information make it from the NodeHandlerWrapper to the user (owner of the swarm)? "by having NodeHandlerWrapper::OutEvent set to a libp2p-swarm-specific event wrapper type"?

Probably. Since the connection (handler) is fully owned by the background task, connection (handler) events seem to be the only communication mechanism available.

mxinden (Member, Author) commented Sep 24, 2020

Update: I am currently testing out the above suggestions, though not with high priority.

mxinden (Member, Author) commented Sep 29, 2020

Within my fork, `NodeHandlerWrapper` can track newly created substreams with `keep_alive == true`, but it has no way to tell whether any of those substreams are still open. I have not yet been able to come up with anything other than wrapping each substream before passing it down to the `ProtocolsHandler` and waiting for the inner substream to close, which seems more of a hack than a proper solution.
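
For illustration, a minimal sketch of such a wrapper, assuming a `Drop`-based signal for "no longer open" (names are hypothetical; `S` stands in for the real negotiated substream type, and a real wrapper would also have to forward `AsyncRead`/`AsyncWrite` to `inner`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Sketch of the wrapping workaround described above.
struct TrackedSubstream<S> {
    inner: S,
    open_count: Arc<AtomicUsize>,
}

impl<S> TrackedSubstream<S> {
    fn new(inner: S, open_count: Arc<AtomicUsize>) -> Self {
        open_count.fetch_add(1, Ordering::SeqCst);
        TrackedSubstream { inner, open_count }
    }
}

impl<S> Drop for TrackedSubstream<S> {
    fn drop(&mut self) {
        // `NodeHandlerWrapper` can consult this counter to decide
        // whether any keep-alive substreams are still open.
        self.open_count.fetch_sub(1, Ordering::SeqCst);
    }
}
```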

mxinden (Member, Author) commented Feb 23, 2021

With the most recent comment in mind, I am closing this pull request. For anyone willing to continue this effort, I am happy to help.

mxinden closed this Feb 23, 2021