
Brainstorm: Expose keep-alive information to user #1701

Closed
mxinden opened this issue Aug 13, 2020 · 1 comment
Comments

mxinden (Member) commented Aug 13, 2020

Background

For paritytech/polkadot#1532 I am investigating why connections initiated through Kademlia are kept alive for longer than the expected 10-second idle timeout. As far as I can tell, there is no way to learn why a connection is kept alive without intrusive changes to the libp2p source code. Ideally, I would like to expose this as a Prometheus metric in Substrate, e.g.:

substrate_sub_libp2p_connections_total{keep_alive_protocol="kademlia"} 10
substrate_sub_libp2p_connections_total{keep_alive_protocol="legacy"} 11
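For illustration only, such a metric could be registered with the prometheus crate roughly as follows; the function name and help text are mine, not actual Substrate code.

use prometheus::{IntGaugeVec, Opts, Registry};

// Hypothetical sketch: a gauge vector keyed by the protocol that keeps
// connections alive. Substrate's real metric registration looks different.
fn register_keep_alive_metric(registry: &Registry) -> prometheus::Result<IntGaugeVec> {
    let gauge = IntGaugeVec::new(
        Opts::new(
            "substrate_sub_libp2p_connections_total",
            "Number of connections, labelled by the protocol keeping them alive",
        ),
        &["keep_alive_protocol"],
    )?;
    registry.register(Box::new(gauge.clone()))?;
    // e.g. gauge.with_label_values(&["kademlia"]).set(10);
    Ok(gauge)
}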

Hacky solution

To help with debugging, I extended KeepAlive with a protocol id indicating the protocol that would like to keep the connection alive.

diff --git a/swarm/src/protocols_handler.rs b/swarm/src/protocols_handler.rs
index 9721e9db..1fcc3fac 100644
--- a/swarm/src/protocols_handler.rs
+++ b/swarm/src/protocols_handler.rs
@@ -498,12 +498,12 @@ where T: ProtocolsHandler
 }
 
 /// How long the connection should be kept alive.
 pub enum KeepAlive {
     /// If nothing new happens, the connection should be closed at the given `Instant`.
-    Until(Instant),
+    Until(Instant, Cow<'static, [u8]>),
     /// Keep the connection alive.
-    Yes,
+    Yes(Cow<'static, [u8]>),
     /// Close the connection as soon as possible.
     No,
 }

ProtocolsHandlers that delegate to other ProtocolsHandlers aggregate the KeepAlive values and pass the protocol id of the highest KeepAlive upwards.
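A rough sketch of what that aggregation could look like with the extended enum from the diff above; the merge helper and its ordering are my own illustration, not existing libp2p code:

use std::borrow::Cow;
use std::time::Instant;

pub enum KeepAlive {
    Until(Instant, Cow<'static, [u8]>),
    Yes(Cow<'static, [u8]>),
    No,
}

// Hypothetical helper: return whichever of the two values keeps the
// connection alive the longest, carrying its protocol id upwards.
fn merge(a: KeepAlive, b: KeepAlive) -> KeepAlive {
    use KeepAlive::*;
    match (a, b) {
        // Yes always wins; keep the id of the protocol requesting it.
        (Yes(id), _) | (_, Yes(id)) => Yes(id),
        // Prefer the later deadline, together with its protocol id.
        (Until(t1, id1), Until(t2, id2)) => {
            if t1 >= t2 { Until(t1, id1) } else { Until(t2, id2) }
        }
        (Until(t, id), No) | (No, Until(t, id)) => Until(t, id),
        (No, No) => No,
    }
}

A delegating handler would fold the KeepAlive values of its inner handlers with merge and report the result upwards.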

Within node_handler.rs I could then log the id of the protocol that keeps the connection alive.

This led to #1698, which triggered the investigation for #1700.

Way forward

First of all: Do people feel the need to surface keep-alive information to the user? Or is the on-demand debugging through log lines good enough?

If we do want to expose that information, we need to find a consistent way to do so. One suggestion from my side would be to bubble up a KeepAlive event from the NodeHandler for each connection at a regular interval, recording the id of the protocol keeping the connection alive.
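To make that concrete, such an event could look roughly like the following; this is purely a sketch, and neither the event nor its field names exist in libp2p:

use std::borrow::Cow;
use std::time::Instant;

// Hypothetical event, emitted per connection at a regular interval,
// reporting which protocol currently keeps the connection alive.
pub enum ConnectionKeepAliveEvent {
    /// The connection is kept alive until `until` by `protocol`.
    KeptAliveUntil { protocol: Cow<'static, [u8]>, until: Instant },
    /// The connection is kept alive indefinitely by `protocol`.
    KeptAlive { protocol: Cow<'static, [u8]> },
    /// No handler currently asks to keep the connection alive.
    Idle,
}

A user could then count these events per protocol and feed them into a metric like the one sketched under Background.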

romanb (Contributor) commented Aug 13, 2020

Relates to #1478.

This issue was moved to a discussion. You can continue the conversation there.
