Background

For paritytech/polkadot#1532 I am investigating why connections initiated through Kademlia are kept alive for longer than the expected 10 second idle timeout. As far as I can tell there is no way to learn why connections are kept alive without intrusive libp2p source code changes. Ideally I would like to expose a Prometheus metric in Substrate.
Hacky solution

To help with debugging, I extended `KeepAlive` with a protocol id indicating the protocol that would like to keep the connection alive.
```diff
diff --git a/swarm/src/protocols_handler.rs b/swarm/src/protocols_handler.rs
index 9721e9db..1fcc3fac 100644
--- a/swarm/src/protocols_handler.rs
+++ b/swarm/src/protocols_handler.rs
@@ -498,12 +498,12 @@ where T: ProtocolsHandler
 }

 /// How long the connection should be kept alive.
 pub enum KeepAlive {
     /// If nothing new happens, the connection should be closed at the given `Instant`.
-    Until(Instant),
+    Until(Instant, Cow<'static, [u8]>),
     /// Keep the connection alive.
-    Yes,
+    Yes(Cow<'static, [u8]>),
     /// Close the connection as soon as possible.
     No,
 }
```
`ProtocolsHandler`s that delegate to other `ProtocolsHandler`s aggregate the `KeepAlive` values and pass the protocol id of the highest `KeepAlive` upwards.
Within node_handler.rs I could then log the id of the protocol that keeps the connection alive.
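For reference, here is a minimal sketch of what that aggregation and logging could look like, assuming the extended enum from the diff above. The `merge` helper, the protocol ids, and the log line are illustrative only and not part of libp2p:

```rust
use std::borrow::Cow;
use std::time::{Duration, Instant};

/// The extended `KeepAlive` from the diff above, each variant carrying the id
/// of the protocol that requested it (sketch only, not the real libp2p definition).
#[derive(Debug, Clone)]
pub enum KeepAlive {
    Until(Instant, Cow<'static, [u8]>),
    Yes(Cow<'static, [u8]>),
    No,
}

impl KeepAlive {
    /// Pick the "higher" of two keep-alive requirements, keeping the protocol
    /// id of whichever wins: `Yes` beats any `Until`, and a later `Until`
    /// beats an earlier one.
    fn merge(self, other: KeepAlive) -> KeepAlive {
        match (self, other) {
            (KeepAlive::Yes(id), _) | (_, KeepAlive::Yes(id)) => KeepAlive::Yes(id),
            (KeepAlive::Until(a, id_a), KeepAlive::Until(b, id_b)) => {
                if a >= b {
                    KeepAlive::Until(a, id_a)
                } else {
                    KeepAlive::Until(b, id_b)
                }
            }
            (ka @ KeepAlive::Until(..), KeepAlive::No) => ka,
            (KeepAlive::No, ka) => ka,
        }
    }
}

fn main() {
    // Two delegated handlers report their requirements (protocol ids are made up).
    let kad = KeepAlive::Until(
        Instant::now() + Duration::from_secs(10),
        Cow::Borrowed(&b"/kad/1.0.0"[..]),
    );
    let notifications = KeepAlive::Yes(Cow::Borrowed(&b"/notifications/1"[..]));

    // The delegating handler passes the highest `KeepAlive` upwards.
    let aggregate = kad.merge(notifications);

    // Roughly what node_handler.rs could log for each connection.
    if let KeepAlive::Yes(id) | KeepAlive::Until(_, id) = &aggregate {
        println!("connection kept alive by: {}", String::from_utf8_lossy(id));
    }
}
```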
This led to #1698, which triggered the investigation for #1700.
Way forward
First of all: Do people feel the need to surface keep-alive information to the user? Or is the on-demand debugging through log lines good enough?
If we do want to expose that information, we need to find a consistent way to do so. One suggestion from my side would be to bubble up a `KeepAlive` event from the `NodeHandler` for each connection on a regular interval, recording the id of the protocol keeping the connection alive. A rough sketch of such an event follows below.
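To make the suggestion concrete, here is a hypothetical shape for that event; all names and fields are illustrative and nothing like this exists in libp2p today:

```rust
use std::borrow::Cow;
use std::time::Instant;

/// Hypothetical event a connection's NodeHandler could emit on a regular
/// interval, reporting which protocol currently keeps the connection alive.
#[derive(Debug, Clone)]
pub struct KeepAliveReport {
    /// Id of the protocol whose `KeepAlive` currently wins the aggregation.
    pub protocol: Cow<'static, [u8]>,
    /// `None` for `KeepAlive::Yes`, otherwise the instant at which the
    /// connection would be closed if nothing new happens.
    pub expires: Option<Instant>,
}
```

Substrate could then, for example, count these reports per protocol id and expose that as the Prometheus metric mentioned above, without further changes to libp2p internals.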