litep2p: Introduce metrics to reflect libp2p metrics #4681
Comments
github-merge-queue bot pushed a commit that referenced this issue on Jul 3, 2024

This PR exposes the `RandomKademliaStarted` event from the litep2p network backend, and then increments the appropriate metrics.

This is part of: #4681. However, it is more of an effort to debug low peer count.

### Testing Done
- Started a node and fetched queries: `substrate_sub_libp2p_kademlia_random_queries_total` produces results for litep2p backend

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
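The event-to-counter wiring described in this commit can be sketched as follows. This is only an illustration under stated assumptions: `BackendEvent`, `Metrics`, and `on_event` are hypothetical names, and a plain atomic stands in for the Prometheus counter; only the `RandomKademliaStarted` event name and the metric name come from the commit message.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical subset of events a network backend might emit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendEvent {
    /// A random Kademlia query was started (name taken from the commit message).
    RandomKademliaStarted,
    /// Any other event; irrelevant for this sketch.
    Other,
}

/// Stand-in for a Prometheus counter such as
/// `substrate_sub_libp2p_kademlia_random_queries_total`.
#[derive(Debug, Default)]
pub struct Metrics {
    kademlia_random_queries_total: AtomicU64,
}

impl Metrics {
    /// Update the counter for a single backend event.
    pub fn on_event(&self, event: BackendEvent) {
        if event == BackendEvent::RandomKademliaStarted {
            self.kademlia_random_queries_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Current value of the counter.
    pub fn kademlia_random_queries(&self) -> u64 {
        self.kademlia_random_queries_total.load(Ordering::Relaxed)
    }
}

fn main() {
    let metrics = Metrics::default();
    let events = [
        BackendEvent::RandomKademliaStarted,
        BackendEvent::Other,
        BackendEvent::RandomKademliaStarted,
    ];
    for event in events {
        metrics.on_event(event);
    }
    println!("random queries: {}", metrics.kademlia_random_queries());
}
```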
github-merge-queue bot pushed a commit that referenced this issue on Jul 22, 2024

This PR improves the metrics reported by litep2p on request-response errors.

Discovered while investigating:
- #4985

We are experiencing many requests that are `Refused` by litep2p in comparison with libp2p. The metric roughly approximates the sum of other reasons from libp2p. This PR aims to provide more insights.

```
{__name__="substrate_sub_libp2p_requests_out_failure_total", chain="ksmcc3", instance="localhost:9615", job="substrate_node", protocol="/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/sync/2", reason="Remote has closed the substream before answering, thereby signaling that it considers the request as valid, but refused to answer it."}
Last *: 3365 Min: 3363 Max: 3365 Mean: 3365

{__name__="substrate_sub_libp2p_requests_out_failure_total", chain="ksmcc3", instance="localhost:9615", job="substrate_node", protocol="/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/beefy/justifications/1", reason="Remote has closed the substream before answering, thereby signaling that it considers the request as valid, but refused to answer it."}
Last *: 3461 Min: 3461 Max: 3461 Mean: 3461
```

Part of:
- #4681

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
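To illustrate why finer-grained reasons help, here is a minimal sketch of a failure counter keyed by `protocol` and `reason`, the two labels visible in the output above. The `FailureReason` variants, their label strings, and the `RequestFailures` type are assumptions made for this example and do not mirror the actual litep2p or Substrate error types.

```rust
use std::collections::BTreeMap;

/// Hypothetical failure reasons; the real error types differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureReason {
    Refused,
    Timeout,
    DialFailed,
}

impl FailureReason {
    /// Value used for the `reason` label (illustrative strings).
    pub fn as_label(self) -> &'static str {
        match self {
            FailureReason::Refused => "refused",
            FailureReason::Timeout => "timeout",
            FailureReason::DialFailed => "dial-failed",
        }
    }
}

/// Stand-in for a labeled counter such as
/// `substrate_sub_libp2p_requests_out_failure_total{protocol, reason}`.
#[derive(Debug, Default)]
pub struct RequestFailures {
    counts: BTreeMap<(String, &'static str), u64>,
}

impl RequestFailures {
    /// Record one failed outbound request.
    pub fn observe(&mut self, protocol: &str, reason: FailureReason) {
        let key = (protocol.to_string(), reason.as_label());
        *self.counts.entry(key).or_insert(0) += 1;
    }

    /// Number of failures recorded for a (protocol, reason) pair.
    pub fn get(&self, protocol: &str, reason: FailureReason) -> u64 {
        let key = (protocol.to_string(), reason.as_label());
        self.counts.get(&key).copied().unwrap_or(0)
    }
}

fn main() {
    let mut failures = RequestFailures::default();
    // "/sync/2" is a shortened placeholder protocol name.
    failures.observe("/sync/2", FailureReason::Refused);
    failures.observe("/sync/2", FailureReason::Timeout);
    failures.observe("/sync/2", FailureReason::Refused);
    println!("refused: {}", failures.get("/sync/2", FailureReason::Refused));
}
```

With separate reasons, a spike in one series can be told apart from a spike in another instead of being folded into a single `Refused` bucket.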
drskalman pushed a commit to w3f/polkadot-sdk that referenced this issue on Jul 23, 2024

…ble litep2p metrics (paritytech#4977)

This PR extends the metrics exposed by the peerstore with the total number of banned peers. The new metric is exposed under `substrate_sub_libp2p_peerset_num_banned_peers`.

To easily extend metrics in the future, the `fn num_known_peers` is removed in favor of `fn status`.

While at it, enable the metrics for litep2p:
- total number of peers from peerstore (needed to debug memory consumption)
- total number of banned peers from peerstore (needed to debug reputation bans and disconnects)

Have added a couple of tests to validate that the number of banned peers is exposed properly.

Part of: paritytech#4681

### Testing Done
Using [subp2p-explorer](https://github.com/lexnv/subp2p-explorer) have submitted random data on tx protocol. The peer gets banned, the num of banned peers is incremented then the peer is disconnected.

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
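The move from `fn num_known_peers` to `fn status` can be pictured with a toy peerstore. Everything below is a sketch under assumptions: the `PeerstoreStatus` fields, the `BANNED_THRESHOLD` value, and the string peer keys are hypothetical, and only the two function names come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical reputation threshold below which a peer counts as banned.
const BANNED_THRESHOLD: i32 = -1_000;

/// Snapshot returned by `status`; the field set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PeerstoreStatus {
    pub num_known_peers: usize,
    pub num_banned_peers: usize,
}

/// Toy peerstore keyed by a peer name instead of a real peer ID.
#[derive(Debug, Default)]
pub struct Peerstore {
    reputations: HashMap<String, i32>,
}

impl Peerstore {
    /// Apply a reputation change, inserting the peer if it is unknown.
    pub fn report_peer(&mut self, peer: &str, change: i32) {
        let reputation = self.reputations.entry(peer.to_string()).or_insert(0);
        *reputation = reputation.saturating_add(change);
    }

    /// One call returns every value the metrics need, so a new gauge only
    /// requires a new field here rather than a new accessor.
    pub fn status(&self) -> PeerstoreStatus {
        let num_banned_peers = self
            .reputations
            .values()
            .filter(|reputation| **reputation < BANNED_THRESHOLD)
            .count();
        PeerstoreStatus {
            num_known_peers: self.reputations.len(),
            num_banned_peers,
        }
    }
}

fn main() {
    let mut store = Peerstore::default();
    store.report_peer("alice", 10);
    store.report_peer("mallory", i32::MIN);
    println!("{:?}", store.status());
}
```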
github-merge-queue bot pushed a commit that referenced this issue on Sep 10, 2024

This release introduces several new features, improvements, and fixes to the litep2p library. Key updates include enhanced error handling, configurable connection limits, and a new API for managing public addresses.

For a detailed set of changes, see [litep2p changelog](https://github.com/paritytech/litep2p/blob/master/CHANGELOG.md#070---2024-09-05).

This PR makes use of:
- connection limits to optimize network throughput
- better errors that are propagated to substrate metrics
- public addresses API to report healthy addresses to the Identify protocol

### Warp sync time improvement

Measuring warp sync time is a bit inaccurate since the network is not deterministic and we might end up using faster peers (peers with more resources to handle our requests). However, I did not see warp sync times of 16 minutes; instead, they roughly stabilized between 8 and 10 minutes.

For measuring warp-sync time, I've used [sub-triage-logs](https://github.com/lexnv/sub-triage-logs/?tab=readme-ov-file#warp-time)

### Litep2p

Phase | Time
-|-
Warp | 426.999999919s
State | 99.000000555s
Total | 526.000000474s

### Libp2p

Phase | Time
-|-
Warp | 731.999999837s
State | 71.000000882s
Total | 803.000000719s

Closes: #4986

### Low peer count

After exposing the `litep2p::public_addresses` interface, we can report confirmed external addresses to litep2p. This should mitigate or at least improve: #4925. Will keep the issue around to confirm this.

### Improved metrics

We are one step closer to exposing similar metrics as libp2p: #4681.

cc @paritytech/networking

### Next Steps
- [x] Use public address interface to confirm addresses to identify protocol

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
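As a quick check of the timings reported in this commit, the sketch below sums the warp and state phases and computes the relative reduction in total time. The helper names `total` and `reduction` are made up for this example; the numbers are copied from the commit message and reflect single runs on a non-deterministic network.

```rust
/// Sum of the phase durations, in seconds.
pub fn total(phases: &[f64]) -> f64 {
    phases.iter().sum()
}

/// Relative reduction of `candidate` compared with `baseline` (0.25 means 25% faster).
pub fn reduction(baseline: f64, candidate: f64) -> f64 {
    (baseline - candidate) / baseline
}

fn main() {
    // Warp and state phase times as reported above.
    let litep2p = total(&[426.999999919, 99.000000555]);
    let libp2p = total(&[731.999999837, 71.000000882]);

    println!("litep2p total: {:.1}s", litep2p);
    println!("libp2p total:  {:.1}s", libp2p);
    println!("reduction:     {:.1}%", reduction(libp2p, litep2p) * 100.0);
}
```

On these figures the litep2p run takes roughly a third less time overall, even though its state phase is slower than the libp2p one.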
Some metrics are currently used to check the availability of our node and can trigger alarms for the on-call engineers.
One such example is `incoming_connections_total`, which has no counterpart in litep2p:
polkadot-sdk/substrate/client/network/src/service.rs, lines 1772 to 1778 in e664e98
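One way a litep2p-side counterpart could be derived is by counting connections by direction. The sketch below is an assumption, not the referenced code: `Direction`, `ConnectionMetrics`, and `on_connection_established` are hypothetical names, and counting at connection establishment may not match exactly when the libp2p backend increments its counter.

```rust
/// Hypothetical connection direction reported by the backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    /// The remote peer dialed us.
    Inbound,
    /// We dialed the remote peer.
    Outbound,
}

/// Stand-in for the Prometheus counter `incoming_connections_total`.
#[derive(Debug, Default)]
pub struct ConnectionMetrics {
    pub incoming_connections_total: u64,
}

impl ConnectionMetrics {
    /// Count a newly established connection if the remote initiated it.
    pub fn on_connection_established(&mut self, direction: Direction) {
        if direction == Direction::Inbound {
            self.incoming_connections_total += 1;
        }
    }
}

fn main() {
    let mut metrics = ConnectionMetrics::default();
    metrics.on_connection_established(Direction::Inbound);
    metrics.on_connection_established(Direction::Outbound);
    println!("incoming: {}", metrics.incoming_connections_total);
}
```

An availability alarm built on this counter would then fire when the value stops increasing, which is the kind of signal the on-call setup described here relies on.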
End goals:
cc @paritytech/networking @paritytech/sdk-node