metrics: Expose litep2p metrics in an agnostic manner #294
Open: lexnv wants to merge 32 commits into master from lexnv/metrics
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Ideally this should be `Into<String>`, but then the trait would not be object safe.
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
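The object-safety constraint mentioned in this commit can be illustrated with a minimal sketch. The `Registry` trait, `InMemoryRegistry`, and `register_all` below are hypothetical names chosen for illustration and are not taken from the PR. A method generic over `S: Into<String>` cannot be called through `dyn Registry`, so the sketch takes an owned `String` instead.

```rust
use std::sync::Mutex;

/// Hypothetical registry trait, used only to illustrate the constraint.
trait Registry {
    // A generic method such as
    //     fn register_counter<S: Into<String>>(&self, name: S) -> usize;
    // cannot be dispatched through a vtable, so the trait could no longer be
    // used as `dyn Registry`. Taking `String` keeps it object safe.
    fn register_counter(&self, name: String) -> usize;
}

/// Hypothetical backend that only records the registered names.
struct InMemoryRegistry {
    names: Mutex<Vec<String>>,
}

impl Registry for InMemoryRegistry {
    fn register_counter(&self, name: String) -> usize {
        let mut names = self.names.lock().unwrap();
        names.push(name);
        names.len()
    }
}

/// Works with any backend because it only sees the trait object.
/// Callers convert to `String` at the call site.
fn register_all(registry: &dyn Registry) -> usize {
    registry.register_counter("example_connections_total".to_string());
    registry.register_counter("example_substreams_total".into())
}

fn main() {
    let registry = InMemoryRegistry { names: Mutex::new(Vec::new()) };
    println!("registered {} counters", register_all(&registry));
}
```

The cost of this choice is that call sites convert explicitly with `.to_string()` or `.into()`.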
This was referenced Dec 2, 2024
Closed
lexnv added a commit that referenced this pull request on Dec 3, 2024
Similar to #296, there is a possibility of leaking memory in the following edge-case:

- T0: Connection is established and outbound substream is initiated with peer
  - This maps the substream ID to the request bytes information
- T1: Connection is closed before the service has a chance to report `TransportEvent::SubstreamOpened` or `TransportEvent::SubstreamOpenFailure`

In this case, if we connect and immediately disconnect with a request in flight, we are effectively leaking the request bytes.

Detected by:
- #294

### Dashboard

- We are leaking ~111 requests over 3 days timespan:

<img width="1484" alt="Screenshot 2024-12-03 at 10 41 01" src="https://github.com/user-attachments/assets/f6701017-4add-4aa1-aee1-e1f8d33d54f3">

cc @paritytech/networking

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
lexnv changed the title from "wip: Metrics" to "metrics: Expose litep2p metrics in an agnostic manner" on Dec 3, 2024
lexnv added a commit that referenced this pull request on Dec 3, 2024
This PR fixes a subtle memory leak that can happen in the following edge-case situation:

- connection is established and an outbound substream is initiated with the remote peer
- the substream ID is tracked until the substream either completes successfully or fails
- the connection is closed soon after, leading to no substream events ever being generated

For this edge case, we need to remove the tracking of the substream ID when the connection is reported as closed.

This has been detected after running a node for more than 2 days with the following generic metrics PR:
- #294

Closes: #295

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
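The edge case described in the two commit messages above can be sketched as follows. This is a simplified model under stated assumptions, not the litep2p implementation: `PendingSubstreams`, its methods, and the `PeerId` / `SubstreamId` aliases are hypothetical stand-ins. The point is that entries keyed by substream ID must also be dropped when the connection closes, because no substream event will arrive to remove them.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real peer and substream identifiers.
type PeerId = u64;
type SubstreamId = usize;

/// Tracks outbound substreams that have been initiated but not yet reported
/// as opened or failed, together with the request bytes they carry.
#[derive(Default)]
struct PendingSubstreams {
    pending: HashMap<SubstreamId, (PeerId, Vec<u8>)>,
}

impl PendingSubstreams {
    /// T0: an outbound substream is initiated for `peer`.
    fn on_substream_initiated(&mut self, id: SubstreamId, peer: PeerId, request: Vec<u8>) {
        self.pending.insert(id, (peer, request));
    }

    /// Normal path: the substream was reported as opened or as failed.
    fn on_substream_event(&mut self, id: SubstreamId) -> Option<Vec<u8>> {
        self.pending.remove(&id).map(|(_, request)| request)
    }

    /// T1: the connection closed before any substream event was reported.
    /// Without this cleanup the entries for `peer` would stay in the map forever.
    /// Returns the number of entries removed.
    fn on_connection_closed(&mut self, peer: PeerId) -> usize {
        let before = self.pending.len();
        self.pending.retain(|_, (owner, _)| *owner != peer);
        before - self.pending.len()
    }

    /// The value a gauge would report; unbounded growth here indicates a leak.
    fn len(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let mut tracker = PendingSubstreams::default();
    tracker.on_substream_initiated(1, 7, b"request-a".to_vec());
    tracker.on_substream_initiated(2, 7, b"request-b".to_vec());
    let removed = tracker.on_connection_closed(7);
    println!("removed {} entries, {} still pending", removed, tracker.len());
}
```

Reporting `len()` through a gauge is the kind of signal that makes this class of leak visible on a dashboard.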
This PR optionally exposes litep2p metrics in an agnostic manner.
API Design
Litep2p currently supports the following primitives to register and operate on metrics:
Around these primitives, substrate can expose `prometheus` metrics, and if need be update seamlessly to other metric crates (like `prometheus-client`).
Metrics Exposed
The exposed metrics inform the user about the state of litep2p components (such as the number of elements in the Kademlia store or the number of negotiating Identify substreams) and help developers detect abnormal behavior (such as memory leaks, unbounded growth, or stalls in some protocols).
Transport Manager
Transport Layer (TCP + Websocket)
Kademlia
Identify / Ping
Request Response Protocol
These metrics are exposed for every req-resp protocol.
Notification Protocol
These metrics are exposed for every notification protocol.
Dashboards
Review notes: Metric traits are defined in `src/metrics.rs`. They should give enough background for the metric registration and updates that happen in the rest of the code.
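To make the idea of backend-agnostic metric traits concrete, here is a minimal sketch. It does not reproduce `src/metrics.rs`: the trait names, method signatures, metric name, and the in-memory backend are all assumptions made for illustration. The library side only sees trait objects, so an embedder could back them with `prometheus`, `prometheus-client`, or anything else.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical traits; the real definitions in the PR may differ.
pub trait MetricCounterT: Send + Sync {
    fn inc(&self);
    fn add(&self, value: u64);
}

pub trait MetricGaugeT: Send + Sync {
    fn set(&self, value: u64);
    fn inc(&self);
    fn dec(&self);
}

pub trait MetricsRegistryT: Send + Sync {
    fn register_counter(&self, name: String, help: String) -> Arc<dyn MetricCounterT>;
    fn register_gauge(&self, name: String, help: String) -> Arc<dyn MetricGaugeT>;
}

/// Illustrative backend: a single atomic value serves as counter or gauge.
#[derive(Default)]
struct AtomicMetric(AtomicU64);

impl MetricCounterT for AtomicMetric {
    fn inc(&self) { self.0.fetch_add(1, Ordering::Relaxed); }
    fn add(&self, value: u64) { self.0.fetch_add(value, Ordering::Relaxed); }
}

impl MetricGaugeT for AtomicMetric {
    fn set(&self, value: u64) { self.0.store(value, Ordering::Relaxed); }
    fn inc(&self) { self.0.fetch_add(1, Ordering::Relaxed); }
    fn dec(&self) { self.0.fetch_sub(1, Ordering::Relaxed); }
}

/// Illustrative registry that keeps metrics in a map so they can be read back.
/// The help text is ignored in this sketch.
#[derive(Default)]
struct InMemoryRegistry {
    metrics: Mutex<HashMap<String, Arc<AtomicMetric>>>,
}

impl InMemoryRegistry {
    fn insert(&self, name: String) -> Arc<AtomicMetric> {
        let metric = Arc::new(AtomicMetric::default());
        self.metrics.lock().unwrap().insert(name, Arc::clone(&metric));
        metric
    }

    fn value(&self, name: &str) -> Option<u64> {
        self.metrics.lock().unwrap().get(name).map(|m| m.0.load(Ordering::Relaxed))
    }
}

impl MetricsRegistryT for InMemoryRegistry {
    fn register_counter(&self, name: String, _help: String) -> Arc<dyn MetricCounterT> {
        self.insert(name)
    }

    fn register_gauge(&self, name: String, _help: String) -> Arc<dyn MetricGaugeT> {
        self.insert(name)
    }
}

fn main() {
    let registry = InMemoryRegistry::default();
    // Library code would hold only `&dyn MetricsRegistryT`.
    let as_dyn: &dyn MetricsRegistryT = &registry;
    let pending = as_dyn.register_gauge(
        "example_pending_substreams".to_string(),
        "Substreams awaiting an open or failure event".to_string(),
    );
    pending.set(3);
    pending.dec();
    println!("{:?}", registry.value("example_pending_substreams"));
}
```

Returning `Arc<dyn …>` handles keeps components free of any concrete metrics crate; a real registry would probably also need to report registration errors such as duplicate names, which this sketch omits.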