Implement RTT measurement #4454
Pull Request Overview
This PR implements RTT (Round-Trip Time) measurement for network peers by tracking the time between ping and pong messages. The implementation allows node administrators to monitor network latency to all connected peers, which can help diagnose issues like delayed block acceptance.
Key Changes:
- Added atomic timestamp tracking when ping messages are sent
- Implemented pong message handler to calculate elapsed time and emit RTT metrics
- Added Prometheus counter and gauge metrics for RTT count and sum
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| network/peer/peer.go | Added lastPingSent field, timestamp recording on ping send, and pong handler to calculate RTT |
| network/peer/metrics.go | Added RTTCount and RTTSum Prometheus metrics with registration |
This commit implements a Round-Trip-Time measurement by measuring the ping-pong delay. When a ping is sent to a peer `p`, the time is sampled. When a pong is received, the elapsed time is measured and the round_trip_count and round_trip_sum counter and gauge are emitted.
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
This commit adds measurement of round trip time using the ping pong messages.
Why this should be merged
This commit lets the administrators of a node monitor the network round-trip time to all peers the node is connected to.
If the node's network experiences delays, they will be reflected in this metric, which can help explain phenomena such as block acceptance taking too long.
How this works
The Round-Trip-Time is measured as the time between the ping and pong messages. When a ping is sent to a peer `p`, the time is sampled. When a pong is received, the elapsed time is measured and the round_trip_count and round_trip_sum counter and gauge are emitted.
How this was tested
I cherry-picked this commit onto a Fuji 1.14 node that I deployed a week ago, then recompiled and ran the node.
I created the Grafana dashboard below using the query:
`increase(avalanche_network_round_trip_sum[5m]) / increase(avalanche_network_round_trip_count[5m])`
Need to be documented in RELEASES.md?
No