Protocol upgrade related metrics #5339
This issue is complementary to near/NEPs#205 and provides a short-term fix.
With a configurable number of epochs for the upgrade, I think this feature request makes even more sense, especially the second type of metric requested here: blocks (seconds) left before the new protocol version becomes active. That gives the node runner a clear signal that time is running out for an upgrade.
Given the comment in #5331, with the second type of metric (blocks left before the protocol switchover) it would already be too late: NEAR would panic() anyway, even if the upgrade is done within the last 2 epochs.
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
Fixed by #7877
I would like to see via Prometheus metrics (127.0.0.1:3030/metrics) how the NEAR protocol_version upgrade is progressing. I'm not sure whether this granularity can be achieved within the epoch rather than only at the epoch boundary, but if that were possible it would be great, like:
$ curl -s 127.0.0.1:3030/metrics | grep -e ^near_protocol_upgrade_progress
52
The above is just an example showing that 52% of validators have upgraded to the new protocol version. If no upgrade is happening, the metric should be missing or 0; missing is probably better when no upgrade is in progress.
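A minimal Python sketch of how the proposed progress value could be computed and exposed. It assumes the node knows each validator's stake and the protocol version it advertises, and it weights by stake on the assumption that the upgrade threshold is stake-based (giving every validator a stake of 1 turns it into a plain head count). `upgrade_progress_percent` and `render_metric` are hypothetical helpers, and `near_protocol_upgrade_progress` is only the name proposed in this issue, not necessarily what the node actually exports.

```python
from typing import Optional, Sequence, Tuple


def upgrade_progress_percent(
    validators: Sequence[Tuple[int, int]],  # hypothetical input: (stake, advertised protocol_version)
    current_version: int,
) -> Optional[int]:
    """Percentage of stake already advertising a version newer than current_version.

    Returns None when nobody advertises a newer version, i.e. no upgrade is in
    progress, so the caller can omit the metric entirely.
    """
    total_stake = sum(stake for stake, _ in validators)
    upgraded_stake = sum(
        stake for stake, version in validators if version > current_version
    )
    if total_stake == 0 or upgraded_stake == 0:
        return None
    # Integer percentage, rounded down (52.9% is reported as 52).
    return upgraded_stake * 100 // total_stake


def render_metric(progress: Optional[int]) -> str:
    """Render one Prometheus text-format line, or an empty string if there is no upgrade."""
    if progress is None:
        return ""
    return f"near_protocol_upgrade_progress {progress}\n"
```

With validators `[(52, 58), (48, 57)]` and a current version of 57, the helper returns 52 and the rendered line is `near_protocol_upgrade_progress 52`. When no newer version is advertised it returns `None` and the metric is simply left out, matching the "missing if no upgrade in progress" behaviour.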
Another metric that would be great is how many blocks (or seconds, if the epoch is time-based) are left until the cut-off epoch. For example, after 80% of validators have upgraded and the epoch switches:
$ curl -s 127.0.0.1:3030/metrics | grep -e ^near_protocol_upgrade_epoch_left
34207
So the above shows that there are 34207 blocks (or 9h30m7s, since (9*60*60)+(30*60)+7 = 34207 seconds, assuming one block per second) left before the protocol upgrade takes effect. If no upgrade is active, the metric is missing.
Based on both metrics we would like to create alarms. If the metric is present with a value above X (let's say 23 hours), a warning alarm is raised; if only 6 hours are left, a critical alarm is raised and on-call is paged. If the metric is missing, no alarm is raised.
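The alarm logic above can be sketched on the operator side in Python. This sketch assumes one block per second (as in the 34207-block example), reads an unlabelled gauge from the Prometheus text output, and interprets the rule as: any present value raises at least a warning, and 6 hours or less escalates to critical. The helper names and the `SECONDS_PER_BLOCK` and `CRITICAL_HOURS` constants are hypothetical, and `near_protocol_upgrade_epoch_left` is the metric name proposed in this issue.

```python
from enum import Enum
from typing import Optional

# Assumption: roughly one block per second, as in the 34207-block example.
SECONDS_PER_BLOCK = 1.0
CRITICAL_HOURS = 6.0


class Alarm(Enum):
    NONE = "none"
    WARNING = "warning"
    CRITICAL = "critical"


def read_metric(exposition: str, name: str) -> Optional[float]:
    """Value of an unlabelled metric in Prometheus text format, or None if absent."""
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def format_duration(seconds: int) -> str:
    """34207 -> '9h30m7s', i.e. (9*60*60)+(30*60)+7."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes}m{secs}s"


def classify(blocks_left: Optional[float]) -> Alarm:
    """Missing metric: no alarm; present: warning; 6 hours or less left: critical."""
    if blocks_left is None:
        return Alarm.NONE
    hours_left = blocks_left * SECONDS_PER_BLOCK / 3600
    if hours_left <= CRITICAL_HOURS:
        return Alarm.CRITICAL
    return Alarm.WARNING
```

For the example output, `read_metric` returns 34207.0, `format_duration(34207)` gives `9h30m7s` and `classify` returns `Alarm.WARNING`; at 21600 blocks (6 hours) or fewer it returns `Alarm.CRITICAL`. In practice the same thresholds would more likely live in a Prometheus alerting rule; the Python version just makes the arithmetic and the "no metric, no alarm" case explicit.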