-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(metrics): Expose metrics friendly for dashboard #2804
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts:
- I suggest that we put all the prometheus stuff behind some flag.
- It feels like we are reinventing the wheels here. @frol is there some existing solutions for what's done in
named_enum_derive
?
@@ -605,7 +610,13 @@ impl StreamHandler<Vec<u8>> for Peer { | |||
self.peer_manager_addr.do_send(metadata); | |||
} | |||
|
|||
peer_msg.record(msg.len()); | |||
self.network_metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we put this behind "metric_recorder" or some other flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok putting it behind a flag as stated below, but since I think it should be enabled by default, metric_recorder
is not a good flag, since we don't want to enable metric_recorder
by default as it consume more resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am reading here confuses me. It sounds like we have no a very descriptive naming metric_recorder
if we don't want to include all the recorded metrics under it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in practice, is that metric_recorder store too much information, with little aggregation, it have been useful to track down some issue, but we don't really want to put all metrics there, since some of them should be exposed anyway. I can change the name to extra_metrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra_metrics
and slow_metrics
sound good to me
core/primitives/src/views.rs
Outdated
@@ -260,6 +260,8 @@ pub struct StatusResponse { | |||
pub validators: Vec<ValidatorInfo>, | |||
/// Sync status of the node. | |||
pub sync_info: StatusSyncInfo, | |||
/// Validator id of the node | |||
pub validator_id: Option<AccountId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validator_account_id
is probably a better name
Prometheus metrics are very cheap and the idea was exposing this metrics by default so we can explore this data. I think we should do this at least while we are not on phase 2 so we get better understanding of the current implementation. For now I can put it behind a feature flag and have it enabled by default. |
@@ -605,7 +610,13 @@ impl StreamHandler<Vec<u8>> for Peer { | |||
self.peer_manager_addr.do_send(metadata); | |||
} | |||
|
|||
peer_msg.record(msg.len()); | |||
self.network_metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am reading here confuses me. It sounds like we have no a very descriptive naming metric_recorder
if we don't want to include all the recorded metrics under it.
c406443
to
c797a81
Compare
c797a81
to
e0b3822
Compare
e0b3822
to
83c606c
Compare
83c606c
to
de99e19
Compare
e232152
to
690bc5d
Compare
Expose validator_id in rpc Use strum (instead of named_enum)
690bc5d
to
b0a183f
Compare
Expose useful metrics in a friendly way to have a useful dashboard for on-fire situations.
As part of this effort worked on the dev-ops to deploy the dashboard. https://github.com/nearprotocol/near-ops/pull/53
There is currently a node on betanet running with this changes applied on top of beta branch.
http://34.94.189.12:3030/status
Testplan
Check that the dashboard is working properly.
DISCLAIMER: While the node is syncing some graphs are not being displayed properly (since some information is not recorded).
Once every node run this code, we are going to be able to select and explore each node individually using the dropdown from the upper left corner. Node will be added automatically as they join to the network.
UPDATE This is the graph after the node finished syncing. More metrics can be displayed in demand: