
feat(metrics): Expose metrics friendly for dashboard #2804

Merged: 1 commit merged into master on Jun 11, 2020

Conversation

@mfornet (Member) commented Jun 6, 2020

Expose useful metrics in a friendly way so we can have a useful dashboard for on-fire situations.
As part of this effort I also worked on the dev-ops side to deploy the dashboard: https://github.com/nearprotocol/near-ops/pull/53

There is currently a node on betanet running with these changes applied on top of the beta branch:
http://34.94.189.12:3030/status
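For context, a minimal sketch (with illustrative metric names, not necessarily the ones added in this PR) of how counters and gauges are typically registered with the prometheus crate and rendered in the text format that a dashboard's Prometheus server scrapes:

```rust
use once_cell::sync::Lazy;
use prometheus::{
    register_int_counter, register_int_gauge, Encoder, IntCounter, IntGauge, TextEncoder,
};

// Illustrative metrics; the real metric names live in the crate's metrics modules.
static BLOCKS_RECEIVED: Lazy<IntCounter> = Lazy::new(|| {
    register_int_counter!("near_blocks_received_total", "Blocks received by this node").unwrap()
});
static PEER_CONNECTIONS: Lazy<IntGauge> = Lazy::new(|| {
    register_int_gauge!("near_peer_connections", "Number of currently connected peers").unwrap()
});

/// Render every registered metric in the Prometheus text exposition format.
fn export_metrics() -> String {
    let mut buffer = Vec::new();
    TextEncoder::new()
        .encode(&prometheus::gather(), &mut buffer)
        .expect("encoding metrics should not fail");
    String::from_utf8(buffer).expect("text exposition format is valid UTF-8")
}

fn main() {
    BLOCKS_RECEIVED.inc();
    PEER_CONNECTIONS.set(12);
    // This string is what an HTTP metrics endpoint would return for scraping.
    println!("{}", export_metrics());
}
```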

Test plan

Check that the dashboard is working properly.

DISCLAIMER: While the node is syncing, some graphs are not displayed properly (since some information is not recorded).

Once every node runs this code, we will be able to select and explore each node individually using the dropdown in the upper-left corner. Nodes will be added automatically as they join the network.

[Screenshot: dashboard while the node is syncing, 2020-06-06 2:03 AM]

UPDATE: This is the graph after the node finished syncing. More metrics can be displayed on demand:

[Screenshot: dashboard after the node finished syncing, 2020-06-06 2:16 PM]


@bowenwang1996 (Collaborator) left a comment

Some thoughts:

  • I suggest that we put all the prometheus stuff behind some flag.
  • It feels like we are reinventing the wheel here. @frol, are there existing solutions for what's done in named_enum_derive?

@@ -605,7 +610,13 @@ impl StreamHandler<Vec<u8>> for Peer {
self.peer_manager_addr.do_send(metadata);
}

peer_msg.record(msg.len());
self.network_metrics
Collaborator:
Should we put this behind "metric_recorder" or some other flag?

Member Author (@mfornet):
I'm OK putting it behind a flag, as stated below, but since I think it should be enabled by default, metric_recorder is not a good flag: we don't want to enable metric_recorder by default because it consumes more resources.

Collaborator:
What I am reading here confuses me. It sounds like metric_recorder is not a very descriptive name if we don't want to include all the recorded metrics under it.

Member Author (@mfornet):
What happens in practice is that metric_recorder stores too much information with little aggregation. It has been useful to track down some issues, but we don't really want to put all metrics there, since some of them should be exposed anyway. I can change the name to extra_metrics.

Collaborator:
extra_metrics and slow_metrics sound good to me.
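To make the distinction concrete, here is a minimal sketch (hypothetical function and feature names, not the PR's actual recorder API) of keeping a cheap aggregate metric always on while the detailed, low-aggregation recording is gated behind an opt-in extra_metrics feature:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Cheap aggregate metric: total bytes received from peers, always recorded.
static RECEIVED_BYTES: AtomicU64 = AtomicU64::new(0);

#[allow(unused_variables)] // msg_type is only used when the feature is enabled
pub fn record_peer_message(msg_type: &str, len: usize) {
    RECEIVED_BYTES.fetch_add(len as u64, Ordering::Relaxed);

    // Expensive per-message recording only compiles in when the opt-in
    // `extra_metrics` Cargo feature is enabled.
    #[cfg(feature = "extra_metrics")]
    {
        eprintln!("extra_metrics: {} bytes for message type {}", len, msg_type);
    }
}
```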

@@ -260,6 +260,8 @@ pub struct StatusResponse {
pub validators: Vec<ValidatorInfo>,
/// Sync status of the node.
pub sync_info: StatusSyncInfo,
/// Validator id of the node
pub validator_id: Option<AccountId>,
Collaborator:
validator_account_id is probably a better name
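For illustration, a sketch of the struct fragment with the suggested rename applied (the AccountId alias is a placeholder so the snippet stands alone; the real type is defined elsewhere in the codebase):

```rust
// Placeholder so this sketch compiles on its own.
type AccountId = String;

pub struct StatusResponse {
    /// Account id the node validates with, if it runs as a validator.
    pub validator_account_id: Option<AccountId>,
    // ... remaining StatusResponse fields unchanged ...
}
```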

@mfornet (Member Author) commented Jun 7, 2020

  • I suggest that we put all the prometheus stuff behind some flag.

Prometheus metrics are very cheap, and the idea was to expose these metrics by default so we can explore this data. I think we should do this at least while we are not on phase 2, so we get a better understanding of the current implementation.

For now I can put it behind a feature flag and have it enabled by default.
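A sketch of what that could look like from the code side, assuming a hypothetical default-enabled prometheus_metrics feature (declared in the crate's [features] table with default = ["prometheus_metrics"]; opting out would then be a --no-default-features build):

```rust
// Increment a counter; compiled to a real call only when the default-on
// `prometheus_metrics` feature is enabled. The prometheus crate stays a
// plain dependency in this sketch to keep both signatures identical.
#[cfg(feature = "prometheus_metrics")]
pub fn inc_counter(counter: &prometheus::IntCounter) {
    counter.inc();
}

// No-op fallback, so call sites stay unchanged when metrics are compiled out.
#[cfg(not(feature = "prometheus_metrics"))]
pub fn inc_counter(_counter: &prometheus::IntCounter) {}
```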

Resolved review threads (outdated): chain/network/src/peer_manager.rs, tools/named_enum/named_enum/Cargo.toml
mfornet force-pushed the prometheus_metrics branch from c406443 to c797a81 on June 9, 2020 19:05
mfornet requested a review from bowenwang1996 on June 9, 2020 19:18
mfornet force-pushed the prometheus_metrics branch from c797a81 to e0b3822 on June 9, 2020 19:20
Resolved review threads (outdated): chain/network/Cargo.toml, chain/network/src/lib.rs
mfornet force-pushed the prometheus_metrics branch from e0b3822 to 83c606c on June 10, 2020 18:40
mfornet force-pushed the prometheus_metrics branch from 83c606c to de99e19 on June 10, 2020 18:56

mfornet force-pushed the prometheus_metrics branch from e232152 to 690bc5d on June 11, 2020 03:18
Expose validator_id in rpc

Use strum (instead of named_enum)
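(For context on the strum switch mentioned above: a minimal sketch of how strum, with its derive feature enabled, can provide variant names usable as per-message-type metric labels. The enum and variants below are illustrative, not the actual PeerMessage definition.)

```rust
use strum::AsRefStr; // requires strum's "derive" feature

#[derive(AsRefStr)]
enum PeerMessage {
    Block,
    Transaction,
    RoutedMessage,
}

fn main() {
    // The variant name as a &str, handy as a metric label without a hand-written derive.
    assert_eq!(PeerMessage::Block.as_ref(), "Block");
    assert_eq!(PeerMessage::RoutedMessage.as_ref(), "RoutedMessage");
}
```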
mfornet force-pushed the prometheus_metrics branch from 690bc5d to b0a183f on June 11, 2020 03:30
nearprotocol-bulldozer bot merged commit 953a4de into master on Jun 11, 2020
nearprotocol-bulldozer bot deleted the prometheus_metrics branch on June 11, 2020 03:38