feat: Instructions for setting up Grafana/Prometheus for monitoring local lotus node #11276
Related Issues
#10888
Proposed Changes
This PR adds documentation for installing and setting up Prometheus, Grafana, and node_exporter to get a fully working monitoring system running against a locally running lotus node.
So far the instructions have been tested on Mac M1 and Ubuntu Linux.
This PR includes a pre-configured Prometheus configuration as well as an initial dashboard I created for investigating where time is spent in ApplyBlocks, which is executed as part of ExecuteTipSet. We should aim to have dashboards targeted at each kind of user (miners, RPC providers, core devs, etc.) and maintain them inside the Lotus codebase (currently using metrics/grafana as the location for that). I leave that up to future PRs.
Test plan
After following the installation readme, you should get something like (showing the default lotus metric dashboard):
And if you set up node_exporter, you also get a rich dashboard for viewing your system metrics:
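Beyond looking at the dashboards, one way to smoke-test the setup is to confirm that a scrape endpoint returns parseable metrics. The following is only a minimal sketch, assuming you have already fetched the plain-text response from a metrics endpoint yourself (this description does not state the URL, so none is given here). The helper `metric_names` and the sample metric names are hypothetical and used purely for illustration.

```python
def metric_names(exposition_text: str) -> set:
    """Collect metric names from Prometheus text exposition format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "<name>{labels} <value>" or "<name> <value>".
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


# Hypothetical sample payload; real metric names will differ.
SAMPLE = """\
# HELP example_block_height Illustrative gauge.
# TYPE example_block_height gauge
example_block_height 123
example_requests_total{method="get"} 7
"""

print(sorted(metric_names(SAMPLE)))
```

If the returned set is empty for a real endpoint, the exporter is probably not running or the wrong path was fetched.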
Future work
Although the monitoring system described here should give a good overview and support analysis, there is still a lot more we can do to extend it with more capabilities (especially after we migrate to OpenTelemetry, see #11268):
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
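The title convention above can be checked mechanically. This is a rough sketch rather than any tooling the Lotus repository is known to use: the function name `check_pr_title` and the regular expression are assumptions made for illustration. It validates only the three-part shape and the PR type, since the area list above is given as examples rather than a closed set.

```python
import re

# PR types listed in the checklist above.
PR_TYPES = {"fix", "feat", "build", "chore", "ci", "docs", "perf",
            "refactor", "revert", "style", "test"}

# Shape: <PR type>: <area>: <change being made>
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+): (?P<area>[^:\s][^:]*): (?P<change>\S.*)$"
)


def check_pr_title(title: str) -> bool:
    """Return True if the title has the three-part shape and a known PR type."""
    match = TITLE_RE.match(title)
    return bool(match) and match.group("type") in PR_TYPES


print(check_pr_title("fix: mempool: Introduce a cache for valid signatures"))
```

A title with an unknown type or a missing area segment is rejected, while colons inside the change description are allowed.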