
feat: Instructions for setting up Grafana/Prometheus for monitoring local lotus node #11276

Merged: 4 commits merged on Sep 28, 2023


@fridrik01 (Contributor) commented Sep 19, 2023

Related Issues

#10888

Proposed Changes

This PR adds documentation on how to install and set up Prometheus, Grafana and node_exporter to get a fully working monitoring system running against a locally running lotus node.

So far the instructions have been tested on Mac M1 and Ubuntu Linux.

This PR includes a pre-configured Prometheus configuration as well as an initial dashboard I created for investigating where time is spent in ApplyBlocks, which is executed as part of ExecuteTipSet. We should aim to have dashboards targeted at each type of user (miners, RPC providers, core devs, etc.) and maintain them inside the Lotus codebase (currently using metrics/grafana as the location for that). I leave that to future PRs.

Test plan

After following the installation README, you should see something like this (showing the default lotus metrics dashboard):
[Screenshot: default lotus metrics dashboard]

And if you set up node_exporter, you also get a rich dashboard for viewing your system metrics:
[Screenshot: node_exporter system metrics dashboard]

Future work

Although the monitoring system described here should give a good overview and support analysis, there is still a lot more we can do to extend its capabilities (especially after we migrate to OpenTelemetry, see #11268):

  • Distributed tracing for critical codepaths
  • Integration with SigNoz
  • Include logs in traces
  • Alerting
  • etc.

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • If the PR affects users (e.g., new feature, bug fix, system requirements change), update the CHANGELOG.md and add details to the UNRELEASED section.
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@fridrik01 force-pushed the setup-grafana-prometheus-docs branch 2 times, most recently from dff92be to 3e56383, September 19, 2023 17:17
This PR also includes a location for our Grafana dashboards,
which we should maintain in the repo.
@rjan90 (Contributor) commented Sep 20, 2023

This is really nice! 👏 I will open an issue in Lotus-Docs as well, so that we can port this guide over there once it lands in a stable release!

@fridrik01 changed the title from "[WIP] Instructions for setting up Grafana+Prometheus" to "[WIP] Instructions for setting fully working monitoring system" Sep 20, 2023
@fridrik01 marked this pull request as ready for review September 20, 2023 13:55
@fridrik01 requested a review from a team as a code owner September 20, 2023 13:55
@rjan90 (Contributor) commented Sep 20, 2023

Suggested a couple of typo fixes. The title of the PR can also be updated to align with the PR checklist.

@fridrik01 changed the title from "[WIP] Instructions for setting fully working monitoring system" to "Instructions for setting fully working monitoring system" Sep 20, 2023
@fridrik01 changed the title from "Instructions for setting fully working monitoring system" to "Instructions for setting up fully working monitoring system" Sep 20, 2023
@fridrik01 changed the title from "Instructions for setting up fully working monitoring system" to "Instructions for setting up Grafana/Prometheus monitoring local lotus node" Sep 20, 2023
@rjan90 (Contributor) commented Sep 20, 2023

Just went through the tutorial on a MacBook M1 with a devnet, and 🚀
[Screenshot 2023-09-20 at 16:31:19]

@fridrik01 changed the title from "Instructions for setting up Grafana/Prometheus monitoring local lotus node" to "Instructions for setting up Grafana/Prometheus for monitoring local lotus node" Sep 20, 2023
@jennijuju (Member)

This is awesome, thank you fridrik!

@rjan90 changed the title from "Instructions for setting up Grafana/Prometheus for monitoring local lotus node" to "feat: Instructions for setting up Grafana/Prometheus for monitoring local lotus node" Sep 21, 2023
@rjan90 (Contributor) left a comment


Apart from the typos, this LGTM!

metrics/README.md: 5 review suggestions (outdated, resolved)
@arajasek (Contributor) left a comment


Working well on Arch Linux too!

@fridrik01 merged commit a791a79 into master Sep 28, 2023
87 checks passed
@fridrik01 deleted the setup-grafana-prometheus-docs branch September 28, 2023 10:14
4 participants