Add support for instrumentation using Prometheus #87

Merged
merged 12 commits into from
May 10, 2021

Conversation

tgxiii
Contributor

@tgxiii tgxiii commented May 4, 2021

This adds the framework for gathering metrics using Prometheus, plus two metrics to start with.

Once metrics are gathered by Prometheus, they can be visualized using tools like Grafana.

@swar
Owner

swar commented May 5, 2021

Hey, so what's the benefit of Prometheus, and where can I read its documentation? Do you have use cases you could share so I can understand what's going on?

@tgxiii
Contributor Author

tgxiii commented May 5, 2021

Sure thing!

Best place to find out about it is their homepage (provided in my link in the description). There's also this Medium post that explains it.

But the TL;DR is that it's a toolkit for monitoring and alerting. The application records metric values as events happen and exposes them over HTTP, then a Prometheus server periodically scrapes those values and stores them. You can then query it to get insights.

On its own, it's not the nicest interface, so it's usually paired with Grafana, an analytics and monitoring visualization tool. Below is the start of my dashboard. I currently have two machines plotting, one with 3 different jobs and another with only 1.

[image: Grafana dashboard screenshot]

At the moment, this only provides two metrics (currently running plots and number of completed plots). I didn't want to make this PR huge and difficult to review, so my intent was just to set up the framework and provide two simple metrics as examples.
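To give a feel for what "providing metrics" means, here is a minimal sketch of the idea using only the Python standard library. It is not the code in this PR: the metric names, the `hostname` label, and port 9800 are assumptions made up for illustration. It renders a gauge (running plots) and a counter (completed plots) in the Prometheus text exposition format and serves them at `/metrics` for a Prometheus server to scrape.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(running, completed, hostname):
    """Render two metrics in the Prometheus text exposition format.

    The metric and label names are hypothetical, chosen for this sketch.
    """
    label = f'{{hostname="{hostname}"}}'
    lines = [
        "# HELP plot_manager_running_plots Plots currently in progress.",
        "# TYPE plot_manager_running_plots gauge",
        f"plot_manager_running_plots{label} {running}",
        "# HELP plot_manager_completed_plots_total Plots finished since start.",
        "# TYPE plot_manager_completed_plots_total counter",
        f"plot_manager_completed_plots_total{label} {completed}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Shared state that the plotting code would update as jobs start and finish.
    state = {"running": 0, "completed": 0}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(
            self.state["running"], self.state["completed"], "plotter-1"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9800):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` on a plotter and pointing a Prometheus scrape target at that machine's port would be enough to start collecting these two values. In practice a client library would normally handle the formatting and the HTTP endpoint.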

On its own, this PR may not add much value, but I was thinking that I'd eventually add all sorts of metrics like:

  • Elapsed time
  • Progress percentage
  • Temp file size
  • Phase completion time
  • Total completion time
  • Copy completion time

This will give users the ability to do four things:

  • Have a single place to monitor multiple plotters
  • Monitor performance and have a record over time
  • Compare performance between different machines
  • Set up alerts (e.g. when number of plots running drops to a certain threshold, hard drive space in destination runs out, or even as simple as the Plot Manager no longer running)
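As a rough illustration of the alerting idea, the sketch below polls the Prometheus HTTP query API and reports which machines have fewer running plots than a threshold. Real setups would more likely use Prometheus alerting rules or Grafana alerts; this only shows the shape of the data. The base URL and the metric name in the usage note are assumptions, not names taken from this PR.

```python
import json
import urllib.parse
import urllib.request


def instant_query(base_url, promql):
    """Run an instant query against the Prometheus HTTP API and return the JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def instances_below(response, threshold):
    """Return the label sets of every series whose current value is below threshold.

    Expects the response of an instant query that yields a vector, where each
    series carries a "metric" label dict and a "value" pair of [timestamp, "number"].
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    low = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]
        if float(value) < threshold:
            low.append(series["metric"])
    return low
```

For example, `instances_below(instant_query("http://localhost:9090", "plot_manager_running_plots"), 1)` would list the machines that currently report no running plots, assuming a Prometheus server at that address and a metric with that hypothetical name.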

Just to be clear though, all this does is provide the metrics to Prometheus. It's still up to users to set up their own Prometheus instance, plus Grafana or another visualization/alerting tool that hooks into Prometheus.

Please let me know what you think!

@martin-rohla

I was just going to ask @swar if he plans to add some kind of support like this! +1 for the usefulness, as @tgxiii explains very well. I'm already using Prometheus with some other tools to parse Chia logs to get a farming dashboard; having something for plotting too would be very nice!

@swar swar changed the base branch from main to development May 10, 2021 04:25
@swar swar changed the base branch from development to prometheus May 10, 2021 04:51
@swar
Owner

swar commented May 10, 2021

I'm merging this to a separate branch so I can make some changes prior to merging it into the development branch.

@swar swar merged commit 5ed5ca7 into swar:prometheus May 10, 2021
@swar swar mentioned this pull request May 20, 2021
@element-software

Thanks for this @swar - how would I be able to set the prometheus host IP since I have one main machine (running prometheus, grafana, etc.) and others which are harvesters/plotters?
