-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow workers to report additional prometheus metrics #210
Conversation
0c1ca83
to
14e588e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments / questions inline.
This looks good.
@@ -62,6 +74,7 @@ module Web = struct | |||
Server.respond_string ~status:`OK ~headers ~body () | |||
| `GET, ["pool"; pool; "worker"; worker; "metrics"] -> get_metrics ~sched ~pool ~worker ~source:`Agent | |||
| `GET, ["pool"; pool; "worker"; worker; "host-metrics"] -> get_metrics ~sched ~pool ~worker ~source:`Host | |||
| `GET, ["pool"; pool; "worker"; worker; extra] -> get_metrics ~sched ~pool ~worker ~source:(`Extra extra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this API look like for extra requests? For example I want to request the clarke
stats, would I make a request "/pool/linux-ppc64/worker/ppc-worker-1/clarke" and that gives me a collection of reported clarke stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's the idea, finally managed to get on a linux machine and take it for a spin. Seems to be working, so for example /pool/linux-x86_64/worker/my-host/clarke
would give Clarke metrics. The main ones we want:
#HELP Clarke_meter_intensity Current carbon intensity in gCO2/kWh
#TYPE Clarke_meter_intensity gauge
Clarke_meter_intensity 0.000000
#HELP Clarke_meter_watts Current power usage measured in watts
#TYPE Clarke_meter_watts gauge
Clarke_meter_watts 100.000000
Arg.opt Arg.(list additional_metric_conv) [] @@ | ||
Arg.info | ||
~doc:"Additional prometheus endpoints to scrape in the form <name>:<uri> \ | ||
presented as a comma separated list." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this would be used to point to other local metrics collectors and re-export those metrics over capnp, and the name:uri would match the metrics
name configured on the scheduler and the URI is where on the localhost to connect (It could reach out to elsewhere given it's a URI).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. We would run Clarke at localhost:9090/metrics
and then add to ocluster-worker --additional-metrics=clarke:http://localhost:9090/metrics
.
Good point re:URI, if this isn't a useful feature we could perhaps limit to a <name>:<port>
mapping and just always combine that into http://localhost:<port>/metrics
?
Managed to test this and seems to be working 👍 |
Thanks, merged as of 88a917c. |
- ocaml-dockerfile + Build and install opam master from source in Windows images. (@MisterDA ocurrent/ocaml-dockerfile#140) + Include the ocaml-beta-repository in the images. (@kit-ty-kate ocurrent/ocaml-dockerfile#132, review by @MisterDA) - ocluster + Custom healthcheck period. (@mtelvers ocurrent/ocluster#214) + Allow workers to report additional prometheus metrics. (@patricoferris ocurrent/ocluster#210) + other minor things.
- ocaml-dockerfile + Build and install opam master from source in Windows images. (@MisterDA ocurrent/ocaml-dockerfile#140) + Include the ocaml-beta-repository in the images. (@kit-ty-kate ocurrent/ocaml-dockerfile#132, review by @MisterDA) - ocluster + Custom healthcheck period. (@mtelvers ocurrent/ocluster#214) + Allow workers to report additional prometheus metrics. (@patricoferris ocurrent/ocluster#210) + other minor things. - ocaml-version + Expose 4.08.1 and 4.14.1 (@MisterDA ocurrent/ocaml-version#60)
- ocaml-dockerfile + Build and install opam master from source in Windows images. (@MisterDA ocurrent/ocaml-dockerfile#140 ocurrent/ocaml-dockerfile#142) + Include the ocaml-beta-repository in the images. (@kit-ty-kate ocurrent/ocaml-dockerfile#132, review by @MisterDA) - ocluster + Custom healthcheck period. (@mtelvers ocurrent/ocluster#214) + Allow workers to report additional prometheus metrics. (@patricoferris ocurrent/ocluster#210) + other minor things. - ocaml-version + Expose 4.08.1 and 4.14.1 (@MisterDA ocurrent/ocaml-version#60) fixup
…uster (0.2.1) CHANGES: - Expose the ocluster-worker library in the ocluster-worker package (@MisterDA @art-w, ocurrent/ocluster#219 ocurrent/ocluster#217 ocurrent/ocluster#151, reviewed by @tmcgilchrist) - Remove corrupted repositories from the cache (@kit-ty-kate ocurrent/ocluster#216, reviewed by @talex5) - Allow workers to report additional prometheus metrics (@patricoferris ocurrent/ocluster#210, reviewed by @tmcgilchrist, @MisterDA) - Smother Cap'n Proto and TLS debug logs (@MisterDA ocurrent/ocluster#213, reviewed by @talex5) - Added command line option to set obuilder health check period (@mtelvers ocurrent/ocluster#214, reviewed by @tmcgilchrist) - Conditionally compile macos user_temp fetcher (@tmcgilchrist ocurrent/ocluster#209, reviewed by @MisterDA, @mtelvers) - Make rsync-mode mandatory when using rsync store (@tmcgilchrist ocurrent/ocluster#202, reviewed by @MisterDA) - Windows service bugfixes (@MisterDA ocurrent/ocluster#200, reviewed by @tmcgilchrist) - Fix build and opam metadata (@MisterDA @tmcgilchrist ocurrent/ocluster#199 ocurrent/ocluster#203)
…uster (0.2.1) CHANGES: - Expose the ocluster-worker library in the ocluster-worker package (@MisterDA @art-w, ocurrent/ocluster#219 ocurrent/ocluster#217 ocurrent/ocluster#151, reviewed by @tmcgilchrist) - Remove corrupted repositories from the cache (@kit-ty-kate ocurrent/ocluster#216, reviewed by @talex5) - Allow workers to report additional prometheus metrics (@patricoferris ocurrent/ocluster#210, reviewed by @tmcgilchrist, @MisterDA) - Smother Cap'n Proto and TLS debug logs (@MisterDA ocurrent/ocluster#213, reviewed by @talex5) - Added command line option to set obuilder health check period (@mtelvers ocurrent/ocluster#214, reviewed by @tmcgilchrist) - Conditionally compile macos user_temp fetcher (@tmcgilchrist ocurrent/ocluster#209, reviewed by @MisterDA, @mtelvers) - Make rsync-mode mandatory when using rsync store (@tmcgilchrist ocurrent/ocluster#202, reviewed by @MisterDA) - Windows service bugfixes (@MisterDA ocurrent/ocluster#200, reviewed by @tmcgilchrist) - Fix build and opam metadata (@MisterDA @tmcgilchrist ocurrent/ocluster#199 ocurrent/ocluster#203)
These are somewhat belated comments from me; sorry about that. Perhaps I'm missing something, but why is there a distinction between |
In order to support workers reporting energy metrics we need to allow them to proxy more prometheus endpoints. I've added an
additional_metric
method to the worker interface to let them do this where we can add an arbitrary number of<name>:<uri>
pairs for the worker to collect similarly to what is currently done by thenode-exporter
.This would allow us to then run clarke on some workers and have them proxy the prometheus metrics, in addition to that we could run other prometheus reporters too.
I'll need to set up some things in order to test this properly but thought I would open the PR now to get some thoughts on the API.