
Add Support for Prometheus Metrics #8130

Closed
checketts opened this issue Jan 29, 2017 · 11 comments
Labels
status: duplicate A duplicate of another issue

Comments

@checketts
Contributor

checketts commented Jan 29, 2017

Improve DefaultCacheStatistics and MetricFilter to track metrics that can be useful to Prometheus integrations.

Prometheus takes a different approach to metrics. There are three aspects that are fundamentally different from other libraries like DropWizard.

Labels (multi-dimensional metrics)

Instead of hierarchical metrics like counter.status.200.metrics and gauge.response.metrics for tracking endpoint requests and latency, Prometheus would expose metrics named like http_requests{status="200", path="metrics"} and http_requests_latency_seconds{path="metrics"}. Both naming schemes carry the same data, but aggregating hierarchical names requires wildcard expressions like sum(counter.status.200.*) to find how many requests are occurring. The Prometheus equivalent, sum(http_requests) by (status), makes it easy to break requests out by status code and avoids the limitations of a fixed hierarchy.
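To make the labeling model concrete, here is a minimal sketch of a dimensional counter keyed by label values rather than by a hierarchical name. The LabeledCounter class is hypothetical, not the actual Prometheus Java client API; it only illustrates that one metric name can own many time series, one per label combination:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical sketch of a dimensional counter: a single metric name,
// with one time series per unique combination of label values.
class LabeledCounter {
    private final String name;
    private final Map<String, DoubleAdder> series = new ConcurrentHashMap<>();

    LabeledCounter(String name) {
        this.name = name;
    }

    // Increment the series identified by the given label values,
    // e.g. inc("status=200", "path=metrics").
    void inc(String... labels) {
        series.computeIfAbsent(String.join(",", labels), k -> new DoubleAdder()).add(1.0);
    }

    // Read back the current total for one label combination.
    double get(String... labels) {
        DoubleAdder adder = series.get(String.join(",", labels));
        return adder == null ? 0.0 : adder.sum();
    }
}
```

The real Prometheus Java simpleclient exposes the same idea through Counter.build().name(...).labelNames(...).register() followed by counter.labels(...).inc(); the point here is just that labels replace the name hierarchy as the aggregation axis.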

Track counters, not rates or ratios

Prometheus also surfaces metrics as raw total and calculates the ratios and rates on the Prometheus server side, via its query language.

This makes metrics collection simpler on the client side (just incrementing a double) and more powerful. Take the current DefaultCacheStatistics as an example: it exposes a caches.<cacheName>.hit.ratio and a .miss.ratio. The Prometheus approach of exposing metrics like cache_request_total{cache="<cacheName>", result="hit"} keeps the tracking simpler, while still allowing queries like rate(cache_request_total[1m]) to see how often the cache is being hit.
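The difference can be sketched in a few lines. CacheRequestCounter below is a hypothetical class, not part of Spring Boot or the Prometheus client: the client only records raw totals, and anything derived (a ratio, a rate) is computed from those totals at query time, the way the Prometheus server evaluates PromQL expressions:

```java
// Hypothetical sketch: the instrumented application records only raw
// hit/miss totals. Ratios and rates are derived later from the totals,
// which is what the Prometheus server does with its query language.
class CacheRequestCounter {
    private long hits;
    private long misses;

    void record(boolean hit) {
        if (hit) hits++; else misses++;
    }

    long total(boolean hit) {
        return hit ? hits : misses;
    }

    // Query-side computation, roughly equivalent to the PromQL
    //   cache_request_total{result="hit"} / sum(cache_request_total)
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Because the counters only ever go up, the server can also compute rates over any window after the fact, something a pre-computed ratio exposed by the client cannot support.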

Pull Model for metrics collection

DropWizard/statsd approaches push metrics out from the service. Prometheus instead pulls: it calls the service and collects the metrics from a service endpoint (similar to the /metrics endpoint supplied by Actuator).
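A minimal sketch of the pull model, using only the JDK's built-in com.sun.net.httpserver (the class name, port, and hard-coded sample are illustrative assumptions; the real exposition format also includes HELP and TYPE comment lines):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: the application exposes its current counter values
// on an HTTP endpoint, and the Prometheus server scrapes that endpoint on
// its own schedule. The application never pushes anything.
class MetricsEndpoint {

    // Render one sample in the Prometheus text exposition style, e.g.
    // http_requests_total{status="200",path="metrics"} 42.0
    static String renderSample(String name, Map<String, String> labels, double value) {
        String labelStr = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + labelStr + "} " + value;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("status", "200");
        labels.put("path", "metrics");
        String body = renderSample("http_requests_total", labels, 42.0) + "\n";

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/metrics", exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
    }
}
```

This is essentially what the Actuator /metrics endpoint already does shape-wise; the difference is the text format and the labeled series inside it.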

Proposal

I'm working on a pull request to add Prometheus support to Spring Boot Actuator. I want to keep the result simple so it fits in with the existing Spring Boot metrics (and so it will be easy for the Spring Boot team to review and accept).

Right now there is a Prometheus simpleclient_spring_boot that adds an actuator endpoint. Unfortunately, copying Spring Boot's PublicMetrics raw and exposing them as Prometheus metrics produces more noise than useful metrics. I'll iterate with the Prometheus client to see if we can pull out metrics in a useful way; in the meantime I wanted to improve the existing interfaces to better expose data so the Prometheus extensions can connect in cleanly.

On Gitter @dsyer mentioned interest in multi-dimensional metrics with Spring Boot 2. Is that something I could actively contribute to? Should I treat all interfaces declared as unchangeable or would adding methods be allowed (Like the CounterService for example)?

@wilkinsona
Member

See also #2949

On Gitter @dsyer mentioned interest in multi-dimensional metrics with Spring Boot 2. Is that something I could actively contribute to?

Yes, please. Contributions in terms of code, analysis, use case descriptions, etc would be most welcome. The above is already a very useful contribution. Thank you.

We plan to start looking at metrics in earnest in 2.0 M2, that'll start around April if things go to plan.

Should I treat all interfaces declared as unchangeable or would adding methods be allowed (Like the CounterService for example)?

Nothing is completely off limits for 2.0. We don't want to break the API for the sake of it, but, on the other hand, 2.0 is a chance to make any breaking changes that are necessary.

@sveniu

sveniu commented Feb 22, 2017

This would be really nice. I'm struggling with a large number of flat metric names like gauge_hystrix_HystrixCommand_myservice_MyClassName_myFunctionName_String_String_int_int__rollingCountShortCircuited, essentially named after the full function prototype. These should really be a single metric, rollingCountShortCircuited, with various labels/dimensions.

Presumably it's Actuator that does the flattening, having received these metrics from spring-cloud-netflix's spring-cloud-starter-hystrix. I'd love to understand how the metrics actually propagate through the various components, so any information there would be appreciated. Hystrix is not alone: the same goes for a large number of additional metrics originating in the Netflix spring-cloud suite.

@jkschneider
Contributor

We are tentatively planning on launching a separate spring-metrics project and ultimately separating this responsibility from Actuator. Among other things, the intent is for spring-metrics to provide a dimensional collector API for which there will be out-of-the-box implementations. I am actually knee deep in Prometheus support right now, as I'm using Prometheus and Spectator as the first two collector support targets!

So rather than thinking of plugging an exporter into Actuator that stuffs hierarchical metrics into Prometheus, you can look for a different metrics collector API that was designed and ships with Prometheus support.

@checketts
Contributor Author

@jkschneider That would be ideal. I've found that stuffing hierarchical metrics (and not having useful dimensions) makes it pretty worthless. I'm an active contributor to the Prometheus Java client and would be interested in collaborating on this.

@brian-brazil

So rather than thinking of plugging an exporter into Actuator that stuffs hierarchical metrics into Prometheus, you can look for a different metrics collector API that was designed and ships with Prometheus support.

I'd recommend choosing an existing instrumentation library rather than coming up with a new API. There are many subtle and not-so-subtle design aspects to instrumentation libraries, so if you try to make an API that suits everyone, it's easy to end up with a lowest common denominator that's not great for anyone and is semantically difficult to work with.

The integration point to other systems would then be at the reporting end rather than the instrumentation end (e.g. what both Prometheus and Dropwizard libraries offer).

@jkschneider
Contributor

@brian-brazil Thanks for the feedback. We've definitely had some discussion about this, and we want to acknowledge the possibility that, as with any abstraction effort, there are simply too many differences between the targets to make a useful API, in which case we will ultimately adopt your recommendation.

This has not been my experience so far with Spectator, Prometheus, and Dropwizard, however, so I'm not ready to throw in the towel yet.

I don't want this to be a zero-sum game either: an implementation-neutral interface could be useful for some (hopefully substantial) set of instrumentation needs, and users can drop down to a particular implementation like Prometheus where it shares no commonality with the others.

@brian-brazil

It depends on what exactly the resultant API is. For example, if you accept value types other than float64, then one of the wins of Prometheus (not having to care about unit prefixes in instrumentation) is lost, ultimately causing pain and confusion for everyone.

Do you have an initial API already? I'd hope for something we can map directly into direct instrumentation.

@jkschneider
Contributor

@brian-brazil A super-early initial API trying to tease out the differences in the different implementations is here.

You are a hard guy to track down on social media ;) ... I'd suggest we have a more in depth conversation about this 1-1 and distill the results here afterward (jschneider@pivotal.io).

@jkschneider
Contributor

The plan to incorporate Micrometer metrics in Boot 2.0 makes Prometheus a supported monitoring system in Boot. See #9970

@Starefossen

Starefossen commented Aug 11, 2017 via email

@wilkinsona
Member

Closing in favour of #9970

@wilkinsona wilkinsona added status: duplicate A duplicate of another issue and removed priority: normal type: enhancement A general enhancement labels Aug 11, 2017