Prometheus endpoint sending invalid type for Histograms #740

Closed
amoldavsky opened this issue Aug 1, 2019 · 10 comments

@amoldavsky
Contributor

For metrics of type histogram, the Prometheus endpoint also sends quantile samples, which are not a valid part of a histogram. It seems that the Seldon Prometheus server throws these samples away, but our Datadog integration tries to make sense of them, which results in broken reports.

Here is a log snippet:

# HELP seldon_api_engine_client_requests_seconds Timer of RestTemplate operation
# TYPE seldon_api_engine_client_requests_seconds histogram
seldon_api_engine_client_requests_seconds{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-predictor",predictor_version="v1",status="200",uri="/predict",quantile="0.5",} 0.005767168
seldon_api_engine_client_requests_seconds{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-predictor",predictor_version="v1",status="200",uri="/predict",quantile="0.75",} 0.006029312
seldon_api_engine_client_requests_seconds{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-predictor",predictor_version="v1",status="200",uri="/predict",quantile="0.95",} 0.0065536
seldon_api_engine_client_requests_seconds{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-predictor",predictor_version="v1",status="200",uri="/predict",quantile="0.98",} 0.00786432
seldon_api_engine_client_requests_seconds{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-predictor",predictor_version="v1",status="200",uri="/predict",quantile="0.99",} 0.00917504
seldon_api_engine_client_requests_seconds_bucket{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-predictor",predictor_version="v1",status="200",uri="/predict",le="0.001",} 0.0
seldon_api_engine_client_requests_seconds_bucket{clientName="localhost",deployment_name="mnist-deployment",method="POST",model_image="seldonio/sk-example-mnist",model_name="sk-mnist-classifier",model_version="0.2",predictor_name="sk-mnist-

The first five samples belong to the histogram but carry a quantile label.

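According to the Prometheus documentation, a histogram exposes only three kinds of series (cumulative <basename>_bucket counters with an le label, <basename>_sum, and <basename>_count), and quantiles are not one of them: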
https://prometheus.io/docs/concepts/metric_types/#histogram
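To make the rule concrete, here is a minimal Java sketch (standard library only) that scans text-format output and reports samples that do not fit their declared histogram family, such as the quantile lines above. The class name HistogramShapeCheck is hypothetical, and the parser only understands # TYPE comments and simple sample lines, not the full exposition grammar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical checker: reports samples that do not fit a declared histogram family.
// Handles only "# TYPE" comments and simple sample lines, not the full exposition grammar.
final class HistogramShapeCheck {
    private static final Pattern LE_LABEL = Pattern.compile("[{,]le=\"");
    private static final Pattern QUANTILE_LABEL = Pattern.compile("[{,]quantile=\"");
    private static final String[] SUFFIXES = {"_bucket", "_sum", "_count"};

    static List<String> invalidHistogramSamples(String exposition) {
        Map<String, String> typeByFamily = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (String raw : exposition.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("# TYPE ")) {
                String[] parts = line.split("\\s+"); // "#", "TYPE", family, type
                if (parts.length >= 4) typeByFamily.put(parts[2], parts[3]);
                continue;
            }
            if (line.isEmpty() || line.startsWith("#")) continue; // blank, HELP, other comments

            // The sample name ends at the label set, or at the space before the value.
            int brace = line.indexOf('{');
            int end = brace >= 0 ? brace : line.indexOf(' ');
            if (end < 0) continue;
            String name = line.substring(0, end);
            String labels = brace >= 0 ? line.substring(brace) : ""; // labels plus value

            // Map name_bucket / name_sum / name_count back to a declared family.
            String family = name;
            String suffix = "";
            for (String s : SUFFIXES) {
                if (!name.endsWith(s)) continue;
                String base = name.substring(0, name.length() - s.length());
                if (typeByFamily.containsKey(base)) {
                    family = base;
                    suffix = s;
                    break;
                }
            }
            if (!"histogram".equals(typeByFamily.get(family))) continue;

            // A histogram may only expose name_bucket{le=...}, name_sum and name_count.
            boolean allowed = (suffix.equals("_bucket") && LE_LABEL.matcher(labels).find())
                    || suffix.equals("_sum") || suffix.equals("_count");
            if (!allowed || QUANTILE_LABEL.matcher(labels).find()) problems.add(line);
        }
        return problems;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# TYPE seldon_api_engine_client_requests_seconds histogram",
                "seldon_api_engine_client_requests_seconds{uri=\"/predict\",quantile=\"0.5\",} 0.005767168",
                "seldon_api_engine_client_requests_seconds_bucket{uri=\"/predict\",le=\"0.001\",} 0.0");
        invalidHistogramSamples(text).forEach(System.out::println);
    }
}
```

Applied to the complete lines of the snippet above, a check like this flags the five quantile samples and accepts the le="0.001" bucket; a summary family with quantile samples is left alone.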

@ukclivecox
Contributor

ukclivecox commented Aug 24, 2019

We are using Micrometer (see here and here) with the following configuration:

management.metrics.web.client.requests-metric-name=seldon.api.engine.client.requests
management.metrics.distribution.percentiles.all=0.5, 0.75, 0.95, 0.98, 0.99

What version of Prometheus are you using? We have had no issues with these metrics in our analytics helm chart using Prometheus and Grafana.

Can you post the error you are seeing and the component that reports it?

@ukclivecox ukclivecox added this to the 1.0.x milestone Aug 24, 2019
@markusgay

Hello Clive,

Thank you for your response.
You configure Micrometer to report histogram buckets and percentiles in the same response.
The settings management.metrics.distribution.percentiles and management.metrics.distribution.percentiles-histogram should not both be set
if the output has to comply with the Prometheus specification.
The setting management.metrics.distribution.percentiles adds quantile samples to the Prometheus histogram produced by management.metrics.distribution.percentiles-histogram.
Quantile samples are reserved for the Prometheus summary type.
The Prometheus server from prometheus.io ignores any samples that are not valid for the declared metric type.
Other consumers, such as Datadog and New Relic, treat invalid samples as an error and do not process the received metric.

@markusgay

Hello,

The earlier Micrometer issue micrometer-metrics/micrometer#562 explains the difference between a Micrometer histogram and the Prometheus histogram and summary metric types. To comply with the Prometheus metric type definitions, an application has to define one timer with 'publishPercentileHistogram(true)' for the histogram metric, and a separate timer with 'publishPercentiles(0.5, 0.75, 0.95, 0.98, 0.99)' for the summary metric.
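Micrometer's own mechanism is the pair of timer options mentioned above; the Java sketch below does not use Micrometer at all. Under that assumption, it only illustrates what compliant output looks like when the same observations are exposed as two separately named families: a histogram with cumulative buckets, _sum and _count, and a summary with quantile samples, _sum and _count. The class SeparateFamilies, the family names and the nearest-rank quantile method are hypothetical choices; Micrometer computes its client-side percentiles with its own approximation.

```java
import java.util.Arrays;

// Hypothetical sketch: expose one set of latency observations as two separate
// Prometheus families so that no family mixes bucket and quantile samples.
final class SeparateFamilies {

    static String render(String histogramName, String summaryName, double[] observations,
                         double[] bucketBounds, double[] quantiles) {
        double[] sorted = observations.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double sum = Arrays.stream(sorted).sum();
        StringBuilder out = new StringBuilder();

        // Histogram family: cumulative buckets (le = inclusive upper bound), +Inf, _sum, _count.
        out.append("# TYPE ").append(histogramName).append(" histogram\n");
        for (double le : bucketBounds) {
            long cumulative = Arrays.stream(sorted).filter(v -> v <= le).count();
            out.append(histogramName).append("_bucket{le=\"").append(le).append("\"} ")
               .append(cumulative).append('\n');
        }
        out.append(histogramName).append("_bucket{le=\"+Inf\"} ").append(n).append('\n');
        out.append(histogramName).append("_sum ").append(sum).append('\n');
        out.append(histogramName).append("_count ").append(n).append('\n');

        // Summary family: quantile samples (nearest-rank here), _sum, _count.
        out.append("# TYPE ").append(summaryName).append(" summary\n");
        for (double q : quantiles) {
            int rank = Math.min(n, Math.max(1, (int) Math.ceil(q * n)));
            double value = n == 0 ? Double.NaN : sorted[rank - 1];
            out.append(summaryName).append("{quantile=\"").append(q).append("\"} ")
               .append(value).append('\n');
        }
        out.append(summaryName).append("_sum ").append(sum).append('\n');
        out.append(summaryName).append("_count ").append(n).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        double[] latencies = {0.002, 0.004, 0.006, 0.008};
        System.out.print(render("request_seconds", "request_seconds_summary", latencies,
                new double[] {0.001, 0.005, 0.01}, new double[] {0.5, 0.75, 0.95, 0.98, 0.99}));
    }
}
```

Because the two families have different names, a strict consumer never sees quantile samples inside a "# TYPE ... histogram" block.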

@ukclivecox
Contributor

This is our setup at present:

management.metrics.web.server.auto-time-requests=false
management.metrics.web.server.requests-metric-name=seldon.api.engine.server.requests
management.metrics.web.client.requests-metric-name=seldon.api.engine.client.requests
management.metrics.distribution.percentiles.all=0.5, 0.75, 0.95, 0.98, 0.99
management.metrics.distribution.percentiles-histogram.all=true

Happy to discuss how this can be changed.

@markusgay

markusgay commented Aug 29, 2019

Your Grafana dashboard file predictions-analytics-dashboard.json uses the histogram_quantile function (see the example below) to calculate quantiles from the buckets on the server side. The application setting 'management.metrics.distribution.percentiles.all=0.5, 0.75, 0.95, 0.98, 0.99', which computes quantiles on the application side, is therefore not needed.

"expr": "histogram_quantile(0.99, sum(rate(seldon_api_engine_client_requests_seconds_bucket{uri=\"/predict\",model_image=~\"$model_image\",predictor_name=~\"$predictor\",predictor_version=~\"$version\",model_name=~\"$model_name\",model_version=~\"$model_version\"}[20s])) by (predictor_name,predictor_version,model_name,model_image,model_version,le))",
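The following Java sketch approximates what histogram_quantile does with one set of (le, cumulative count) pairs, based on the linear interpolation Prometheus documents for classic histograms: find the bucket that contains rank q × total and interpolate within it. The class BucketQuantile is hypothetical, and Prometheus itself handles further edge cases (for example non-monotonic bucket counts) that the sketch ignores.

```java
// Simplified sketch of the interpolation behind PromQL's histogram_quantile()
// for classic histograms. Hypothetical helper; not Prometheus code.
final class BucketQuantile {

    // upperBounds: ascending "le" values ending with +Inf; cumulativeCounts: matching bucket counts.
    static double estimate(double q, double[] upperBounds, double[] cumulativeCounts) {
        if (q < 0) return Double.NEGATIVE_INFINITY;
        if (q > 1) return Double.POSITIVE_INFINITY;
        int last = upperBounds.length - 1;
        if (upperBounds.length < 2 || upperBounds[last] != Double.POSITIVE_INFINITY) {
            return Double.NaN; // needs at least two buckets, the last one being +Inf
        }
        double total = cumulativeCounts[last];
        if (total == 0) return Double.NaN;

        double rank = q * total; // target position among all observations
        int b = 0;
        while (b < last && cumulativeCounts[b] < rank) b++; // first bucket reaching the rank
        if (b == last) return upperBounds[last - 1]; // rank lies in the +Inf bucket

        double bucketStart = 0;
        double countBelow = 0;
        if (b > 0) {
            bucketStart = upperBounds[b - 1];
            countBelow = cumulativeCounts[b - 1];
        } else if (upperBounds[0] <= 0) {
            return upperBounds[0];
        }
        double inBucket = cumulativeCounts[b] - countBelow;
        // Assume observations are spread evenly inside the bucket and interpolate linearly.
        return bucketStart + (upperBounds[b] - bucketStart) * ((rank - countBelow) / inBucket);
    }

    public static void main(String[] args) {
        double[] le = {0.001, 0.005, 0.01, Double.POSITIVE_INFINITY};
        double[] counts = {0, 2, 4, 4};
        System.out.println(estimate(0.75, le, counts)); // roughly 0.0075
    }
}
```

In the dashboard query, the counts fed into this step are the per-second rates produced by sum(rate(..._bucket[20s])) by (..., le); the interpolation is the same, and no quantile samples from the application are involved.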

If you want to use both the Prometheus histogram and summary metric types in your dashboards, you have to create two separate timers in your application source code. At the moment, this cannot be achieved with application property settings alone.

@ukclivecox
Contributor

Are you saying Micrometer does not allow for what you need? Can you suggest the correct configuration for Spring/Micrometer?

@markusgay

markusgay commented Sep 6, 2019

The configuration without the line 'management.metrics.distribution.percentiles.all=0.5, 0.75, 0.95, 0.98, 0.99' is working for us. I also looked at your Grafana dashboard code, and it uses the histogram_quantile function, so removing the setting 'management.metrics.distribution.percentiles.all=0.5, 0.75, 0.95, 0.98, 0.99' should not break your dashboards.

@ukclivecox ukclivecox modified the milestones: 1.0.x, 0.5.x Sep 18, 2019
@ukclivecox ukclivecox modified the milestones: 0.5.x, 1.0.x Oct 31, 2019
@ukclivecox ukclivecox modified the milestones: 1.0, 1.1 Nov 7, 2019
@ukclivecox
Contributor

We are moving to a Go replacement for the executor. It will be available in 1.1.

@ukclivecox
Contributor

Please reopen if this is still an issue in the Go executor, @markusgay.

@zyxue
Contributor

zyxue commented Aug 5, 2020

I also find this a bit confusing. Do I understand correctly that in principle, those quantile lines (with quantile="something") shouldn't appear in the histogram metric type?

6 participants