This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Return measurements to users in HTTP response #1130

Closed
shanson7 opened this issue Nov 7, 2018 · 6 comments

Comments

@shanson7
Collaborator

shanson7 commented Nov 7, 2018

Metrictank publishes a lot of metrics around cache hits, timings, etc., but when optimizing or analyzing individual queries, most of that detail is lost in the aggregates. Jaeger is somewhat helpful but not sufficient for a variety of reasons (request volume, sampling, etc).

It would be nice to have a flag indicating that stats should be returned with the response, to help figure out what work this request triggered. Some useful stats might be:

  • Time spent resolving the targets into a concrete list of series
  • Time spent fetching the data
  • Cache usage (tank, chunk cache, cassandra) plus timings
  • Pre-run logic (mergeSeries, sorting, etc)
  • Plan run time
  • The number of points pulled in, the number of points returned, and the number of series pulled in

Most of these stats are already collected for Jaeger and/or publishing as aggregate metrics.
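As a rough illustration of the list above, here is a minimal Go sketch of what a per-request stats record could look like. The `RequestStats` type, its field names, and the `timed` helper are all hypothetical and are not taken from Metrictank's code; the cache counters are also simplified to a single hit/miss pair rather than one per cache layer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// RequestStats is a hypothetical per-request record mirroring the list above.
// Durations are serialized as integer nanoseconds (time.Duration's JSON form).
type RequestStats struct {
	ResolveSeries  time.Duration `json:"resolve_series_ns"`
	FetchData      time.Duration `json:"fetch_data_ns"`
	PreRun         time.Duration `json:"pre_run_ns"`
	PlanRun        time.Duration `json:"plan_run_ns"`
	CacheHits      int           `json:"cache_hits"`
	CacheMisses    int           `json:"cache_misses"`
	SeriesFetched  int           `json:"series_fetched"`
	PointsFetched  int           `json:"points_fetched"`
	PointsReturned int           `json:"points_returned"`
}

// timed runs fn and adds the elapsed wall-clock time to *dst.
func timed(dst *time.Duration, fn func()) {
	start := time.Now()
	fn()
	*dst += time.Since(start)
}

func main() {
	var st RequestStats

	// Each phase of the request is wrapped so its duration lands in the
	// matching field; the bodies here are stand-ins for real work.
	timed(&st.ResolveSeries, func() { st.SeriesFetched = 3 })
	timed(&st.FetchData, func() {
		st.PointsFetched = 3600
		st.CacheHits, st.CacheMisses = 2, 1
	})
	timed(&st.PlanRun, func() { st.PointsReturned = 360 })

	out, err := json.Marshal(st)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the record scoped to a single request is the point: the same numbers already feed aggregate metrics, but here they stay attached to the query that produced them.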

@Dieterbe
Contributor

Jaeger is somewhat helpful but not sufficient for a variety of reasons (request volume, sampling, etc).

Would it solve the problem if Jaeger were sufficiently low overhead (e.g. via sampling, or perhaps even disabled by default) and you had a way to make sure a specific request is not sampled away? E.g. some kind of request flag that forces a request to be instrumented via Jaeger?

@shanson7
Collaborator Author

Part of it is visibility to end users: our current tracing in Jaeger is far too verbose and is not exposed to them.

These stats are visible in the query inspector in Grafana, which makes them very convenient for our users.

@Dieterbe
Contributor

Dieterbe commented Mar 13, 2019

OK.
Let's see if the Grafana team has any recommendations on how to expose these stats. cc @daniellee @torkelo
My main concern, as mentioned in the PR, is that Graphite responses are an array of series structs/dictionaries, so we can't really add a global stats section to the response. Perhaps we should do it via HTTP headers.
Maybe @DanCech, as the Graphite maintainer, has a recommendation as well?

@davkal

davkal commented Mar 18, 2019

Prometheus exposes similar stats when a parameter is present: prometheus/prometheus#2408
It would make sense to render those on demand similarly to the query inspector.
Elastic returns some stats as well. The challenge, as pointed out above, is to come up with a good model that accommodates most data sources.

@bergquist

I think at the common Grafana level we should stay simple and have a list of names with durations. Right now I don't think this would be used outside the query inspector.

As for the Graphite response, I would love to use an object instead of an array, but I'm guessing we would be the only user of such a feature, so I think I would prefer headers in this case. The Graphite datasource plugin could then translate those headers to the internal model.

@Dieterbe
Contributor

Dieterbe commented Jun 6, 2019

Fixed by #1344.

@Dieterbe Dieterbe closed this as completed Jun 6, 2019