Metrics based on telemetry events generated by sparrow #175

Merged
merged 5 commits into from
Jun 24, 2020

Contributor

@janciesla8818 commented Jun 17, 2020

This PR adds new metrics based on sparrow telemetry events. The changes include:

  • Histogram, sum, and counter metrics for HTTP requests
  • Count of worker pools
  • Count of workers per worker pool
  • Total count of workers in all worker pools

The last three metrics are gathered by calling :wpool.stats to read the information from the worker pools.

A telemetry poller is added to the project to collect periodic metrics. The pool and worker counts are emitted as events on each poller run. Metrics are gathered every 5 seconds, which is the default period of the telemetry poller.
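As a rough illustration of how the three pool metrics relate to each other, here is a minimal Python sketch (the project itself is written in Elixir). The `pool_metrics` helper, the metric keys, and the shape of `pool_stats` are all hypothetical; the actual structure returned by :wpool.stats is not shown in this PR.

```python
from typing import Dict, List


def pool_metrics(pool_stats: Dict[str, List[str]]) -> Dict[str, object]:
    """Derive the three pool metrics from a mapping of pool name -> worker ids.

    `pool_stats` is a hypothetical stand-in for data read from the worker
    pool library; it is NOT the actual shape returned by :wpool.stats.
    """
    # Count of workers per worker pool.
    workers_per_pool = {name: len(workers) for name, workers in pool_stats.items()}
    return {
        # Count of worker pools.
        "pools_count": len(pool_stats),
        "workers_per_pool": workers_per_pool,
        # Total count of workers in all worker pools.
        "workers_total": sum(workers_per_pool.values()),
    }


if __name__ == "__main__":
    # Example snapshot that a periodic poller run might observe (made-up names).
    snapshot = {"pool_a": ["w1", "w2"], "pool_b": ["w1", "w2", "w3"]}
    print(pool_metrics(snapshot))
```

In this sketch a poller would call `pool_metrics` once per period and publish the three values; how the real poller schedules and emits them is defined by the PR's code, not by this example.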

@janciesla8818 requested review from NelsonVides and michalwski and removed request for NelsonVides June 17, 2020 14:32
@janciesla8818 force-pushed the sparrow_telemetry_events branch 2 times, most recently from fa2d95f to 182f40f June 22, 2020 08:31
Contributor

@michalwski left a comment

Thanks for the PR! It's good to have metrics showing how much time was spent in the HTTP/2 worker. Metrics showing worker pool statistics are also valuable.

On a side note, I think it'd be worth checking whether we use the newest versions of the Telemetry-related libraries.

lib/mongoose_push/metrics/telemetry_metrics.ex (outdated, resolved)
2. Send a request.
3. Get a response from push notifications provider.

###### HTTP/2 requests
Contributor

Does this metric show time spent only on sending the request and waiting for the response?

Contributor Author

This metric shows only the time it takes to handle and send the request: it opens the stream and sends the request, but it does not measure the response time. In general, when this timer is small, it is possible that we are getting errors when opening the stream.

Contributor

Thanks for checking this. I think it'd be good to add this info to the doc.

Contributor Author

Ok, will add that info.

Contributor Author

@michalwski, this part has already been added to the doc.

Contributor

Hmm, I wonder how you came to the conclusion that it measures only the time it takes to handle and send the request. If I'm correct, this metric measures the time of this function in sparrow: https://github.com/esl/sparrow/blob/master/lib/sparrow/h2_worker.ex#L352-L430

Inside the function, we are waiting for the response and emit additional events depending on the response.

Also, I think the name of the metric starts with sparrow_h2_worker instead of sparrow_h_worker.

Contributor

All right, we discussed this offline with @janciesla8818. I was misled by the post_result var name and by the case checking whether this is a successful response or not. In fact, this var only tells us whether the request was sent successfully. The response to the request arrives asynchronously (HTTP/2), which is why we can send many requests using the same h2_worker and connection.

When it comes to the metric name, the Prometheus.Core lib removes all numbers from the metric name.
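To make the naming effect concrete, here is a minimal Python sketch of digit stripping. It only illustrates the behaviour described above; the `strip_digits` helper is hypothetical and is not the Prometheus.Core implementation.

```python
import re


def strip_digits(metric_name: str) -> str:
    """Remove every decimal digit from a metric name (illustration only)."""
    return re.sub(r"[0-9]", "", metric_name)


if __name__ == "__main__":
    # The event prefix from sparrow loses its "2" in the exported name.
    print(strip_digits("sparrow_h2_worker"))  # prints "sparrow_h_worker"
```

This is why the documented metric name starts with sparrow_h_worker even though the telemetry event comes from the h2_worker.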

mix.exs (outdated, resolved)
@janciesla8818 force-pushed the sparrow_telemetry_events branch 2 times, most recently from 5aa77a1 to d09265d June 23, 2020 15:06
@janciesla8818 force-pushed the sparrow_telemetry_events branch from d09265d to f864311 June 24, 2020 07:42
@michalwski merged commit ebfc2ed into master Jun 24, 2020
@michalwski deleted the sparrow_telemetry_events branch June 24, 2020 08:28