Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bloomberg PR: Telemetry improvements #67

Merged
merged 3 commits into from
Aug 18, 2020
Merged

Conversation

lornasong
Copy link
Member

This PR includes the go-metrics telemetry changes from #56. A separate PR (#63) contains the non-telemetry changes.

#56 changes were merged into a non-master branch bloomberg-dev in order to help carry the PR, add some additional commits to Bloomberg’s work, and retain the Corporate CLA.

@hashicorp-cla
Copy link

hashicorp-cla commented Jun 10, 2020

CLA assistant check
All committers have signed the CLA.

@lornasong
Copy link
Member Author

Right now, we’re working to understand how best to move forward with telemetry. Below captures some points for consideration:

@lornasong
Copy link
Member Author

@mikeyyuen, I spoke with a team member, would you be able to tell us more about your team’s telemetry use case and what features you all use? We’d like to understand if there is any way that OpenTelemetry supports your use case. For example, OpenTelemetry doesn’t support statsd but does support dogstatsd. However, it doesn’t support global tags for dogstatsd, which is implemented in your PR.

@findkim
Copy link
Contributor

findkim commented Jun 22, 2020

Hey @mikeyyuen our main concern between statsd and dogstatsd is the difference in tag syntax. I believe a common Statsd spec is ; delimiter and = for key values, whereas dogstatsd uses , and : respectively. From my testing, dogstatsd tags would cause parsing issues when using metric aggregators or visualization tools that expect statsd tags. Though the base of the syntax is the same between the two.

Statsd example

check.txn:1|c|#tag1=value;#tag2=value2
checks.update:203|h|#tag1=value;#tag2=value2

Dogstatsd example

check.txn:1|c|#tag1:value,#tag2:value2
checks.update:203|h|#tag1:value,#tag2:value2

Looking at the metrics changes from the PR, there are no tags added to the value themselves and only uses the dot delimiter. Omitting the support for global tags, my take is that our transition from the go-metrics library to OpenTelemetry would be satisfy the basic statsd spec. Because w/o tags, there effectively is no difference between statsd and dogstatsd, and users could configure the statsd sink using the telemetry.dogstatsd block.

telemetry {
  dogstatsd {
    address = "udp://127.0.0.1:8125"
  }
}

Does this slight deviation from your proposed Telemetry changes work for you?

@mikeyyuen
Copy link
Contributor

@findkim, yes, I think dogstatsd would work well for us and be a good solution for most statsd users. Thank you for doing that research.

@findkim findkim mentioned this pull request Jun 26, 2020
This reverts commit 0fed622.

0fed622 was done in order to review and merge Bloomberg’s non-telemetry changes from telemetry changes. Now that the non-telemetry changes are merged into master, in order to view the telemetry changes from the original branch, we need to unrevert the telemetry changes.
@lornasong
Copy link
Member Author

lornasong commented Aug 13, 2020

Hi @mikeyyuen - thanks again for these telemetry improvements! I'm sure the community will be excited to start using them. I made a few minor changes to the original Bloomberg, namely:

  • Refactoring the telemetry config logic into convertTelemetry, for unit testing purposes
  • Additional configuration unit tests (some unrelated to Telemetry)
  • Updated readme
  • Minor comment updates

EDIT: I also updated the version of go-metrics to v0.3.4

Don't hesitate to let me know if you have any questions or concerns. I'll leave this PR open for the next couple days. On Tuesday, I'll request my team to review.

Thanks!

@lornasong lornasong marked this pull request as ready for review August 13, 2020 21:24
@mikeyyuen
Copy link
Contributor

Thanks @lornasong Your changes look good, we're excited to have them!

@lornasong lornasong requested a review from a team August 18, 2020 14:57
@lornasong lornasong merged commit b9182e0 into master Aug 18, 2020
@lornasong lornasong deleted the bloomberg-telemetry branch August 18, 2020 18:19
@lornasong lornasong added this to the 0.5.0 milestone Aug 18, 2020
@edevil
Copy link
Contributor

edevil commented Oct 6, 2020

I was looking at this feature and was wondering how the Prometheus support worked. The config mentions the Consul docs, but Consul has an http endpoint where these metrics can be scrapped. The only way I could find of obtaining the metrics, without sending them to a remote service, was to use SIGUSR1:

[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.total_gc_pause_ns': 155660.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.total_gc_runs': 2.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.num_goroutines': 42.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.alloc_bytes': 2507400.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.sys_bytes': 76104704.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.malloc_count': 75696.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.free_count': 60111.000
[2020-10-06 17:24:40 +0100 WEST][G] 'runtime.heap_objects': 15585.000
[2020-10-06 17:24:40 +0100 WEST][C] 'check.txn': Count: 1 Sum: 1.000 LastUpdated: 2020-10-06 17:24:42.17699 +0100 WEST m=+163.888674272
[2020-10-06 17:24:40 +0100 WEST][S] 'checks.update': Count: 1 Sum: 0.023 LastUpdated: 2020-10-06 17:24:42.175552 +0100 WEST m=+163.887236370

How can Prometheus scrape ESM's metrics?

@lornasong
Copy link
Member Author

@edevil - apologies for the delay. Using SIGUSR1 is definitely one way to go. I'm not familiar with Prometheus and started looking into it. It looks like we are adding a Prometheus sink when initializing telemetry. I'll continue to look into it but wanted to reach out before too much time had passed. Please feel free to let me know if you have any updates. Thank you!

@edevil
Copy link
Contributor

edevil commented Oct 14, 2020

Hello @lornasong . So prometheus metrics usually are scraped from the application through an http interface. The consul agent, which already has an http interface, exposes the metrics here https://github.com/hashicorp/consul/blob/1b413b0444fe91a30bf18989fe8c668a767c9c8a/agent/agent_endpoint.go#L151.

In the case of ESM, it currently has no HTTP interface, so I see no way to scrape the prometheus metrics that are being collected. Would it be OK to expose an http endpoint just for metrics?

@lornasong
Copy link
Member Author

@edevil, thanks for the additional information on Prometheus! I’m also concluding that a metrics endpoint is needed for Prometheus metrics. I’d like to do a little more research to make sure I understand what go-metrics is doing by adding the Prometheus sink.

Regardless, I think it would be good to open a new issue to capture this feature, and we can continue the conversation there. Since you’ve done all this great research, would you want to be the one to open the issue and include the details you’ve mentioned in the past two comments? If you prefer, I’m also happy to open the issue - let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants