Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP server duration metrics per connection instead of accumulative #4525

Closed
harel-e opened this issue Oct 27, 2021 · 7 comments
Closed

HTTP server duration metrics per connection instead of accumulative #4525

harel-e opened this issue Oct 27, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@harel-e
Copy link

harel-e commented Oct 27, 2021

Describe the bug

http.server.duration metrics contains net.peer.* attributes
This leads to histogram per connection, instead of accumulative one (or one per host)
Beyond the fact that such histogram is not useful, the metrics exporter kept adding histogram for every client connection that the application made. I quickly got to hundreds of histograms in a sample application.

Steps to reproduce

auto-Instrument java code that makes http requests (I used RestTemplate)

What did you expect to see?

I expect to see one histogram http_server_duration_count for all client connections (or optionally one per http_host destination)

What did you see instead?

I saw histogram buckets per connection

metrics.txt

Prometheus exporter example

http_server_duration_count{http_flavor="1.0",http_host="localhost:8080",http_method="GET",http_scheme="http",http_
server_name="localhost",http_status_code="200",net_peer_ip="0:0:0:0:0:0:0:1",net_peer_name="localhost",net_peer_port="54566",} 1.0
http_server_duration_sum{http_flavor="1.0",http_host="localhost:8080",http_method="GET",http_scheme="http",http_server_name="localhost",http_status_code="200",net_peer_ip="0:0:0:0:0:0:0:1",net_peer_name="localhost",net_peer_port="54566",} 1145.918858
http_server_duration_bucket{http_flavor="1.0",http_host="localhost:8080",http_method="GET",http_scheme="http",http_server_name="localhost",http_status_code="200",net_peer_ip="0:0:0:0:0:0:0:1",net_peer_name="localhost",net_peer_port="54566",le="5.0",} 0.0
... more buckets

http_server_duration_count{http_flavor="1.0",http_host="localhost:8080",http_method="GET",http_scheme="http",http_server_name="localhost",http_status_code="200",net_peer_ip="0:0:0:0:0:0:0:1",net_peer_name="localhost",net_peer_port="54582",} 1.0
... more buckets

http_server_duration_count{http_flavor="1.0",http_host="localhost:8080",http_method="GET",http_scheme="http",http_server_name="localhost",http_status_code="200",net_peer_ip="0:0:0:0:0:0:0:1",net_peer_name="localhost",net_peer_port="54567",} 1.0

What version are you using?

1.7.0

Environment

JDK : OpenJDK 64-Bit Server VM Temurin-11.0.12+7
OS: MacOS 11.6

Additional context

I understand the value of these attributes with tracing, but not with metrics.

@harel-e harel-e added the bug Something isn't working label Oct 27, 2021
@trask
Copy link
Member

trask commented Oct 27, 2021

hi @harel-e! the net.peer.* attributes are recommended for http duration metrics by the specification: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/http-metrics.md#attributes

You should be able to customize the captured metric attributes by implementing a javaagent extension and using SdkMeterProviderBuilder to register a metrics View with an AttributesProcessor that removes attributes that don't make sense in your environment.

@trask
Copy link
Member

trask commented Oct 28, 2021

hey @harel-e, we discussed this today in Java SIG meeting, and agree that the net.peer.* really shouldn't be part of the default metric dimensions. we're going to update the instrumentation and then take this back to the spec. thx for raising it!

@trask
Copy link
Member

trask commented Oct 29, 2021

After another discussion, there does seem to be value (and less harm) in capturing net.peer.* for client duration, but they definitely should not be captured for server duration. Which happens to be exactly what the spec says, and I just missed the Type column in the specification when implementing this. We will fix for the next release.

@harel-e
Copy link
Author

harel-e commented Oct 29, 2021

@trask Thank you for the update. I understand that in the next release the http.server.duration histogram would not include the net.peer.*
Are you saying that net.peer.* would be included in http.client.duration histogram?
If that's the case, it could lead to the same issue I was describing, and a very common one in a distributed environment, where one service (client) is making HTTP call to another service.

@trask
Copy link
Member

trask commented Nov 1, 2021

cc: @jsuereth

@jsuereth
Copy link
Contributor

jsuereth commented Nov 3, 2021

PTAL at #4556 and weight in

@mateuszrzeszutek
Copy link
Member

net.peer.* attributes are no longer used in HTTP server metrics - fixed by #4556

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants