Prevent duplicate HTTP server observations for cancelled exchanges #31417
Labels
in: web (issues in web modules: web, webmvc, webflux, websocket)
theme: observability (an issue related to observability and tracing)
type: bug (a general bug)
bclozel added the in: web, type: bug and theme: observability labels on Oct 12, 2023
This was referenced Oct 12, 2023
bclozel added a commit that referenced this issue on Oct 13, 2023
Since the Spring WebFlux HTTP server instrumentation has been moved from the `WebFilter` to the `HttpWebHandlerAdapter`, we need to apply similar changes there. See gh-31417
This was referenced Nov 28, 2023
bclozel added a commit that referenced this issue on Nov 29, 2023

Prior to this commit, regressions were introduced with gh-31417:

1. the observation key values would be inconsistent with the HTTP response
2. the observation scope would not cover all controller handlers, causing traceIds to be missing

The first issue is caused by the fact that in case of error signals, the observation was stopped before the response was fully committed, which means further processing could happen and update the response status. This commit delays the stop event until the response is committed in case of errors.

The second problem is caused by the change from a `contextWrite` operator to using the `tap` operator with a `SignalListener`. The observation was started in the `doOnSubscription` callback, which is too late in some cases. If the WebFlux controller handler is synchronous non-blocking, the execution of the handler is performed before the subscription happens. This means that for those handlers, the observation was not started, even if the current observation was present in the Reactor context. This commit changes the `doOnSubscription` to `doFirst` to ensure that the observation is started at the right time.

Fixes gh-31703
Fixes gh-31706
bclozel added a commit that referenced this issue on Nov 29, 2023

Prior to this commit, regressions were introduced with gh-31417:

1. the observation key values would be inconsistent with the HTTP response
2. the observation scope would not cover all controller handlers, causing traceIds to be missing

The first issue is caused by the fact that in case of error signals, the observation was stopped before the response was fully committed, which means further processing could happen and update the response status. This commit delays the stop event until the response is committed in case of errors.

The second problem is caused by the change from a `contextWrite` operator to using the `tap` operator with a `SignalListener`. The observation was started in the `doOnSubscription` callback, which is too late in some cases. If the WebFlux controller handler is synchronous non-blocking, the execution of the handler is performed before the subscription happens. This means that for those handlers, the observation was not started, even if the current observation was present in the Reactor context. This commit changes the `doOnSubscription` to `doFirst` to ensure that the observation is started at the right time.

Fixes gh-31715
Fixes gh-31716
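The first fix described in the commit message above, deferring the stop event until the response is committed, can be sketched in plain Java. This is a minimal model under stated assumptions, not the Spring Framework implementation: `Response`, `beforeCommit`, `commit` and `Observation` are hypothetical stand-ins for the WebFlux response and the Micrometer observation, and the status codes are illustrative. It shows that an observation stopped on the error signal records whatever status was current at that moment, while one stopped from a before-commit callback records the final status.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredStopSketch {

    // Hypothetical stand-in for a server response: a mutable status plus
    // actions that run right before the response is committed.
    static final class Response {
        int status = 200;
        private final List<Runnable> beforeCommitActions = new ArrayList<>();

        void beforeCommit(Runnable action) {
            beforeCommitActions.add(action);
        }

        void commit() {
            beforeCommitActions.forEach(Runnable::run);
        }
    }

    // Hypothetical stand-in for an observation that reads the status when stopped.
    static final class Observation {
        Integer recordedStatus;

        void stop(Response response) {
            recordedStatus = response.status;
        }
    }

    // Stops the observation as soon as the error signal is seen.
    static int eagerStopOnError() {
        Response response = new Response();
        Observation observation = new Observation();
        observation.stop(response);   // error signal: stop immediately
        response.status = 500;        // later error handling updates the status
        response.commit();
        return observation.recordedStatus;
    }

    // Defers the stop until the response is about to be committed.
    static int deferredStopOnError() {
        Response response = new Response();
        Observation observation = new Observation();
        response.beforeCommit(() -> observation.stop(response)); // error signal: defer
        response.status = 500;        // later error handling updates the status
        response.commit();            // observation stopped here, with the final status
        return observation.recordedStatus;
    }

    public static void main(String[] args) {
        System.out.println("eager stop recorded " + eagerStopOnError());
        System.out.println("deferred stop recorded " + deferredStopOnError());
    }
}
```

Running `main` prints `eager stop recorded 200` followed by `deferred stop recorded 500`, which is the kind of mismatch between observation key values and the HTTP response that the commit describes.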
In some specific cases, HTTP clients can eagerly close connections with the server right after the response has been received, but before the HTTP exchange is considered complete. This can result in duplicate observations being recorded, one with the `"SUCCESSFUL"` outcome and another one with the `"UNKNOWN"` outcome (as the exchange was cancelled).

This can be reproduced with the following:
"Duplicate metrics" can be explained by:

- `doOnCancel` can be called after `doOnTerminate`
- the `Observation` API not preventing multiple `Observation#stop()` calls; this means observation handlers are called multiple times

We cannot ignore CANCEL signals in our instrumentation, as this would leak started observations and would not count all valid cases of cancellations. We should instead refine our instrumentation with the `Mono#tap` operator and locally guard against this case.

This needs to be applied to the reactive `ServerHttpObservationFilter` and to the `HttpWebHandlerAdapter` instrumentation that replaces it in Spring Framework 6.1.
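The local guard can be illustrated without Reactor. In the instrumentation itself this logic would presumably live in a `SignalListener` registered through `Mono#tap`; in this sketch, `CountingObservation` and `GuardedListener` are hypothetical stand-ins (not the Reactor or Micrometer API), and the outcomes mirror the `"SUCCESSFUL"` and `"UNKNOWN"` values from the issue. An `AtomicBoolean` lets only the first terminal signal stop the observation, so a CANCEL arriving after completion is ignored, while a CANCEL on its own is still recorded and no started observation is leaked.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedStopSketch {

    // Hypothetical observation that, as the issue describes, does not prevent
    // repeated stop() calls: every call reaches its "handlers" (here, a counter).
    static final class CountingObservation {
        final AtomicInteger stopCount = new AtomicInteger();
        volatile String outcome;

        void stop(String outcome) {
            this.outcome = outcome;
            stopCount.incrementAndGet();
        }
    }

    // Hypothetical listener mirroring the terminal callbacks of a signal listener.
    static final class GuardedListener {
        private final CountingObservation observation;
        private final AtomicBoolean stopped = new AtomicBoolean();

        GuardedListener(CountingObservation observation) {
            this.observation = observation;
        }

        void onComplete() {
            stopOnce("SUCCESSFUL");
        }

        void onCancel() {
            stopOnce("UNKNOWN");
        }

        private void stopOnce(String outcome) {
            // Only the first terminal signal flips the flag and stops the observation.
            if (stopped.compareAndSet(false, true)) {
                observation.stop(outcome);
            }
        }
    }

    // The exchange completes, then the client closes the connection eagerly.
    static CountingObservation completeThenCancel() {
        CountingObservation observation = new CountingObservation();
        GuardedListener listener = new GuardedListener(observation);
        listener.onComplete();
        listener.onCancel();
        return observation;
    }

    // A genuine cancellation with no prior completion is still recorded.
    static CountingObservation cancelOnly() {
        CountingObservation observation = new CountingObservation();
        new GuardedListener(observation).onCancel();
        return observation;
    }

    public static void main(String[] args) {
        CountingObservation a = completeThenCancel();
        CountingObservation b = cancelOnly();
        System.out.println("complete then cancel: " + a.stopCount.get() + " stop, " + a.outcome);
        System.out.println("cancel only: " + b.stopCount.get() + " stop, " + b.outcome);
    }
}
```

Using `compareAndSet` rather than a plain boolean keeps the guard correct even if the completion and cancellation callbacks run on different threads, since at most one caller can win the flag.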