Contributor

@bryce-anderson bryce-anderson commented Mar 26, 2025

Motivation:

We need to also consider the request body when tracking signals and setting context.

Modifications:

Properly track the response and account for the NettyHttpServer auto-drain behavior as well.

@bryce-anderson bryce-anderson force-pushed the bl_anderson/ServerHttpLifecycleObserver branch from 74e7a87 to 05114f3 Compare March 26, 2025 20:07
@bryce-anderson bryce-anderson changed the title Bl anderson/server http lifecycle observer WIP: Set context correctly for the server request body Mar 26, 2025
@bryce-anderson bryce-anderson marked this pull request as ready for review April 1, 2025 15:35
@bryce-anderson bryce-anderson changed the title WIP: Set context correctly for the server request body opentelemetry-http: Set context correctly for the server request body Apr 1, 2025
@bryce-anderson bryce-anderson marked this pull request as draft April 1, 2025 17:20
@bryce-anderson bryce-anderson marked this pull request as draft April 4, 2025 23:03
@bryce-anderson
Contributor Author

Back to draft: I'm still seeing flakiness on the error pathways associated with the span completion happening before the onExchangeFinally operations. ☹️

request.trailers().set("x-request-trailer", "request-trailer");
request.payloadBody().writeAscii("bar");
client.request(request).toFuture().get();
Thread.sleep(SLEEP_DURATION);
Member

What are those other events that may happen on the server after the response is received by the client? Is there any chance we can latch onto some callback (onExchangeFinally or maybe connection close) instead of using a sleep?

Member

Can we either wait for a server to close (if that helps with races) or add a filter as the first one in the pipeline to track when it's time to start asserting attributes?
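One way to picture the "filter as the first one in the pipeline" suggestion is a latch that the server side counts down once per finished exchange, while the test blocks on it instead of sleeping. The sketch below is a minimal illustration using only the JDK: `ExchangeLatch` and its `onExchangeFinally()` method are hypothetical names and not ServiceTalk API, and the wiring into a real filter is left out. It also does not settle the concern raised in this thread that spans may need extra time to materialize after the exchange ends.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ExchangeLatchSketch {

    /** Hypothetical stand-in for state shared between a first-in-pipeline filter and the test. */
    static final class ExchangeLatch {
        private final CountDownLatch done;

        ExchangeLatch(int expectedExchanges) {
            this.done = new CountDownLatch(expectedExchanges);
        }

        /** Assumed to be called from the server's terminal callback for each exchange. */
        void onExchangeFinally() {
            done.countDown();
        }

        /** Returns true if all expected exchanges finished before the timeout elapsed. */
        boolean await(long timeout, TimeUnit unit) {
            try {
                return done.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ExchangeLatch latch = new ExchangeLatch(1);

        // Simulates the server finishing the exchange slightly after the client got its response.
        Thread server = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            latch.onExchangeFinally();
        });
        server.start();

        // The test waits for the signal rather than sleeping for a fixed duration.
        if (!latch.await(2, TimeUnit.SECONDS)) {
            throw new AssertionError("exchange did not finish in time");
        }
        System.out.println("exchange finished; span attributes can be asserted now");
    }
}
```

The upper bound on the wait still exists, but a passing test returns as soon as the callback fires, so the common case is faster than a fixed `Thread.sleep(SLEEP_DURATION)`.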

Contributor Author

@bryce-anderson bryce-anderson Apr 18, 2025

Unfortunately, we need to give the traces time to materialize and I'm not sure a filter is going to help. In the agent there are methods to the tune of waitAndAssertTraces, which give the traces some grace period to materialize, but for whatever reason nothing like that seems to exist in the SDK testing tools.
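For illustration, a grace period of this kind can be approximated in test code by re-running an assertion until it passes or a deadline expires. The helper below is a hypothetical sketch built only on the JDK: `awaitAssertion` is an invented name, it is not the agent's `waitAndAssertTraces` and not part of the OpenTelemetry SDK testing tools, and the list of strings merely stands in for exported spans.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class AwaitAssertionSketch {

    /**
     * Hypothetical helper: re-runs the assertion until it passes, rethrowing the
     * last AssertionError once the timeout has elapsed.
     */
    static void awaitAssertion(Duration timeout, Duration pollInterval, Runnable assertion) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                assertion.run();
                return; // The assertion held, so stop polling.
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // Out of time: surface the most recent failure.
                }
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for assertion", ie);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the span exporter: the span shows up shortly after the request completes.
        List<String> finishedSpans = new CopyOnWriteArrayList<>();
        Thread exporter = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            finishedSpans.add("GET /");
        });
        exporter.start();

        awaitAssertion(Duration.ofSeconds(2), Duration.ofMillis(10), () -> {
            if (finishedSpans.size() != 1) {
                throw new AssertionError("expected 1 span, got " + finishedSpans.size());
            }
        });
        System.out.println("spans=" + finishedSpans);
        exporter.join();
    }
}
```

Compared with a fixed sleep, polling returns as soon as the spans appear and only pays the full timeout when the test is going to fail anyway.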

Member

I was mostly trying to understand what's delaying the materialization of those events. If it's because server-side request events may still happen after the ServerContext is closed, that seems concerning and maybe we need to investigate separately why it happens. But if it's some buffering inside the OTEL SDK or the testing tools, then it's OK.

@bryce-anderson bryce-anderson force-pushed the bl_anderson/ServerHttpLifecycleObserver branch 2 times, most recently from 39d73d9 to 9d381a3 Compare April 15, 2025 23:37
@bryce-anderson bryce-anderson force-pushed the bl_anderson/ServerHttpLifecycleObserver branch from 9d381a3 to 01f2ada Compare April 16, 2025 16:23
@bryce-anderson
Contributor Author

The build of 0d583df (disable offloading of the service call) demonstrates that if we remove offloading we have a stable build. However, running without offloading is not realistic for the majority of services, and refactoring the offloading behavior is something we should defer to another PR.

@bryce-anderson bryce-anderson marked this pull request as ready for review April 16, 2025 23:49
Member

@idelpivnitskiy idelpivnitskiy left a comment

LGTM, only minor comments and questions about Thread.sleep:

import static java.util.Objects.requireNonNull;

-class ScopeTracker implements TerminalSignalConsumer {
+final class ScopeTracker implements TerminalSignalConsumer {
Member

Definitely not in this PR, but maybe later we can offer the tracker as a public utility for other use cases.

response.cancel(true);
});
// For the HTTP/1.x server, we don't necessarily see the cancellation until we shutdown the server.
Thread.sleep(SLEEP_DURATION);
Member

Is sleep still required after ServerContext is closed?

@bryce-anderson
Contributor Author

@idelpivnitskiy, wrt sleep(), how much is that worth to you? It's not a long sleep, I think we can afford it, and investigating alternative ways to avoid it seems to me like more effort than it's worth.

@bryce-anderson bryce-anderson merged commit f1e4b36 into apple:main Apr 21, 2025
12 checks passed
@bryce-anderson bryce-anderson deleted the bl_anderson/ServerHttpLifecycleObserver branch April 21, 2025 15:38