Use server-timing for trace context response #560


Draft

dyladan wants to merge 2 commits into w3c:main from dynatrace-oss-contrib:server-timing-response

Member

dyladan commented Feb 27, 2024

This is a first draft of the server-timing response header. It is not meant to be a final version, but a place to start discussion. It is mostly a direct translation with the following exceptions:

Version is optional

In the original version, all fields were required in order to simplify parsing. Now that the trace context is a small component of a more complex header, that simplicity is lost, and requiring every field no longer makes parsing easier.

Flags are optional

The reasoning is the same as above. Additionally, to avoid abuse, some servers may not want to reveal information such as the `sampled` flag to an untrusted client.

Terminology Fixes

Changed ambiguous or nonstandard terms like "callee" and "caller" to established terms like "server" and "client."


          Use server-timing for trace context response

bbf648c

SergeyKanzhelev mentioned this pull request

Server-timing names #561

Open

dmathieu reviewed

spec/21-http_response_header_format.md


		### traceresponse Header Field Values
		Metric name: `trace`

dmathieu Feb 28, 2024

As someone who hasn't attended the meetings (where I suppose this was discussed), I find this a bit obscure. What does this name mean? Here it's `trace`, but should it always be that? What other values would be good?

Member

SergeyKanzhelev Feb 28, 2024

`server-timing` is a set of metrics. This is one of them, and we are trying to reserve the name `trace`.

An alternative naming proposal is in #561.
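
To make "one metric among several" concrete, here is a minimal sketch of how a client might pick the `trace` metric out of a `Server-Timing` header value. The function name `find_metric` and the sample header are hypothetical, and the parser is deliberately simplified: it assumes no commas or semicolons appear inside quoted strings. The layout of the `desc` value is not shown in this thread, so it is left as an opaque placeholder.

```python
from typing import Dict, Optional


def find_metric(header_value: str, name: str = "trace") -> Optional[Dict[str, str]]:
    """Return the parameters of the first metric called `name`, or None.

    Simplified sketch: splits on "," and ";" without handling quoted
    strings that contain those characters.
    """
    for entry in header_value.split(","):
        parts = [p.strip() for p in entry.split(";")]
        if parts[0] != name:
            continue
        params: Dict[str, str] = {}
        for part in parts[1:]:
            if not part:
                continue  # tolerate a stray ";"
            key, _, value = part.partition("=")
            # Keys are lowercased for lookup; the first occurrence wins.
            params.setdefault(key.strip().lower(), value.strip().strip('"'))
        return params
    return None


# Hypothetical header: the desc value is a placeholder, not the real format.
header = 'db;dur=53.2, trace;desc="PLACEHOLDER", cache;desc="hit"'
print(find_metric(header))
```

A client that does not understand the `trace` metric can skip it and still read the other metrics, which is the property that reserving a single metric name relies on.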

basti1302 reviewed

spec/21-http_response_header_format.md


          Use desc param of server timing metric

ca90939

johnbley reviewed

spec/21-http_response_header_format.md

    
-              This section describes the binding of the distributed trace context to the `traceresponse` HTTP header.
+              This section describes the binding of the distributed trace context to a metric in the Server Timing HTTP header.

johnbley Apr 25, 2025

I'd suggest HTTP response header (adding "response") to be super-clear.

spec/21-http_response_header_format.md

               #### child-id
-              This is the ID of the operation of the callee (in some tracing systems, this is known as the `span-id`, where a `span` is the execution of a client request) and is used to uniquely identify an operation within a trace. It is represented as an 8-byte array, for example, `00f067aa0ba902b7`. All bytes as zero (`0000000000000000`) is considered an invalid value.
+              This is the span ID of the server operation. It is represented as an 8-byte array, for example, `00f067aa0ba902b7`. An all-zero child ID (`0000000000000000`) is an invalid value. Tracing systems MUST ignore the trace context metric when the child id is invalid (for example, if it contains non-lowercase hex characters).

johnbley Apr 25, 2025

I don't normally like to bikeshed names too much, but to me this descriptive text suggests that the name of this field should be changed to span-id. It would read much more cleanly and sidestep any questions of semantics ("child of.... what?")
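
For illustration, the validity rule in the proposed text above (16 lowercase hex characters, not all zeros) could be checked as follows. This is a sketch only; the helper name `is_valid_child_id` is hypothetical and not part of the spec.

```python
import re

# 8 bytes encoded as exactly 16 lowercase hex characters.
_CHILD_ID_RE = re.compile(r"[0-9a-f]{16}")


def is_valid_child_id(child_id: str) -> bool:
    """Return True if child_id is 16 lowercase hex chars and not all zeros."""
    if _CHILD_ID_RE.fullmatch(child_id) is None:
        return False
    return child_id != "0" * 16


print(is_valid_child_id("00f067aa0ba902b7"))
print(is_valid_child_id("0000000000000000"))
```

Under the proposed wording, a tracing system that gets `False` here would ignore the whole trace context metric rather than try to repair the value.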

spec/21-http_response_header_format.md

-. An untrusted callee may be able to abuse a tracing system by setting these flags maliciously.
-. A callee may have a bug which causes the tracing system to have a problem.
-. Different load between calling and called services might force one or more participants to discard part or all of a trace.
+. An untrusted server may be able to abuse a tracing system by setting these flags maliciously.

johnbley Apr 25, 2025

I like these examples of thinking through security implications. I might add that simply exposing the trace id (and span id) itself to clients may present a risk in some scenarios. Do we need text explicitly stating that using the response header (server-timing: trace) is optional for all participants and/or that compliant software SHOULD make emitting it configurable?

spec/21-http_response_header_format.md

    
-              - If a component deferred or delayed the decision and only a subset of telemetry will be recorded, the `sampled` flag from the incoming `traceparent` header should be used if it is available. It should be set to `0` as the default option when the trace is initiated by this component.
-              - If a component receives a `0` for the `sampled` flag on an incoming request, it may still decide to record a trace. In this case it SHOULD return a `sampled` flag `1` on the response so that the caller can update its sampling decision if required.
+              - If the server deferred or delayed the decision and only a subset of telemetry will be recorded, the `sampled` flag from the incoming `traceparent` header should be used if it is available. It should be set to `0` as the default option when the trace is initiated by this server.
+              - If the server receives a `0` for the `sampled` flag on an incoming request, it may still decide to record a trace. In this case it SHOULD return a `sampled` flag `1` on the response so that the client can update its sampling decision if required.

johnbley Apr 25, 2025

I like that this embodies the "I'm going to tell you, the client, how I actually traced this, insofar as I know it" spirit.
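
The two rules quoted in the diff above could be sketched as a small decision function. The name `response_sampled_flag` and its parameters are hypothetical. The non-deferred branch (report `1` exactly when the server recorded the trace) is an assumption inferred from the second rule, since the remaining rules of that section are not shown in this excerpt.

```python
from typing import Optional


def response_sampled_flag(
    recorded: bool,
    deferred: bool = False,
    incoming_sampled: Optional[bool] = None,
) -> int:
    """Return the `sampled` flag (0 or 1) a server reports on the response.

    incoming_sampled is the flag from the incoming traceparent header,
    or None when the trace is initiated by this server.
    """
    if deferred:
        # Decision deferred or delayed: reuse the incoming flag when it is
        # available, otherwise default to 0.
        return 1 if incoming_sampled else 0
    # Assumption: otherwise report what the server actually did, so a
    # client that sent sampled=0 can learn that the server recorded anyway.
    return 1 if recorded else 0


# Incoming request said sampled=0, but the server chose to record.
print(response_sampled_flag(recorded=True, incoming_sampled=False))
```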

spec/21-http_response_header_format.md

    
-              A participant that continues a trace started downstream &mdash; that is, if the participant uses the `trace-id` value from a `traceresponse` header it has received &mdash; MUST set the `random-trace-id` flag in its own `traceresponse` header to the same value that was found in the `traceresponse` header from which the `trace-id` was taken.
+              A participant that continues a trace started downstream &mdash; that is, if the participant uses the `trace-id` value from a trace context server timing metric it has received &mdash; MUST set the `random-trace-id` flag in its own trace context server timing metric to the same value that was found in the trace context server timing metric from which the `trace-id` was taken.

johnbley Apr 25, 2025

Perhaps a Client Interpretation section on how participating clients may choose to "extend" the trace by producing data with the same trace-id (as some browser instrumentation might do for page load), may choose to link the two traces in a system-appropriate way (e.g., as OTel span links or as simple Zipkin tags like linked.traceId), may choose to ignore it, or may choose to log or otherwise respond when its intended trace propagation was not honored ("I sent you a traceparent but your response shows you started a new trace")?

Reviewers

dmathieu left review comments

basti1302 left review comments

johnbley left review comments

SergeyKanzhelev left review comments

At least 2 approving reviews are required to merge this pull request.
