`child_count` is currently initialized to 0 in `Span` and never incremented. The OTLP spec marks `child_count` as optional, but if it is provided, it should represent the count of child spans created for a span. We're in violation of the spec by providing it but always setting it to 0.
`child_count` is problematic in general, because we may not know the count of (immediate) child spans by the time we finish a span, and because we can create a child span without having a direct reference to its parent. Examples:

- `tracer.start_span('child', with_parent_context: context)` creates a child span given only a parent context, with no reference to the parent `Span`.
- Enqueue a job in Resque. Optimistically assume it will run exactly once, so increment the `child_count` of the `producer` span. The job runs, fails, and produces a child span; it is then retried and succeeds, producing a second child span.
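To make the Resque case concrete, here is a minimal plain-Ruby sketch. `ProducerSpan` and `perform_with_retries` are hypothetical stand-ins, not Resque or OpenTelemetry APIs; the point is only that a count fixed when the producer span finishes cannot account for retries that happen afterwards.

```ruby
# Hypothetical model of the Resque example (not the Resque or OpenTelemetry API).
ProducerSpan = Struct.new(:name, :child_count, :finished)

# Each execution attempt of the job, including retries, starts its own
# child span from the propagated context: one child span per attempt.
def perform_with_retries(failures_before_success)
  (1..failures_before_success + 1).map { |attempt| "perform (attempt #{attempt})" }
end

producer = ProducerSpan.new('enqueue', 0, false)
producer.child_count += 1 # optimistic: assume the job runs exactly once
producer.finished = true  # in this sketch the producer finishes before the job runs

children = perform_with_retries(1) # fails once, then succeeds on the retry

puts "reported child_count: #{producer.child_count}" # prints "reported child_count: 1"
puts "actual child spans: #{children.size}"          # prints "actual child spans: 2"
```

By the time the retry creates the second child span, the producer span has already finished, so there is nothing left to correct.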
Regardless, we can try to count correctly when we have a reference to the unfinished parent. First, we need a way to notify the parent span, e.g. `parent_span.increment_child_count`. Second, we need to increment in the appropriate places. We only want to do this when `span.recording?` and `parent_span.recording?`, or when we can reasonably assume a child span will be created in a remote process:
1. In `Tracer#start_span`, we need to propagate `with_parent` to `internal_create_span`, and in `internal_create_span`, in the branch creating an SDK `Span`, we need to call `parent_span.increment_child_count if parent_span&.recording?`.
2. In each adapter that assumes creation of a remote child span, when calling e.g. `formatter.inject(span.context, request.headers)`, also call `span.increment_child_count`. This one is 💩 from an API perspective, since it requires adding `#increment_child_count` to the public API for `Span`, and it's kinda hard to use and explain. It does mean we can avoid the `if parent_span.recording?` check in the previous case. It might get more complicated if we want to assume remote processes respect our sampling decision.
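A rough sketch of both cases in plain Ruby, with no dependencies. The `Span` and `Tracer` classes are simplified stand-ins rather than the real SDK classes, `increment_child_count` is the hypothetical method proposed above, the `sampled:` keyword is an assumed stand-in for the sampler's decision, and `inject_with_child_count` is a hypothetical adapter helper standing in for `formatter.inject(span.context, request.headers)`.

```ruby
# Simplified stand-in for an SDK span (not OpenTelemetry::SDK::Trace::Span).
class Span
  attr_reader :name

  def initialize(name, recording: true)
    @name = name
    @recording = recording
    @child_count = 0
    @finished = false
    @mutex = Mutex.new # children may be started from other threads
  end

  def recording?
    @recording
  end

  # Hypothetical API from this proposal. A no-op for non-recording spans and
  # for spans that have already finished (their count is already final).
  def increment_child_count
    @mutex.synchronize { @child_count += 1 if @recording && !@finished }
    self
  end

  def child_count
    @mutex.synchronize { @child_count }
  end

  def finish
    @mutex.synchronize { @finished = true }
    self
  end
end

class Tracer
  # Case 1: start_span passes with_parent through to internal_create_span.
  def start_span(name, with_parent: nil, sampled: true)
    internal_create_span(name, with_parent, sampled)
  end

  private

  def internal_create_span(name, parent_span, sampled)
    # Non-recording child: the parent is not notified.
    return Span.new(name, recording: false) unless sampled

    # Branch creating a recording SDK span: notify the parent if it records.
    parent_span.increment_child_count if parent_span&.recording?
    Span.new(name)
  end
end

# Case 2: hypothetical adapter helper. The header write is a placeholder for
# formatter.inject(span.context, request.headers); we assume exactly one
# remote child span will be created from the injected context.
def inject_with_child_count(span, headers)
  headers['x-example-parent'] = span.name # placeholder, not a real propagation header
  span.increment_child_count
  headers
end

tracer = Tracer.new
parent = tracer.start_span('parent')
tracer.start_span('child-1', with_parent: parent)                  # counted
tracer.start_span('child-2', with_parent: parent)                  # counted
tracer.start_span('dropped', with_parent: parent, sampled: false)  # not counted
inject_with_child_count(parent, {})                                # counted: remote child assumed
parent.finish
tracer.start_span('late', with_parent: parent)                     # not counted: parent finished
puts parent.child_count # prints 3
```

Because `increment_child_count` is a no-op on non-recording spans here, the adapter can call it unconditionally, which is the trade-off described in the second case; the tracer branch keeps the explicit `parent_span&.recording?` check from the first case.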
Related conversations (open-telemetry/opentelemetry-proto#72) now indicate that only local spans (which I take to mean in-process spans) should count towards this metric. Maybe someone else can confirm whether that is the case; I don't see a change to the spec itself reflecting that.