SDK: Increment child_count in Span when creating child Spans #150

fbogsany · 2019-11-12T17:25:11Z

child_count is currently initialized to 0 in Span and never incremented. The OTLP spec marks child_count as optional, but if it is provided, it should represent the number of child spans created for a span. By providing it but always setting it to 0, we're violating the spec.

child_count is problematic in general, because we may not know the count of (immediate) child spans by the time we finish a span, and because we can create a child span without having a direct reference to its parent. Examples:

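- Call tracer.start_span('child', with_parent_context: context). The child is created from the parent's context alone, with no reference to the parent Span object, so there is nothing to notify.
- Enqueue a job in Resque. Optimistically assume it will run exactly once, so increment the child_count of the producer span. The job runs, fails (producing a child span), is retried, and succeeds (producing a second child span), so the producer's child_count is now off by one.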
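A minimal, hypothetical simulation of the Resque example, showing how counting at enqueue time drifts from the number of child spans actually created once a job is retried. SpanRecord is an illustrative Struct, not an OpenTelemetry type, and the worker side is collapsed into the same process for the sake of the sketch.

```ruby
# Hypothetical stand-in for a span; not an OpenTelemetry type.
SpanRecord = Struct.new(:name, :child_count)

producer = SpanRecord.new('resque.enqueue', 0)
# Producer side: optimistically assume the job will run exactly once.
producer.child_count += 1

# Worker side (a separate process in reality): every attempt starts its own
# child span, and the producer span is never told about the retry.
children_created = %i[failed succeeded].map do |outcome|
  SpanRecord.new("resque.perform (#{outcome})", 0)
end

puts producer.child_count     # prints 1
puts children_created.length  # prints 2
```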

Regardless, we can try to count correctly when we have a reference to the unfinished parent. Firstly, we need a way to notify the parent span, e.g. parent_span.increment_child_count. Secondly, we need to increment at the appropriate places. We only want to do this when span.recording? and parent_span.recording?, or when we can reasonably assume a child span will be created in a remote process:

- In Tracer#start_span, we need to propagate with_parent to internal_create_span, and in the internal_create_span branch that creates an SDK Span, we need to call parent_span.increment_child_count if parent_span&.recording?.
- In each adapter that assumes a remote child span will be created, also call span.increment_child_count when calling e.g. formatter.inject(span.context, request.headers). This one is 💩 from an API perspective, since it requires adding #increment_child_count to the public API for Span, and it's kinda hard to use and explain. It does mean we can avoid the if parent_span.recording? check in the previous case. It might get more complicated if we want to assume remote processes respect our sampling decision.
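A minimal Ruby sketch of the two counting strategies above, assuming heavily simplified classes. SketchSpan, SketchTracer, inject_and_count and the sampled: keyword are hypothetical stand-ins, not the opentelemetry-ruby SDK; only the names increment_child_count, recording?, start_span, with_parent and with_parent_context come from this discussion. The block passed to inject_and_count stands in for a real propagation call such as formatter.inject(span.context, request.headers).

```ruby
# Hypothetical, simplified stand-in for an SDK Span (not the real class).
class SketchSpan
  attr_reader :name, :parent

  def initialize(name, parent: nil, recording: true)
    @name = name
    @parent = parent
    @recording = recording
    @child_count = 0
    @mutex = Mutex.new # children of one span may be started on several threads
  end

  def recording?
    @recording
  end

  # Proposed hook: notify this span that a child span was (or will be) created.
  def increment_child_count
    @mutex.synchronize { @child_count += 1 }
    self
  end

  def child_count
    @mutex.synchronize { @child_count }
  end
end

# Hypothetical stand-in for the SDK Tracer; `sampled:` models the sampling decision.
class SketchTracer
  # with_parent_context is accepted but carries no parent Span object,
  # so a child started this way cannot be counted.
  def start_span(name, with_parent: nil, with_parent_context: nil, sampled: true)
    internal_create_span(name, with_parent, sampled)
  end

  private

  def internal_create_span(name, parent_span, sampled)
    span = SketchSpan.new(name, parent: parent_span, recording: sampled)
    # Case 1: count an in-process child only when both spans are recording.
    parent_span.increment_child_count if span.recording? && parent_span&.recording?
    span
  end
end

# Case 2 (hypothetical adapter helper): after injecting the span's context into
# outgoing headers, assume exactly one remote child span will be created.
def inject_and_count(span, headers)
  yield span, headers
  span.increment_child_count
end

tracer = SketchTracer.new
root = tracer.start_span('root')
tracer.start_span('child-a', with_parent: root)                  # counted
tracer.start_span('child-b', with_parent: root, sampled: false)  # not recording: not counted
tracer.start_span('orphan', with_parent_context: :some_context)  # no parent Span: not counted
inject_and_count(root, {}) { |_span, headers| headers['traceparent'] = 'placeholder' }
root.child_count # => 2
```

The Mutex is there because children of one span may be started from different threads; whether the real SDK would need that guard is an assumption of this sketch.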


fbogsany · 2019-11-15T17:02:07Z

Relevant spec issue: open-telemetry/opentelemetry-specification#355

benedictfischer09 · 2020-01-08T03:52:04Z

Related conversations now indicate that only local (which I take to mean in-process) spans should count towards this metric: open-telemetry/opentelemetry-proto#72. Maybe someone else can confirm whether that is the case; I don't see a change to the spec itself reflecting that.

mwear · 2020-02-27T01:03:52Z

It looks like we don't need this and can probably remove it altogether: open-telemetry/opentelemetry-proto#107

fbogsany added the bug label (Something isn't working) Nov 12, 2019

fbogsany mentioned this issue Nov 12, 2019 in SDK: integration tests #143 (Merged)

fbogsany added this to the Beta v0.7 milestone Sep 18, 2020

fbogsany linked a pull request Sep 18, 2020 that will close this issue: fix! remove child_count #400 (Merged)

fbogsany closed this as completed in #400 Sep 18, 2020

fbogsany modified the milestones: Beta v0.8, Beta v0.7 Oct 13, 2020
