Create SDK TRACE context, a shared reference to tracer pipelines between Tracer/TracerProvider. Update tests. #590

Closed · wants to merge 16 commits

Conversation

@jsuereth (Contributor) commented Feb 26, 2021

Precursor to #502

Changes

  • Create TracerContext for storing the Sampler, Resource and SpanProcessor (a rough sketch of the resulting shape follows this list).
  • Update Tracer and TracerProvider to use a shared reference to TracerContext.
  • Update Span to no longer store two shared pointers to the same content (Tracer + Processor). Instead, Span now pulls the Processor from its Tracer. This is preparation for pulling InstrumentationLibrary.
  • TracerProvider now uniquely owns a SpanProcessor (to ensure an appropriate lifecycle for flush/shutdown).
  • Update the zpages extension to share data through a specific object + shared_ptr rather than directly sharing a SpanProcessor instance.
  • Update SpanProcessor to take a sdk::trace::Span reference instead of a Recordable directly. Note: Span already remembers the Tracer it came from, which (will) include InstrumentationLibrary and TracerContext.
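
For orientation, here is a simplified sketch of the ownership layout described above. The class shapes and method names below are approximations for illustration, not the exact SDK headers from this PR:

    // Simplified stand-ins: TracerContext owns the pipeline pieces, the
    // TracerProvider and every Tracer share one TracerContext, and the
    // processor is uniquely owned so flush/shutdown has a single owner.
    #include <memory>
    #include <utility>

    class SpanProcessor { /* OnStart / OnEnd / ForceFlush / Shutdown ... */ };
    class Sampler {};
    class Resource {};

    class TracerContext
    {
    public:
      TracerContext(std::unique_ptr<SpanProcessor> processor,
                    std::unique_ptr<Sampler> sampler,
                    Resource resource)
          : processor_(std::move(processor)),
            sampler_(std::move(sampler)),
            resource_(std::move(resource))
      {}

      SpanProcessor &GetActiveProcessor() noexcept { return *processor_; }
      Sampler &GetSampler() noexcept { return *sampler_; }
      const Resource &GetResource() const noexcept { return resource_; }

    private:
      std::unique_ptr<SpanProcessor> processor_;  // unique ownership: flush/shutdown lifecycle lives here
      std::unique_ptr<Sampler> sampler_;
      Resource resource_;
    };

    class Tracer
    {
    public:
      explicit Tracer(std::shared_ptr<TracerContext> context) : context_(std::move(context)) {}
      // Spans pull the processor through their Tracer instead of holding their own copy.
      SpanProcessor &GetProcessor() noexcept { return context_->GetActiveProcessor(); }

    private:
      std::shared_ptr<TracerContext> context_;  // shared with the TracerProvider
    };

    class TracerProvider
    {
    public:
      explicit TracerProvider(std::shared_ptr<TracerContext> context) : context_(std::move(context)) {}
      std::shared_ptr<Tracer> GetTracer() { return std::make_shared<Tracer>(context_); }

    private:
      std::shared_ptr<TracerContext> context_;
    };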

codecov bot commented Feb 26, 2021

Codecov Report

Merging #590 (51a48d5) into main (3d71e5e) will decrease coverage by 0.01%.
The diff coverage is 87.17%.

❗ Current head 51a48d5 differs from pull request most recent head 0356183. Consider uploading reports for the commit 0356183 to get more accurate results.

@@            Coverage Diff             @@
##             main     #590      +/-   ##
==========================================
- Coverage   94.40%   94.38%   -0.02%     
==========================================
  Files         191      192       +1     
  Lines        9023     9036      +13     
==========================================
+ Hits         8518     8529      +11     
- Misses        505      507       +2     
Impacted Files Coverage Δ
api/include/opentelemetry/trace/span_context.h 100.00% <ø> (ø)
api/include/opentelemetry/trace/trace_state.h 97.54% <ø> (-0.02%) ⬇️
sdk/src/trace/span.h 100.00% <ø> (ø)
sdk/src/trace/tracer_context.cc 61.53% <61.53%> (ø)
sdk/src/trace/tracer_provider.cc 80.95% <75.00%> (+12.77%) ⬆️
sdk/src/trace/tracer.cc 71.87% <80.00%> (-2.42%) ⬇️
ext/test/zpages/tracez_data_aggregator_test.cc 96.35% <100.00%> (+0.01%) ⬆️
ext/test/zpages/tracez_processor_test.cc 98.70% <100.00%> (+<0.01%) ⬆️
...clude/opentelemetry/sdk/common/atomic_unique_ptr.h 100.00% <100.00%> (ø)
sdk/src/trace/span.cc 89.47% <100.00%> (ø)
... and 3 more

@jsuereth jsuereth marked this pull request as ready for review February 27, 2021 15:57
@jsuereth jsuereth requested a review from a team February 27, 2021 15:57
sdk/include/opentelemetry/sdk/trace/tracer_context.h (outdated review thread)
*/
const opentelemetry::sdk::resource::Resource &GetResource() const noexcept;

void ForceFlushWithMicroseconds(uint64_t timeout) noexcept;
Contributor

Do we need this?

Contributor Author

We need a way to force-flush spans in some ephemeral compute environments. There have been some additions to the spec around ForceFlush, and I think we need this to support that version of the spec.
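
As a rough illustration of that use case: the stand-in class, handler, and 100 ms budget below are assumptions for the sketch; only the ForceFlushWithMicroseconds signature comes from the header under review.

    #include <cstdint>

    // Illustration only: a stand-in TracerContext exposing the method under
    // discussion, and a FaaS-style handler that flushes before returning.
    class TracerContext
    {
    public:
      void ForceFlushWithMicroseconds(std::uint64_t timeout) noexcept
      {
        // In the SDK this would delegate to the owned SpanProcessor.
        (void)timeout;
      }
    };

    void on_invocation_end(TracerContext &context)
    {
      // ... spans for this invocation have been created and ended ...
      context.ForceFlushWithMicroseconds(100000);  // allow up to 100 ms to drain
    }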

sdk/include/opentelemetry/sdk/trace/tracer_context.h (outdated review thread)
@seemk (Contributor) commented Mar 13, 2021

  • TracerProvider now uniquely owns a SpanProcessor (to ensure appropriate lifecycle of flush/shutdown)

Does this mean that, in the case of multiple tracer providers, there will be multiple span processors and thus multiple exporters, each processor having its own queue to flush? In the case of nginx I'll most likely need several Resources to specify service names for different virtual hosts / domains (or arbitrary route-based service names), and in order to achieve that I need multiple tracer providers. The multiple processors/queues most likely won't become an issue, but each batch span processor creates its own thread, so an nginx instance with multiple virtual domains might end up with a lot of threads. That is a bit concerning, but might not be a problem in reality.

*
* TODO(jsuereth): This method will be reworked once multi-processor span support is added.
*/
std::unique_ptr<Recordable> ConsumeRecordable() {
@lalitb (Member) Mar 15, 2021

@jsuereth - This is something I am not able to understand. How will this method be modified/replaced to get the correct Recordable from the span object once we have multi-processor span support?

Contributor Author

Sorry for the long delay. I'm ~60% through adding multi-processor support to show this (and prove it works). You're right, I had to remove this method.

@lalitb (Member) commented Mar 15, 2021

but might not be a problem in reality.

I have observed typical web-hosting companies creating tens of thousands of (dynamic/static) vhost configurations to be handled by a single nginx/apache server :)

@jsuereth (Contributor Author)

@seemk From a Resource standpoint, nginx would be the service providing telemetry. I'm not sure we've specified (well) how to handle virtual domains, etc., but in this case I think the "spirit" of Resource is that you're reporting for the nginx module hosting all of those domains. You'd use labels to identify which virtual host each trace is associated with. This is likely a bigger discussion than fits in this CL.
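
For illustration, a minimal sketch of that suggestion: a single Resource identifying the nginx service, with per-span attributes for the virtual host. The attribute keys and request handler below are assumptions for the sketch, not part of this PR:

    // Sketch: one provider/Resource for the whole nginx process; each span is
    // tagged with the virtual host it served. Attribute keys are illustrative.
    #include "opentelemetry/nostd/string_view.h"
    #include "opentelemetry/trace/provider.h"

    namespace nostd = opentelemetry::nostd;
    namespace trace = opentelemetry::trace;

    void on_request(nostd::string_view virtual_host, nostd::string_view route)
    {
      auto tracer = trace::Provider::GetTracerProvider()->GetTracer("nginx");

      auto span = tracer->StartSpan("handle_request");
      span->SetAttribute("http.host", virtual_host);  // which vhost served the request
      span->SetAttribute("http.route", route);
      // ... proxy / serve the request ...
      span->End();
    }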

Additionally, regarding thread pool and resource constraints, I agree that the batch span processor consuming a thread could be an issue. I think that's worth opening a bug for. We might need to adapt BatchProcessor to use some kind of shared-thread / event-queue model where you can limit the number of threads used by the entire SDK. I don't think we should try to limit the number of TracerProviders or force a single batch span processor today, as that's antithetical to some of the design ideas. I do agree the current incarnation of the BSP needs some work.

Again, apologies for my slow progress :) Looking forward to chatting over things more in our next SIG. I should have a working multi-processor implementation by then, and we can figure out where to go next, given the current state + spec.

@lalitb (Member) commented Mar 22, 2021

We might need to adapt BatchProcessor to use some kind of shared-thread / event-queue model where you can limit the number of threads used by the entire SDK.

There was some work done initially to support async I/O using an event loop (#63) - not sure if we can use this to get rid of the thread model for the batch processor.

@seemk (Contributor) commented Mar 23, 2021

@seemk From a Resource standpoint, nginx would be the service providing telemetry. [...] You'd use labels to identify which virtual host each trace is associated with. [...] I agree that the batch span processor consuming a thread could be an issue. [...] I do agree the current incarnation of the BSP needs some work.

Thanks! Yeah, I wasn't sure myself how virtual domain handling should happen inside nginx, and whether creating multiple resources would be the way to go. I do like your idea of a single resource with custom attributes. In any case, I guess the multiple threads won't be an issue for now 🙂

- Migrate SpanProcessor to use `ExportableSpan`
- `ExportableSpan` has a registry of `Recordable`s donated from processors
- `ExportableSpan` replaces `Recordable` in `Span` implementation for now
- Do some gymnastics around unique_ptr + ownership
- Update SDK tests (exporters/ext tests still borked)
- For now, `ExportableSpan` has a shared_ptr reference to the originating Tracer.  TBD on whether this stays.
@pyohannes (Contributor)

It would be great to split this PR up; that would make reviewing much easier.

Regarding the ExportableSpan abstraction, I'd be curious to hear your opinion about a different approach.

  1. Have a MultiSpanProcessor, like you sketched it out. Internally processors are stored in a map, mapping processor addresses to processors.

    std::vector<std::unique_ptr<SpanProcessor>> processors;
    processors.push_back(std::move(zipkin_processor));
    processors.push_back(std::move(otel_processor));
    auto processor = new MultiSpanProcessor(std::move(processors), options);
  2. Have a type MultiRecordable, which implements the Recordable interface, and encapsulates several regular recordables. The MultiSpanProcessor would initialize it like this:

    std::unique_ptr<Recordable> MultiSpanProcessor::MakeRecordable() noexcept
    {
      auto recordable = std::unique_ptr<MultiRecordable>(new MultiRecordable);
      for (auto &processor : processors_) {
        recordable->Add(processor.second->MakeRecordable(), processor.second.get());
      }
      return recordable;
    }
  3. All the setters on MultiRecordable would fan out to all the encapsulated recordables:

    void MultiRecordable::SetStatus(opentelemetry::trace::StatusCode code,
                    nostd::string_view description) noexcept
    {
        for (auto& recordable : recordables_) 
        {
             recordable.second->SetStatus(code, description);
        }
    }
  4. At calls to OnStart and OnEnd, the MultiSpanProcessor fans out to all processors:

    void MultiSpanProcessor::OnEnd(std::unique_ptr<Recordable> &&recordable) noexcept
    {
      auto multi_recordable = static_cast<MultiRecordable *>(recordable.get());
      auto recordables      = multi_recordable->Release();

      for (auto &r : recordables)
      {
        auto processor = processors_.find(r.GetProcessorPtr());
        if (processor != processors_.end())
        {
          // hand each per-processor recordable back to the processor that created it
          processor->second->OnEnd(r.ReleaseRecordable());
        }
      }
    }

I don't see many big benefits in having a separate ExportableSpan abstraction; it might even be confusing for many users (the recordable approach by itself is already complicated enough).

@jsuereth (Contributor Author)

@pyohannes I tried to take an approach just like that. In fact, I even called it "MultiRecordable". TL;DR: OnStart broke it for me.

The primary issue is that you need to support these things:

  1. Span needs to own the recordable
  2. OnStart callback needs a reference to the appropriate recordable.
  3. OnEnd needs to take back ownership of the recordable
  4. Recordable is generated by either Processor or Exporter
  5. Recordable does not attach Resource or InstrumentationLibrary to Spans. ExportableSpan was my place to hang this.

#2 and #5 caused issues in the MultiRecordable design, so I went this direction.
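
For reference, a rough approximation of the shape ExportableSpan takes under those constraints. The types here are stand-ins and the member names are approximations, not the final API in this PR:

    // Rough approximation only: ExportableSpan owns one Recordable per
    // processor and remembers the Tracer it came from, so Resource and
    // InstrumentationLibrary can be attached at export time.
    #include <map>
    #include <memory>
    #include <utility>

    class Recordable
    {
    public:
      virtual ~Recordable() = default;
    };
    class SpanProcessor {};
    class Tracer {};

    class ExportableSpan
    {
    public:
      explicit ExportableSpan(std::shared_ptr<Tracer> tracer) : tracer_(std::move(tracer)) {}

      // Each processor registers the recordable it created (requirement 4).
      void RegisterRecordableFor(const SpanProcessor &processor, std::unique_ptr<Recordable> recordable)
      {
        recordables_[&processor] = std::move(recordable);
      }

      // OnStart only needs a reference to "its" recordable (requirement 2).
      Recordable *GetRecordableFor(const SpanProcessor &processor)
      {
        auto it = recordables_.find(&processor);
        return it != recordables_.end() ? it->second.get() : nullptr;
      }

      // OnEnd takes ownership back, per processor (requirement 3).
      std::unique_ptr<Recordable> ReleaseRecordableFor(const SpanProcessor &processor)
      {
        auto it = recordables_.find(&processor);
        if (it == recordables_.end())
          return nullptr;
        auto released = std::move(it->second);
        recordables_.erase(it);
        return released;
      }

      // Resource / InstrumentationLibrary hang off the originating Tracer (requirement 5).
      const Tracer &GetTracer() const { return *tracer_; }

    private:
      std::shared_ptr<Tracer> tracer_;
      std::map<const SpanProcessor *, std::unique_ptr<Recordable>> recordables_;
    };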

I've started fragmenting out the PR, so I'll send a multi-processor focused PR where we can hash out ideas.

@lalitb (Member) commented Apr 2, 2021

I've started fragmenting out the PR, so I'll send a multi-processor focused PR where we can hash out ideas.

@jsuereth - Hope it is fine to hold review comments for this PR, as you will be splitting it into separate PRs - TracerContext and multi-processor. Just want to be sure you are not waiting for review comments here :)

@jsuereth (Contributor Author) commented Apr 2, 2021

@lalitb No, I'm still splitting this up. Ran into some illness that ruined my whole week, should see something from me this weekend/early next week.

@lalitb (Member) commented Apr 3, 2021

@lalitb No, I'm still splitting this up. Ran into some illness that ruined my whole week, should see something from me this weekend/early next week.

No issues, @jsuereth. This was just to ensure you are not waiting for comments on this PR. Take care of yourself.
