Should exports be async #1422

huntc · 2020-07-16T03:26:14Z

I’m writing an export class for Reactive streams, as I did for OpenTracing and Dropwizard Metrics.

It occurs me that the export method should be returning a CompletionStage of a result instead of just a result. An export implementation is often performing some long lived action such as communicating over the network. I note that the Jaeger export function actually blocks.

What’s the appetite for making the export method non-blocking?

jkwatson · 2020-07-16T03:34:09Z

Hi @huntc . I think our general approach is to leave this up to the exporter implementation. For example, the New Relic exporter returns immediately and handles all the processing on a background thread. If you want an async exporter, then my recommendation would be to make yours do the work in the background.

The jaeger exporter should probably be changed to do the same, I'd guess.

huntc · 2020-07-16T03:46:57Z

Thanks, @jkwatson. What's the effect of the returned status indicating failure?

If knowing whether the export is successful is important then I'd suggest that the method declaration should reflect async possibilities for implementations.

If it isn't important to know if the export is successful then the return type should be Void.

WDYT?

jkwatson · 2020-07-16T04:02:43Z

We've had this discussion in the specs SIG. At the moment, there is no effect of returning the failure status. We made the method return an enum value in case in the future we want to support a robust retry-system in the SDK. We didn't want to potentially break existing exporters if we did decide to enhance the capabilities of the SDK in this way. Does that make sense?

huntc · 2020-07-16T04:15:52Z

Thanks, John.

Returning a status that yields no current benefit and is generally only attainable by blocking does not make sense to me.

If we may want to have a robust retry system in the future then I'd suggest that we need to use a completion stage now... blocking the thread isn't a great way to go and we know that exporters perform long-lived actions.

That said, returning a Void perhaps reflects the correct semantics. I agree that letting the exporters deal with retries in their own way makes sense. Different types of connectivity demand different retry policies e.g. if I'm using an exporter from a node known to have poor internet connectivity and low power requirements, sending on a best-effort basis is fine. If I'm using an exporter within a data center then retrying may make more sense. Then there are circuit-breaker considerations that apply to some environments, but not others...

Now feels the time to introduce API breaking changes before going GM.

Please feel free to point me at another forum if there's a wider conversation to be had. I would have thought it'd be language-specific though given the existence of coroutines etc. in other languages.

jkwatson · 2020-07-16T14:54:38Z

I don't necessarily disagree with you, but we'd need to get the spec updated first, if we want to change it, since this is something that is specified across all languages: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#exportbatch

huntc · 2020-07-16T21:24:46Z

Thanks for the spec ref. Can you please highlight where you see that returning a completion stage of the status is at conflict with the spec?

…

On 17 Jul 2020, at 00:54, John Watson ***@***.***> wrote: I don't necessarily disagree with you, but we'd need to get the spec updated first, if we want to change it, since this is something that is specified across all languages: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#exportbatch — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jkwatson · 2020-07-16T21:28:02Z

Returns: ExportResult:

ExportResult is one of:

Success - The batch has been successfully exported. For protocol exporters this typically means that the data is sent over the wire and delivered to the destination server.
Failure - exporting failed. The batch must be dropped. For example, this can happen when the batch contains bad data and cannot be serialized.

huntc · 2020-07-16T21:34:01Z

Yes, but as previously stated, it is up to the language to declare how the status is returned. For example, the .NET exporters appear to return async results: https://github.com/open-telemetry/opentelemetry-dotnet/blob/master/src/OpenTelemetry.Exporter.Console/ConsoleExporter.cs#L49

…

On 17 Jul 2020, at 07:28, John Watson ***@***.***> wrote: Returns: ExportResult: ExportResult is one of: Success - The batch has been successfully exported. For protocol exporters this typically means that the data is sent over the wire and delivered to the destination server. Failure - exporting failed. The batch must be dropped. For example, this can happen when the batch contains bad data and cannot be serialized. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jkwatson · 2020-07-16T21:40:05Z

I don't disagree that this would be better, but I also don't see wiggle room in the spec, as currently written.
Also, unfortunately, CompletionStage is java 8+, so we'd need to choose one of the older async paradigms.

huntc · 2020-07-16T21:49:53Z

No Java 8? Astounding. `Future` then...? As an additional data point, I’m no Go expert, but Goroutines and pipelines appear to be used there too. :-)

…

On 17 Jul 2020, at 07:40, John Watson ***@***.***> wrote: I don't disagree that this would be better, but I also don't see wiggle room in the spec, as currently written. Also, unfortunately, CompletionStage is java 8+, so we'd need to choose one of the older async paradigms. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jkwatson · 2020-07-16T21:50:58Z

Astounding, indeed! We are supporting java 7 currently, so that grpc libraries can be instrumented with the APIs.

jkwatson · 2020-07-16T21:54:51Z

The decision on the current exporter API to include a non-async return value predates my involvement in the project. I'll write a little spec tweak PR that will make me more comfortable with changing the exporter API in this case.

huntc · 2020-07-16T21:58:50Z

The decision on the current exporter API to include a non-async return value predates my involvement in the project. I'll write a little spec tweak PR that will make me more comfortable with changing the exporter API in this case.

Thanks John. There’s a precedent as mentioned given the Go and .NET implementations. Would you like me to raise a PR here to change the return type to a Future?

jkwatson · 2020-07-16T22:00:53Z

open-telemetry/opentelemetry-specification#707

Yes, a PR would be very welcome!

huntc · 2020-07-16T22:07:59Z

I’ll get the PR done first thing. Meanwhile, in case you’re interested, here’s my work influencing the conversation. titanclass/reactive-streams-telemetry#6

huntc · 2020-07-17T04:35:56Z

The PR is in @jkwatson (took me longer than expected!): #1428

huntc · 2020-08-08T05:29:39Z

I think the label for this should indicate releasing before GA as it’ll have API implications.

Oberon00 · 2020-08-09T13:36:58Z

Should exports be async

My take on this is: It doesn't matter. Exporter + SpanProcessor are often a closely coupled unit at this point in practice. For example, an exporter that syncronously exports the batch of spans it gets (which is often the easiest to implement), cannot sensibly be used with a SimpleSpanProcessor because it would do a network requests for every single ended span on the critical path of the application. On the other hand, the BatchSpanProcessor already does all the boilerplate of setting up a background thread and batching spans together for the exporter.

There may be exporters where the threading model of the BatchSpanProcessor does not fit and neither does the SimpleSpanProcessor's. One choice you could make, is to do what the New Relic exporter does according to @jkwatson (#1422 (comment)): Use the SimpleSpan processor and then do threading/synchronicity and batching/streaming yourself. An equivalent approach would probably be to make a NewRelicSpanProcessor that does the export. Then you would not need to return a "fake" status code (but override an unused OnStart method instead).

IMHO the only reason to use the exporter interface is when you want to make use of the more complex SpanProcessor implementations to ease implementation of your exporter. If your exporter only works with SimpleSpanProcessor (which is really just an adapter from the SpanProcessor interface to the exporter interface), you'd better implement the SpanProcessor interface directly to avoid your exporter being accidentally used with a suboptimal SpanProcessor implementation.

In summary, I think exporter is just a convenience interface and is not really important. The SpanProcessor interface is the actually important one.

Oberon00 · 2020-08-09T13:47:33Z

One more point about async-ness: We currently struggle to find a use case for doing anything with the success/failure return value of the exporter. What is a span processor supposed to do with the "completed"/"still in progress" information that an "async" method would add? What code would it run in a completion callback?

huntc · 2020-08-09T20:48:33Z

Thanks for the feedback.

Having now spent almost a month pondering my original question and working on a PR, a couple of things stand out:

Types are important and the Exporters types should reflect the semantics of the interface - no matter as to there being an implementation gain or not. Types are documentation and therefore assist the new user of these interfaces in understanding the intentions of a class.
Blocking a thread should be avoided as it may consume resources unnecessarily. The use of work-stealing and other thread pool types, shared with other parts of a program, is crucial so that memory and other resources are used diligently. OTEL has a responsibility to have a sharp focus on resource usage being what should become a critical piece of infrastructure.

On the utility of having a result code, chaining exports can only be sensibly achieved by knowing the result of a previous export In a chain. Chaining exports is presently achievable.

I hope that this helps.

carlosalberto · 2020-08-10T14:42:58Z

@huntc From the last Java meeting, we will be trying out your API (I will do a deep review myself later this week of your PR, but focusing on the code, not the idea/design itself ;) ).

The points that @Oberon00 raises are very interesting though. To me, also for Java it's not as important to have this new API as it may be for, say, Scala frameworks (who do a fair amount of asynchronous calls).

Oberon00 · 2020-08-10T14:55:16Z

Types are important and the Exporters types should reflect the semantics of the interface

That's true. One thing to note is that when we change the API to be async, we have to consider how much sense the BatchSpanProcessor implementation still makes. If exports are async, then probably it should not have a background thread but invoke the exporter directly in OnEnd once a batch is full/the timeout is expired (or do we still need a background thread for the case where no new span is started/ended for quite some time?).

jkwatson · 2020-08-10T17:07:19Z

@Oberon00 : Your point about SpanProcessors being the more important construct is very interesting. Maybe what we really need is an alternative async-friendly SpanProcessor, if the disruptor version isn't working.

@huntc would it make sense to have a scala-friendly SpanProcessor implementation, maybe even written in scala, that would work with a new scala-friendly exporter interface? This seems like it might be a better long-term solution than trying to shoe-horn a scala-friendly exporter into the existing java-oriented SpanProcessor model. Thoughts?

trask · 2020-08-10T17:21:17Z

If exports are async, then probably it should not have a background thread but invoke the exporter directly in OnEnd once a batch is full/the timeout is expired

Even with an async exporter, I don't think you'd want to invoke it on the application execution thread. I think the idea is for the async exporter to do marshalling on the same thread, and then call an async http/gRPC framework, at which point it would return to the caller instead of waiting for synchronous http/gRPC.

huntc · 2020-08-10T20:37:17Z

To me, also for Java it's not as important to have this new API as it may be for, say, Scala frameworks (who do a fair amount of asynchronous calls).

I do not see the motivation for my original question being influenced by the language being used, be it Scala, Java, Kotlin, Clojure etc. My take is that the Java OTEL project represents the JVM as a whole. Java is the common denominator.

Async non-blocking code is as important in Java-land as anywhere else.

huntc · 2020-08-10T20:46:01Z

If exports are async, then probably it should not have a background thread but invoke the exporter directly in OnEnd once a batch is full/the timeout is expired (or do we still need a background thread for the case where no new span is started/ended for quite some time?).

Yes, the latter.

I think the batch span processor makes sense depending on how the exporter meets one’s use case. In my case, the exporter I’m writing will encourage the use of the simple span processor. It’s objective is to stream data to a collector as fast as it can and, in the case of back pressure, will drop older telemetry.

huntc · 2020-08-10T20:47:47Z

@huntc would it make sense to have a scala-friendly SpanProcessor implementation, maybe even written in scala, that would work with a new scala-friendly exporter interface? This seems like it might be a better long-term solution than trying to shoe-horn a scala-friendly exporter into the existing java-oriented SpanProcessor model. Thoughts?

I see no distinction here between Java and Scala.

jkwatson · 2020-08-10T22:16:43Z

Leaving the Java/Scala differences aside for the moment, what do you think of @Oberon00 's core thesis that perhaps a new SpanProcessor that better supports async models, with an more natively async contract with exporters would be a alternative approach to modifying the existing non-async-friendly BatchSpanProcessor?

huntc · 2020-08-10T22:47:50Z

...what do you think of @Oberon00 's core thesis that perhaps a new SpanProcessor that better supports async models, with an more natively async contract with exporters would be a alternative approach to modifying the existing non-async-friendly BatchSpanProcessor?

Firstly, I don't see that another type of span processor would change the contract of the exporter i.e. the exporter's types should still convey the behaviour expected.

Secondly, adding another type of span processor will perhaps just increase confusion. If anything, I'd stick with the simple span processor and do away with the batch one altogether. This then places more emphasis on the exporters taking responsibility for back-pressure, which I think is reasonable.

In summary, I feel that the user of these APIs should only concern themselves with which exporter to use. Exporters are easy to comprehend, whereas "span processor" is a bit of a foreign concept. Choosing both a span processor and an exporter seems a burden on the user.

jkwatson · 2020-08-10T23:47:03Z

Cool. Just wanted to think through all the options. Thanks for humoring me.

anuraaga · 2020-08-11T01:22:35Z

For a bit of color, I would say here Exporter is the low level primitive, and span processor is a higher level abstraction. The higher level generally can't "do more" than the primitives it has available. So the span processor, if wanting to go async, has to workaround by e.g. delegating to a disruptor thread (even if the exporter has its own event loop, it's a situation that makes me cry a little). The opposite wouldn't be true, a span processor can present synchronously if the exporter is async by just waiting on the result - we didn't add .join here but if anyone thinks this is important it's trivial to add.

Avoiding exporter interface when wanting async is an option, but then why do we even have exporters? The first thing I'd want to do is to stop implementing Exporter for the otlp ones since it makes sense to issue the gRPC calls asynchronously to reduce thread parking.

Oberon00 · 2020-08-11T09:31:51Z

@huntc

Firstly, I don't see that another type of span processor would change the contract of the exporter

We can have different exporter interfaces for use with different SpanProcessors. E.g. async-friendly SpanProcessors could use an AsyncExporter interface while others use the old exporter interface.

Secondly, adding another type of span processor will perhaps just increase confusion. If anything, I'd stick with the simple span processor and do away with the batch one altogether.

That's a slippery slope to removing the exporter/span processor distinction altogether. Especially if we are thinking about chained exports / chained span processors.

@anuraaga

Avoiding exporter interface when wanting async is an option, but then why do we even have exporters?

Yeah, I don't think this is currently very clearly motivated, especially if you only use the SimpleSpanProcessor as opposed to one that offers a different threading model to your exporter (like BatchSpanProcessor).

Oberon00 · 2020-08-11T09:35:32Z

So the span processor, if wanting to go async, has to workaround by e.g. delegating to a disruptor thread (even if the exporter has its own event loop, it's a situation that makes me cry a little).

I don't see this as a workaround, it's the value that it provides to the exporter, since it doesn't have to have an event loop/background thread itself. Of course this is bad if the exporter needs a different threading model, but would an async interface help here? You would still need to select your SpanProcessor implementation depending on whether the exporter is actually async or not.

trask · 2020-08-11T21:54:04Z

Cherry picking from above:

I feel that the user of these APIs should only concern themselves with which exporter to use. Exporters are easy to comprehend, whereas "span processor" is a bit of a foreign concept. Choosing both a span processor and an exporter seems a burden on the user.

You would still need to select your SpanProcessor implementation depending on whether the exporter is actually async or not.

Is there some way we can make Simple/Disruptor/BatchSpanProcessor an internal implementation detail of exporters? And make SpanProcessor all about enhancing/filtering spans?

huntc · 2020-08-11T22:06:06Z

Is there some way we can make Simple/Disruptor/BatchSpanProcessor an internal implementation detail of exporters? And make SpanProcessor all about enhancing/filtering spans?

This is exactly what I was alluding to above:

In summary, I feel that the user of these APIs should only concern themselves with which exporter to use. Exporters are easy to comprehend, whereas "span processor" is a bit of a foreign concept. Choosing both a span processor and an exporter seems a burden on the user.

Abolish span processors.

However, we are getting off track with this particular issue. My original question was about the appetite for making the exporter’s types reflect the nature of the operations they perform. This stands. It’d be great to close out this issue with a yay or nay wrt the exporter’s types. Let’s open another issue to discuss the relevance of span processors.

jkwatson · 2020-08-22T02:39:46Z

Closing this, as the PR to make exporters async has been merged.

huntc mentioned this issue Jul 17, 2020

Makes Export methods async #1428

Merged

jkwatson added the release:after-ga label Jul 21, 2020

jkwatson mentioned this issue Jul 21, 2020

Don't block the span processor thread waiting for the exporter to complete. #750

Closed

jkwatson closed this as completed Aug 22, 2020

Should exports be async #1422

Should exports be async #1422

Comments

huntc commented Jul 16, 2020

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020 via email

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020 via email

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020 via email

jkwatson commented Jul 16, 2020

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020

jkwatson commented Jul 16, 2020

huntc commented Jul 16, 2020

huntc commented Jul 17, 2020

huntc commented Aug 8, 2020

Oberon00 commented Aug 9, 2020 • edited Loading

Oberon00 commented Aug 9, 2020 • edited Loading

huntc commented Aug 9, 2020 • edited Loading

carlosalberto commented Aug 10, 2020

Oberon00 commented Aug 10, 2020

jkwatson commented Aug 10, 2020

trask commented Aug 10, 2020

huntc commented Aug 10, 2020

huntc commented Aug 10, 2020

huntc commented Aug 10, 2020

jkwatson commented Aug 10, 2020

huntc commented Aug 10, 2020

jkwatson commented Aug 10, 2020

anuraaga commented Aug 11, 2020

Oberon00 commented Aug 11, 2020 • edited Loading

Oberon00 commented Aug 11, 2020 • edited Loading

trask commented Aug 11, 2020

huntc commented Aug 11, 2020

jkwatson commented Aug 22, 2020

Oberon00 commented Aug 9, 2020 •

edited

Loading

Oberon00 commented Aug 9, 2020 •

edited

Loading

huntc commented Aug 9, 2020 •

edited

Loading

Oberon00 commented Aug 11, 2020 •

edited

Loading

Oberon00 commented Aug 11, 2020 •

edited

Loading