-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add partial retry capability to OTEL ES exporter. #2456
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2456 +/- ##
==========================================
+ Coverage 95.50% 95.51% +0.01%
==========================================
Files 208 208
Lines 10756 10758 +2
==========================================
+ Hits 10273 10276 +3
+ Misses 407 406 -1
Partials 76 76
Continue to review full report at Codecov.
|
@joe-elliott would you like to have a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the logic seems generally fine, but I'm concerned about the increased memory/cpu usage under normal operation due to creating the ConvertedData
structs.
I'm also concerned about failure scenarios. If a large number of spans start failing for any reason then bulkItemsToTraces
will start being called extremely rapidly putting additional memory/GC/cpu pressure on this process. I'm concerned a failing Elasticsearch cluster could then cause a cascading failure into the Jaeger collector/ingester.
So it seems that you primarily are using the ConvertedData
struct to rebuild the trace to return in the PartialTracesError
as accurately as possible. However, I don't think you need all this data. The data we actually need to store is already in the span (span data + process). Using the dbmodel.Span
alone I think we could rebuild enough of a trace to return in PartialTracesError
that the next time it it came into the exporter it would be sufficient to accurately save it to ES.
For instance, the InstrumentationLibrarySpan.InstrumentationLibrary
data is being kept in order to rebuild the trace, but the Jaeger exporter doesn't do anything with this data.
Basically I'm proposing putting a partial/fake trace back into PartialTracesError
derived from dbmodel.Span
only. We should be able to rebuild the trace accurately enough that when it comes back into the ES exporter we will save the data correctly.
The
Rebuilding
right now it's not being converted to ES span model, but it's a pending feature that should be implemented. We need all original data (resource, instrumentation and span) to correctly reconstruct ES model span. |
I see and agree 👍.
Was about to comment about the
Agree with this as well. |
if traces.ResourceSpans().Len() == 0 { | ||
traces.ResourceSpans().Resize(1) | ||
} else { | ||
traces.ResourceSpans().Resize(traces.ResourceSpans().Len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this do anything?
Should we just set traces.ResourceSpans().Resize(len(bulkItems))
before the loop and make each bulk item its own ResourceSpans? This would be the simplest way to do it.
Otherwise I think you'd need a map of Resource
=> ResourceSpans
which you could check for every bulkItem
to see if a ResourceSpan
for this Resource
already existed and append to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 it's better to allocate at the beginning.
if !spanData.Resource.IsNil() { | ||
spanData.Resource.CopyTo(rss.Resource()) | ||
} | ||
rss.InstrumentationLibrarySpans().Resize(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, you could keep some kind of map of Resource/InstrumentationLibrary => InstrumentationLibrarySpans which you'd then check to see if you had an InstrumentationLibrarySpans to append to or if you should just make a new one.
Might be faster to just make one ResourceSpans/InstrumentationLibrarySpans for each bulkitem and forget trying to reuse them. It's hard to say if spans in a batch will likely come from the same trace and this will be worthwhile or not. Probably depends on which processors are configured in the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot expect that spans in a batch share the same resource&instrumentation.
Might be faster to just make one ResourceSpans/InstrumentationLibrarySpans for each bulkitem and forget trying to reuse them. It's hard to say if spans in a batch will likely come from the same trace and this will be worthwhile or not. Probably depends on which processors are configured in the pipeline.
Keeping the resource/instrumentation -> spans would be more memory efficient. However, I would ho with this simple approach for now. The instrumentation library has two string fields and resource has a map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed +1
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
abb7a01
to
026ab6e
Compare
@joe-elliott could you please have a look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bulkItemsToTraces
is difficult code to write in a performant way particurlarly given the generated shim you're working through to access the proto.
Without going to more extremes (such as the maps already discussed) I think this is as good as can be done.
Looks good to me!
The partial retry as the name suggests ensures that only spans that failed to be stored were retried. The returned error encapsulates
pdata.Traces
that is used by qretry on in the consecutive call to the exporter. Thepdata.Traces
for retry is created from a struct that is created by ES model translator.