-
Notifications
You must be signed in to change notification settings - Fork 780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api] Optimise TraceContextPropagator.Extract
#5749
[api] Optimise TraceContextPropagator.Extract
#5749
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5749 +/- ##
==========================================
+ Coverage 83.38% 86.22% +2.84%
==========================================
Files 297 256 -41
Lines 12531 11140 -1391
==========================================
- Hits 10449 9606 -843
+ Misses 2082 1534 -548
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Span<char> buffer = stackalloc char[256]; | ||
Span<char> keyLookupBuffer = stackalloc char[96]; // 3 x 32 keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love what this PR is doing but I have some concern here with allocating 704 bytes on the stack ((256 + 96) * 2)
.
System.Text.Json for example will at max stackalloc
256 bytes (128 char
s): https://github.com/dotnet/runtime/blob/a86987ccab917433d065fe5dc8870fc261f79d14/src/libraries/System.Text.Json/Common/JsonConstants.cs#L12-L13
Not sure why that number, but I'm sure a lot of thought went into it 😄 /cc @stephentoub
I think this pattern for "stackalloc
with fallback to rented array" is solid but the version here seems a bit off from what I have seen. IMO it more commonly looks like this:
char[]? rentedBuffer = null;
Span<char> destination = length <= Constants.StackallocCharThreshold ?
stackalloc char[Constants.StackallocCharThreshold] :
(rentedBuffer = ArrayPool<char>.Shared.Rent(length));
try
{
...
}
finally
{
if (rentedBuffer != null)
ArrayPool<char>.Shared.Return(rentedBuffer );
}
Would something like that work/help simplify things here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why that number, but I'm sure a lot of thought went into it 😄
It's a bit squishy. We almost never go above 1K. We typically use 256 or 512 bytes, but it varies case-to-case based on knowledge of that particular location and how likely longer buffers are expected to be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeBlanch I think I'd seen somewhere that 1K was a "reasonable" max value. I went for 256 chars as, realistically, that should cover most trace state scenarios. We could drop to 128 (256B) instead, though, and still cover most shorter trace state strings. Combined with the 96 chars (192B) for the duplicate lookup, that might be more reasonable.
I can switch to the try/finally here. The Return
method initially had some extra logic, but I refactored that before the PR. I left it in that form because I preferred avoiding one extra level of indentation introduced with the extra blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the pattern the try-finally
can be ommited. In case of exceptions the buffer (if rented) will just be dropped instead of returned to the pool, but that isn't a problem to the pool.
In first incarnations of this pattern try-finally got used, but later on that pattern evolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't care so much about the try-finally
we could drop that. Just want to make sure the stackalloc
is good.
We can't really control where Extract
runs so it is probably a good idea to be conservative. For incoming request (think AspNetCore instrumentation) it will probably be ~early. But something like processing messages from a queue, who knows! What I would really like to see is us avoid the stackalloc
if we know there is a lot of tracestate but I guess hard to do because there could be multiple headers needing to be processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeBlanch We could just be safe and go with ArrayPool
. I preferred stackalloc
since we might reasonably expect the state to be small, and we can avoid a small amount of overhead that we incur by renting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfoidl I think we need the finally
here, though, as there are multiple branches where this code could return from this method on the non-exception path, and we need to ensure that the array is returned in those cases, too.
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
@CodeBlanch, I am just bumping here to keep this open and see if we can solve the discussion on using |
Sorry for the delay on this @stevejgordon! We did look at this on the SIG yesterday. @vishweshbankwar will do a review pass when he has a moment and then we'll go from there. |
src/OpenTelemetry.Api/Context/Propagation/TraceContextPropagator.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
TraceContextPropagator.Extract
TraceContextPropagator.Extract
Contributes to #728
Changes
This PR removes almost all overhead of extracting the
tracestate
inside theTraceContextPropagator
by switching to a stack-allocated or rented buffer when building the final string. Specifically, this avoids usingStringBuilder
andHashSet
to prevent heap allocations. The execution time is also slightly improved with this approach. I've preserved the existing behaviour based on the tests.This also adds a benchmark to measure the performance of the extract method.
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial changes[ ] Changes in public API reviewed (if applicable)