-
Notifications
You must be signed in to change notification settings - Fork 524
add header ordering and short circuits to known headers #1135
Conversation
var pUI = (uint*)pUB; | ||
var pUS = (ushort*)pUB; | ||
switch (keyLength) | ||
{{{Each(loop.Headers.Where(h => h.PrimaryHeader).GroupBy(x => x.Name.Length), byLength => $@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this doing anything different than AppendNonPrimaryHeaders
? Couldn't loop.Headers just be reordered to have put the PrimaryHeaders first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GroupBy(x => x.Name.Length)
breaks that ordering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer it if the template could be parameterized with the list of headers as the argument instead of copy/pasted.
{Each(loop.Headers.OrderBy(h => !h.PrimaryHeader), header => $@" | ||
if ({header.TestBit()}) | ||
{{ | ||
_headers._{header.Identifier} = default(StringValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a micro benchmark showing this is faster than _headers = default(HeaderReferences);
? Is there any number of set headers where this is no longer the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just calling ClearFast 100M times with current implementation takes about 3s on my dev box. With this change and techempower headers set it takes about .5s. However if I set all the bits it takes 8s. 12 bits seems to be near the tipover point, with check >12 bits the max time is about 3.5s. FrameHeaders.BitCount starts to show up as fairly significant in profiling the microbenchmark. Not sure if we can do something to skip it sometimes or if it isn't significant in more real world scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HeaderReferences is a pretty big struct - does the = default()
change improve after this tweak in coreclr? dotnet/coreclr#7198
Alas its not currently part of 1.1.0; was going to be added dotnet/coreclr#7341 but..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested that change yet, but moving to 1.1.0 looks like a fairly consistent 1.5-2% win. Since JIT_Memset was showing up in the profiles for ClearFast that could be a win for the current implementation.
Normally I would worried about over fitting to the plaintext benchmark, but this doesn't seem to make performance worse when there are more headers. I'm curious about whether ClearFast doesn't become slower with more headers though. |
1852cec
to
26229fa
Compare
New commit rebases against dev, refactors Append to have template string and adds a bailout to ClearFast if there's too many (12) headers. |
} | ||
|
||
|
||
if (((_bits & 1048576L) != 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This indentation looks a bit off.
@@ -4601,6 +5097,8 @@ public unsafe void Append(byte[] keyBytes, int keyOffset, int keyLength, string | |||
} | |||
} | |||
break; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove these linebreaks
|
refactor known header changes - create template string from Append switch - add a check to ClearFast to bail out if too many headers are set
26229fa
to
11ed34f
Compare
I revisited the 'header trim' we explored before RTM (#795) and it still seemed to be about a 3% performance win for techempower plaintext.
To avoid having to decide which headers to remove, this change allows us to put the most common headers first and can short circuit instead of having to check every single known header. This results in almost the same perf as the header trim. In million reqs/sec: