add header ordering and short circuits to known headers #1135

nathana1 · 2016-09-30T21:23:19Z

I revisited the 'header trim' we explored before RTM (#795) and it still seemed to be about a 3% performance win for techempower plaintext.

To avoid having to decide which headers to remove, this change allows us to put the most common headers first and can short circuit instead of having to check every single known header. This results in almost the same perf as the header trim. In million reqs/sec:

Baseline	Header Trim	Short Circuit
5.18	5.42	5.39

halter73 · 2016-09-30T22:41:30Z

tools/Microsoft.AspNetCore.Server.Kestrel.GeneratedCode/KnownHeaders.cs

+                var pUI = (uint*)pUB;
+                var pUS = (ushort*)pUB;
+                switch (keyLength)
+                {{{Each(loop.Headers.Where(h => h.PrimaryHeader).GroupBy(x => x.Name.Length), byLength => $@"


Is this doing anything different than AppendNonPrimaryHeaders? Couldn't loop.Headers just be reordered to have put the PrimaryHeaders first?

GroupBy(x => x.Name.Length) breaks that ordering

I would prefer it if the template could be parameterized with the list of headers as the argument instead of copy/pasted.

halter73 · 2016-09-30T22:45:42Z

tools/Microsoft.AspNetCore.Server.Kestrel.GeneratedCode/KnownHeaders.cs

+            {Each(loop.Headers.OrderBy(h => !h.PrimaryHeader), header => $@"
+                if ({header.TestBit()})
+                {{
+                    _headers._{header.Identifier} = default(StringValues);


Is there a micro benchmark showing this is faster than _headers = default(HeaderReferences);? Is there any number of set headers where this is no longer the case?

Just calling ClearFast 100M times with current implementation takes about 3s on my dev box. With this change and techempower headers set it takes about .5s. However if I set all the bits it takes 8s. 12 bits seems to be near the tipover point, with check >12 bits the max time is about 3.5s. FrameHeaders.BitCount starts to show up as fairly significant in profiling the microbenchmark. Not sure if we can do something to skip it sometimes or if it isn't significant in more real world scenarios.

The HeaderReferences is a pretty big struct - does the = default() change improve after this tweak in coreclr? dotnet/coreclr#7198

Alas its not currently part of 1.1.0; was going to be added dotnet/coreclr#7341 but..?

I haven't tested that change yet, but moving to 1.1.0 looks like a fairly consistent 1.5-2% win. Since JIT_Memset was showing up in the profiles for ClearFast that could be a win for the current implementation.

halter73 · 2016-09-30T22:51:26Z

Normally I would worried about over fitting to the plaintext benchmark, but this doesn't seem to make performance worse when there are more headers. I'm curious about whether ClearFast doesn't become slower with more headers though.

nathana1 · 2016-10-06T22:31:49Z

New commit rebases against dev, refactors Append to have template string and adds a bailout to ClearFast if there's too many (12) headers.

halter73 · 2016-10-19T01:49:55Z

src/Microsoft.AspNetCore.Server.Kestrel/Internal/Http/FrameHeaders.Generated.cs

+            }
+
+
+                if (((_bits & 1048576L) != 0))


Nit: This indentation looks a bit off.

halter73 · 2016-10-19T01:53:06Z

LGTM. @CesarBS @pakrym Mind taking a quick look?

@nathana1 can you rebase again? Thanks.

pakrym · 2016-10-19T02:06:16Z

src/Microsoft.AspNetCore.Server.Kestrel/Internal/Http/FrameHeaders.Generated.cs

@@ -4601,6 +5097,8 @@ public unsafe void Append(byte[] keyBytes, int keyOffset, int keyLength, string
                            }
                        }
                        break;
+


nit: remove these linebreaks

pakrym · 2016-10-19T02:08:30Z

refactor known header changes - create template string from Append switch - add a check to ClearFast to bail out if too many headers are set

dnfclas added the cla-already-signed label Sep 30, 2016

halter73 reviewed Sep 30, 2016

View reviewed changes

nathana1 force-pushed the nathana1/known-header-short-circuit branch from 1852cec to 26229fa Compare October 6, 2016 22:28

halter73 reviewed Oct 19, 2016

View reviewed changes

pakrym reviewed Oct 19, 2016

View reviewed changes

add header ordering and short circuits to known headers

11ed34f

refactor known header changes - create template string from Append switch - add a check to ClearFast to bail out if too many headers are set

nathana1 force-pushed the nathana1/known-header-short-circuit branch from 26229fa to 11ed34f Compare October 19, 2016 23:11

test fixes and whitespace changes to generated code

610601c

halter73 merged commit 610601c into dev Oct 24, 2016

halter73 deleted the nathana1/known-header-short-circuit branch October 26, 2016 00:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add header ordering and short circuits to known headers #1135

add header ordering and short circuits to known headers #1135

nathana1 commented Sep 30, 2016

halter73 Sep 30, 2016

benaadams Sep 30, 2016

halter73 Sep 30, 2016

halter73 Sep 30, 2016

nathana1 Oct 6, 2016

benaadams Oct 6, 2016

nathana1 Oct 6, 2016

halter73 commented Sep 30, 2016

nathana1 commented Oct 6, 2016

halter73 Oct 19, 2016

halter73 commented Oct 19, 2016

pakrym Oct 19, 2016

pakrym commented Oct 19, 2016

add header ordering and short circuits to known headers #1135

add header ordering and short circuits to known headers #1135

Conversation

nathana1 commented Sep 30, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

halter73 commented Sep 30, 2016

nathana1 commented Oct 6, 2016

Choose a reason for hiding this comment

halter73 commented Oct 19, 2016

Choose a reason for hiding this comment

pakrym commented Oct 19, 2016