[WIP] Performance optimize Ask
#4965
Conversation
Going to use the baseline data from #4962 to measure the impact.
Baseline numbers. No idea why these throughput numbers are lower than the gains I posted earlier.

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.928 (2004/?/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
```
Got a cleaner baseline this time.

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.928 (2004/?/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
```
Introduced object pooling.

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.928 (2004/?/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
```

Throughput seems pretty much the same, but there's definitely a drop in Gen 1 allocations and total memory.
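For context on where the Gen 0/1/2 and Allocated columns in these runs come from: they are produced by BenchmarkDotNet's `[MemoryDiagnoser]`. A minimal harness sketch follows; the measured body is a placeholder, not the actual Ask benchmark from #4962.

```csharp
// Minimal BenchmarkDotNet harness sketch. [MemoryDiagnoser] adds the
// Gen 0/1/2 and Allocated columns seen in the result tables above.
// NOTE: NameGenerationBenchmark and its body are placeholders, not the
// real benchmark used in this PR.
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class NameGenerationBenchmark
{
    [Benchmark]
    public string Baseline() => System.Guid.NewGuid().ToString("N");
}

public static class Program
{
    public static void Main(string[] args) =>
        BenchmarkRunner.Run<NameGenerationBenchmark>();
}
```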
Interestingly, this change will also affect how quickly actors can spawn, since actor spawning uses the same method I'm pooling.
```csharp
internal static class PooledObject
{
    public static readonly ObjectPool<StringBuilder> StringBuilderPool =
        new DefaultObjectPoolProvider().CreateStringBuilderPool(512, 2048);
}
```
Are we expecting to re-use this pool elsewhere? These sizes may be a bit much otherwise.
I think there are several other places where we can use it, but the sizes I chose are arbitrary.
I was mostly curious whether there was significant overhead in getting and returning a pooled StringBuilder vs. Gen 0-ing one.
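To make the get/return cost concrete, here is a sketch of how the pooled StringBuilder would typically be rented and returned (this uses Microsoft.Extensions.ObjectPool and mirrors the PR's `PooledObject` class; `Example.BuildName` and its name format are hypothetical, not code from the PR):

```csharp
// Sketch of renting and returning a pooled StringBuilder.
// PooledObject mirrors the class shown in the diff above;
// Example.BuildName is a hypothetical caller for illustration.
using System.Text;
using Microsoft.Extensions.ObjectPool;

internal static class PooledObject
{
    public static readonly ObjectPool<StringBuilder> StringBuilderPool =
        new DefaultObjectPoolProvider().CreateStringBuilderPool(512, 2048);
}

internal static class Example
{
    public static string BuildName(long id)
    {
        var sb = PooledObject.StringBuilderPool.Get();
        try
        {
            sb.Append('$').Append(id); // hypothetical name format
            return sb.ToString();
        }
        finally
        {
            // Return() clears the builder, and the policy discards it if it
            // grew past the 2048-char retain limit, so the pool never hoards
            // oversized buffers.
            PooledObject.StringBuilderPool.Return(sb);
        }
    }
}
```

The 512/2048 arguments are the initial capacity of each new builder and the maximum capacity a returned builder may have and still be kept in the pool.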
```csharp
    encodedBytes[writeIndex] = Base64Chars[index];
    next = next >> 6;
    writeIndex++;
} while (next != 0);
```
I'm not an expert on akka.net internals, but here are my two cents:

You can unroll the loop instead of looping: since this is a fixed-size data type, unrolling should increase performance at least fourfold.

Also, the prefix is preventing micro-optimization, so you could split the method into variants (with any prefix / with just one / without a prefix) to get the optimization on a case-by-case basis. You can also skip the stackalloc this way (stackalloc saves an allocation, but you have to copy the memory block anyway; skipping it means one less copy and much better performance).

Just a side note, and I don't think this is important, but I think we may have an endianness issue here.
I'm up for trying all of the above.

> just a side note, I don't think this is important but I think we may have endianness issue here

Ah, you mean I'm doing it big endian? Whoops.
> ah, you mean I'm doing it big endian? Whoops.

Meh, I'm not saying you're doing anything wrong. I also don't think anyone will bit-convert a long to base64 and do whatever you're doing here themselves between systems/cluster nodes of different architectures. I'm saying this only because I have zero knowledge of the internals, so it's just a side note.
We're just using this to help generate random actor names, which is something we have to do for the temporary one-off actors that get used to power Ask and GracefulStop. When those actor paths get translated over the network it's all done as a string, so it's probably of little consequence, but I still don't like doing things wrong :p
Then I don't think it matters at all. You're doing it just fine 👍
Well, I'm doing this for fun, and... I got quite a confusing result while validating:
```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
```
| Method | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Benchmark | 28.82 ns | 0.262 ns | 0.245 ns | 1.00 | 0.0057 | - | - | 48 B |
| BenchmarkOptimization1 | 12.68 ns | 0.207 ns | 0.184 ns | 0.44 | 0.0057 | - | - | 48 B |
And here's my code:
```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string Base64EncodeOptimization1(long value)
{
    // disable range checking
    fixed (char* base64 = &base64Table[0])
    {
        Span<char> encodedBytes = stackalloc char[11];
        var next = value;
        encodedBytes[0] = base64[(int) ((next >> (6 * 0)) & 63)];
        encodedBytes[1] = base64[(int) ((next >> (6 * 1)) & 63)];
        encodedBytes[2] = base64[(int) ((next >> (6 * 2)) & 63)];
        encodedBytes[3] = base64[(int) ((next >> (6 * 3)) & 63)];
        encodedBytes[4] = base64[(int) ((next >> (6 * 4)) & 63)];
        encodedBytes[5] = base64[(int) ((next >> (6 * 5)) & 63)];
        encodedBytes[6] = base64[(int) ((next >> (6 * 6)) & 63)];
        encodedBytes[7] = base64[(int) ((next >> (6 * 7)) & 63)];
        encodedBytes[8] = base64[(int) ((next >> (6 * 8)) & 63)];
        encodedBytes[9] = base64[(int) ((next >> (6 * 9)) & 63)];
        encodedBytes[10] = base64[(int) ((next >> (6 * 10)) & 63)];
        return new string(encodedBytes);
    }
}
```
Here's the catch:
1. This is actually base63 (no '='). I think this is intended, but it made me scratch my head.
2. The original code loses all top bits if they are zero (edge case: 0). Well, if that's intended, okay.
3. The chance of a hash conflict is higher than I thought because of 1) and 2).
4. I think my micro-optimization is pointless unless you use this in a tight loop (the two differ by only nanoseconds).
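To make points 1 and 2 concrete, here is a self-contained sketch of the loop-based encoder. The alphabet string is a hypothetical stand-in for Akka.NET's internal Base64Chars table, and the unsigned shift is my own addition so negative inputs terminate:

```csharp
// Self-contained sketch of the loop-based encoder discussed above.
// Base64Chars here is a HYPOTHETICAL 64-char alphabet, not necessarily
// the table Akka.NET actually uses.
using System;

internal static class Base64Demo
{
    private const string Base64Chars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+~";

    public static string Encode(long value)
    {
        Span<char> buffer = stackalloc char[11]; // ceil(64 / 6) = 11 chars max
        var next = value;
        var writeIndex = 0;
        do
        {
            var index = (int)(next & 63);
            buffer[writeIndex] = Base64Chars[index];
            // Unsigned shift (my addition) so negative inputs reach zero.
            next = (long)((ulong)next >> 6);
            writeIndex++;
        } while (next != 0);
        // Only the written prefix: leading zero groups are dropped,
        // so Encode(0) collapses to a single character.
        return new string(buffer.Slice(0, writeIndex));
    }
}
```

Note how `Encode(0)` emits just the character for index 0 and how no padding '=' ever appears in the output, which is exactly the behavior points 1 and 2 describe.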
Hi @chris-sung - sorry I missed this!

> This is actually base63 (no '='), I think this is intended it but made me scratch my head

That's correct - I believe we can't use that due to Akka.NET Uri encoding restrictions. This function is primarily used to generate random actor names.

> and original code loses all top bits if all zero (edge case: 0), well, if it's intended, okay.

That should also be fine.

> I think my micro-optimization is pointless unless you use this in a tight loop (those two are only nano-second differences..)

We don't use this in a tight loop - most of the benefit here comes from eliminating the StringBuilder allocation and reducing GC overhead. This function typically gets invoked when we're generating actor names, so the area where you'll see the biggest performance benefit is when a large number of Ask<T> operations are being created all at once.

Maybe we should still incorporate this in v1.4.21 though?
Thanks for the explanation. What do you mean by "incorporate this" exactly?
I mean use it instead of my current implementation!